Date: Sat, 17 Jan 2026 10:03:56 +0100
Subject: Re: BUG #19373: One backend hanging in AioIoUringExecution blocking other backends
From: Michael Kröll
To: surya poondla
Cc: pgsql-bugs@lists.postgresql.org
References: <19373-aac0a0ee0aac6a8b@postgresql.org>

On 1/17/26 01:09, surya poondla wrote:
> Restarting the cluster did not go through
> until the hanging leader PID was ``SIGKILL``ed
>
> Am I understanding this correctly as "Normal shutdown (SIGTERM or pg_ctl
> stop) did not complete, and postmaster remained waiting on
> until AioIoUringExecution was force killed" ?

It remained waiting until the PG backend process in wait_event
AioIoUringExecution was SIGKILLed.

> I’m interested in digging into this and am wondering about the below
> 1. What filesystem and storage was this instance running on

Two of

Disk model: INTEL SSDPE2KX010T7
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

in a software RAID 1 on ext4

> 2. Was this a parallel sequential scan, was any index access involved?

I do not have the exact query plan for this one, as it depends on the
bind parameters passed, and there are three ARRAY type bind parameters
for the query in question.

Typically the query plan looks like this:

Gather  (cost=1825.51..30731.85 rows=6391 width=678)
  Workers Planned: 2
  ->  Nested Loop  (cost=825.51..29092.75 rows=2663 width=678)
        ->  Parallel Bitmap Heap Scan on offer o  (cost=825.09..22161.55 rows=2676 width=542)
              Recheck Cond: (id = ANY ('{3048845,2121345,2840302,2807790,3273743,2798121,2017850,3226237,1501236,2449122,2891576,2927727,3526960,3467910,2929690,3299523,3458918,2840304,1707208,2101471,245>
              Filter: (vfb_in_de AND ((loc)::text = ANY ('{de,at}'::text[])))
              ->  Bitmap Index Scan on offer_id_npr_lzf_idx  (cost=0.00..822.62 rows=14268 width=0)
                    Index Cond: (id = ANY ('{3048845,2121345,2840302,2807790,3273743,2798121,2017850,3226237,1501236,2449122,2891576,2927727,3526960,3467910,2929690,3299523,3458918,2840304,1707208,2101471>
        ->  Index Scan using gh_haendler_pkey on gh_haendler  (cost=0.42..2.58 rows=1 width=8)
              Index Cond: (h_id = o.h_id)
              Filter: (COALESCE(multimerchants_template_id, o.h_id) <> ALL ('{4957}'::integer[]))

In the case of the problematic query/params/backend there were *two*
workers with the associated leader
PID found in pg_stat_activity.

> 3. By any chance do you have a reproducible test case?

Unfortunately not: we had this happily running in production for
multiple weeks without issues on three identical machines, and it
happened once on one of them. We could not reproduce it on our
development/testing machines.

At the moment we have switched to io_method=worker, which, at least for
the index-driven use cases on those machines, won't make a big
difference. We could configure one of those three boxes with io_uring
again, but we would not know ahead of time whether we would ever trigger
this issue again, and we would need solid, specific monitoring in place
first.

At the same time, we have upgraded those boxes to kernel 6.12 by now,
and if the issue was related to an interaction dependent on the io_uring
version, that is another reason we will likely not see the same issue
again.

> 4. Can you share what shared_preload_libraries you are using?

pg_stat_statements is the only one used there.

Thank you for having a look. Sorry for not being able to provide more
specifics.

BR,
Michael

> Regards,
> Surya Poondla
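
A minimal sketch of the kind of monitoring mentioned above, not something that was actually deployed on these machines. It assumes rows are sampled periodically from pg_stat_activity and flags backends whose current wait_event is AioIoUringExecution while their query has been running longer than a threshold. The names `Backend`, `stuck_backends`, `STUCK_QUERY` and the five-minute threshold are hypothetical. Note that pg_stat_activity does not record when a wait began, so query runtime is only a rough proxy for time spent waiting.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, List, Optional

# Query one might run on a schedule (assumption: results are mapped to Backend rows).
# pid, leader_pid, wait_event and query_start are columns of pg_stat_activity.
STUCK_QUERY = """
SELECT pid, leader_pid, wait_event, now() - query_start AS running_for
FROM pg_stat_activity
WHERE wait_event = 'AioIoUringExecution'
"""

WAIT_EVENT = "AioIoUringExecution"


@dataclass(frozen=True)
class Backend:
    """One sampled row; hypothetical container, not a PostgreSQL type."""
    pid: int
    leader_pid: Optional[int]   # set for parallel workers, None otherwise
    wait_event: Optional[str]
    running_for: timedelta      # query runtime, a proxy for time spent waiting


def stuck_backends(rows: Iterable[Backend],
                   threshold: timedelta = timedelta(minutes=5)) -> List[Backend]:
    """Return backends sampled in AioIoUringExecution with a long-running query."""
    return [
        r for r in rows
        if r.wait_event == WAIT_EVENT and r.running_for > threshold
    ]
```

An alert would only make sense if the same PID shows up in several consecutive samples, since a single sample in this wait event is normal during I/O.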