Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sosfz-00EUR2-Ds for pgsql-admin@arkaria.postgresql.org; Thu, 12 Sep 2024 22:52:36 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sosfy-002c4L-Pc for pgsql-admin@arkaria.postgresql.org; Thu, 12 Sep 2024 22:52:34 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sosfy-002c4D-9I for pgsql-admin@lists.postgresql.org; Thu, 12 Sep 2024 22:52:34 +0000 Received: from mail-lf1-x136.google.com ([2a00:1450:4864:20::136]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sosfu-000s3a-MN for pgsql-admin@postgresql.org; Thu, 12 Sep 2024 22:52:32 +0000 Received: by mail-lf1-x136.google.com with SMTP id 2adb3069b0e04-5365c512b00so2003248e87.3 for ; Thu, 12 Sep 2024 15:52:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1726181548; x=1726786348; darn=postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=MsW7sMCx92oiPSBf/oZHnbxyqcXHstUf5h3ZJd31y3c=; b=gRV0rmy2P8O25iGjuKqWhMBfpZmO9ljjU0KBR82RCSaTNz8jXa5VYth7BURhjCRMto V/j9pF/p2dac3cAH5rTBHqzSGbF7zpXyb4OwJX5f2vQzd0UPj1Lgl3PDF7w2JzK4M9PN vJL+0BXavjWrNrtxPEvgE1BTjk0eBBdSBoiKyE0aHGqKNdemozdhLt3KCOrBdULc8mgu kH5C6Dp2xLyB7PU9h1/8vV+wLMkZkJFpagVjnz/4oNPYF3C3fku1QlFCnN/Fo3k5WfAp 2svXc2xIBsbWXIjP3pCzXTDaVb1Zq87gcEjQH0NGNr8qoqRBwGabMvkiC+65ijGjaHpK hlxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726181548; x=1726786348; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=MsW7sMCx92oiPSBf/oZHnbxyqcXHstUf5h3ZJd31y3c=; b=p7vruR3TkSFL+QEbgyx48pVU1r+t5R2kF8C/1EaXgV6JcgyEb1pDDTppcqP3k8VkUt 0Oz2yMcr4rrLvpzyIUdKjPCBbq/pcHUaJD6DaUa4lmi5Pc3lG4L3P+3HfIh+5SEk8LPd IT8ooX2TVyMOQbrUSIIjksEy9fPStgWnNUAp6bBHfvuWuNFAByrDRHlrz3P6pB1nYgj5 cStjDzSoly/yhCONrvH8eUSlMm+ZOmNAZMGk+AxF9Cq0+tDQt9qQehUBtX3xcmMFLupS jjXITdqMCE/zr+MvzbpkoIlmkPgyCXYBoXD/uEcs0hYemUtS38pbhRzfDD7qO8+1hyLt HWgQ== X-Gm-Message-State: AOJu0YyfG3CxQm9JcOoVxeRrYMnRRxja5Bc1Z8p1FgcqtqLJy/wNmBwy /C5f4BDgZMcx/p1rPPaI7I2QYMj+cSbKIPK2R0lLCVHz2TJ3eBjfynFwsThABkGAhd7aPLG8qES fQWPPYjLLTCVJbzGwPw3k6yAy9aPT67wK X-Google-Smtp-Source: AGHT+IERXzvYZqtJSDtwukdM5eSHjI5p25IkoxbvgaZQuGeYT4+7sxw+qmHXr1ZnSZwe3sCEUEjM5hmAFOoV0F434TE= X-Received: by 2002:a05:6512:10d4:b0:52e:9b68:d2da with SMTP id 2adb3069b0e04-53678fb4635mr2728596e87.9.1726181547815; Thu, 12 Sep 2024 15:52:27 -0700 (PDT) MIME-Version: 1.0 From: Wells Oliver Date: Thu, 12 Sep 2024 15:51:50 -0700 Message-ID: Subject: Query plan getting less efficient over time with frequent updates and deletes.. To: pgsql-admin Content-Type: multipart/alternative; boundary="000000000000f9c0590621f3f507" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f9c0590621f3f507 Content-Type: text/plain; charset="UTF-8" Hi all: we have a table which receives frequent daily updates and deletes on the order of 100-600k. The overall row length is approximately 80m. This table has 50 indexes and 303 columns and is quite frequently queried by humans and applications. I've been in the habit of using pg_repack maybe once a month on this table because I can't quite figure out why querying gets bogged down. The vacuum and analyze thresholds are set such that the table is both auto vacuumed and analyzed every night. My questions/concerns: - this gets vacuumed and analyzed due to auto thresholds every night, so it's surprising that the planner's choices would change and choose a worse plan right? - I look at pg_stat_user_tables to asses n_dead_tup along with n_tup_del and n_tup_upd to try and assess if there is table bloat and this seems to be under control - I look at pgstattuple to check free_space and try and guess how much is being given back to the server for new rows, it's roughly 20-28% and it seems OK? - what other factors are at play here? index bloat? disk IO? etc? something I can eyeball better in pg_stat_user_tables, or pgstattuple? How can I make this table behave properly, with all of its updates, and without restoring to frequent pg_repack? The much faster version of the query plan does a bitmap heap scan with a bitmap index scan. The slower version just does an index scan on the same index. Here are the EXPLAIN ANALYZE VERBOSE results of a routine SELECT query with a few columns in a WHERE clause. I've abbreviated these otherwise it would just be too much text, glad to share more. After a fresh pg_repack, the former is performed for a while (like two weeks) then it reverts to the latter and everyone is unhappy. Fast, better! Sort (cost=20279.85..20281.77 rows=768 width=380) (actual time=218.563..219.431 rows=7892 loops=1) Output: p.season, p.game_date, ... p.pitch_seq Sort Key: p.game_date DESC, p.game_bam_id, ((p.at_bat_index + 1)), p.pitch_seq Sort Method: quicksort Memory: 4215kB -> Bitmap Heap Scan on stats.pitches p (cost=58.49..20243.04 rows=768 width=380) (actual time=11.319..167.570 rows=7892 loops=1) Output: p.season, p.game_date, ... p.pitch_seq Recheck Cond: (p.batter_common_key = 2181560) Filter: ((p.pitch_type IS NOT NULL) AND p.is_pitch AND (COALESCE(p.pitch_code, p.pitch_result) IS NOT NULL) AND (p.season = ANY ('{2024,2023,2022}'::integer[])) AND (p.pitch_type <> ALL ('{UN,XX,PO,IB,AB}'::text[])) AND (p.game_type = ANY ('{R,F,D,L,W,C,S}'::text[]))) Rows Removed by Filter: 2824 Heap Blocks: exact=10327 -> Bitmap Index Scan on pitches_batter_common_key_idx (cost=0.00..58.26 rows=5026 width=0) (actual time=6.625..6.625 rows=10716 loops=1) Index Cond: (p.batter_common_key = 2181560) Query Identifier: -9080545043453138058 Planning Time: 3.358 ms Execution Time: 224.082 ms Slow, bad! Sort (cost=20381.86..20383.74 rows=753 width=381) (actual time=6344.784..6345.568 rows=7892 loops=1) Output: p.season, p.game_date, ... p.pitch_seq Sort Key: p.game_date DESC, p.game_bam_id, ((p.at_bat_index + 1)), p.pitch_seq Sort Method: quicksort Memory: 4215kB -> Index Scan using pitches_batter_common_key_idx on stats.pitches p (cost=0.60..20345.88 rows=753 width=381) (actual time=6.765..6294.745 rows=7892 loops=1) Output: p.season, p.game_date, ... p.pitch_seq Index Cond: (p.batter_common_key = 2181560) Filter: ((p.pitch_type IS NOT NULL) AND p.is_pitch AND (COALESCE(p.pitch_code, p.pitch_result) IS NOT NULL) AND (p.season = ANY ('{2024,2023,2022}'::integer[])) AND (p.pitch_type <> ALL ('{UN,XX,PO,IB,AB}'::text[])) AND (p.game_type = ANY ('{R,F,D,L,W,C,S}'::text[]))) Rows Removed by Filter: 2824 Query Identifier: -9080545043453138058 Planning Time: 3.197 ms Execution Time: 6349.608 ms -- Wells Oliver wells.oliver@gmail.com --000000000000f9c0590621f3f507 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi = all: we have a table which receives frequent daily updates and deletes on t= he order of 100-600k. The overall row length is approximately 80m. This tab= le has 50 indexes and 303 columns and is quite frequently queried by humans= and applications.

I'= ve been in the habit of using pg_repack maybe once a month on this table be= cause I can't quite figure out why querying gets bogged down. The vacuu= m and analyze thresholds are set such that the table is both auto vacuumed = and analyzed every night.

My questions/concerns:

- this gets vacuumed and analyzed due to= auto thresholds every night, so it's surprising that the planner's= choices would change and choose a worse plan right?
- I look at pg_stat_user_tables to as= ses n_dead_tup along with n_tup_del and n_tup_upd to try and assess if ther= e is table bloat and this seems to be under control
- I look at=C2=A0pgstattuple to check = free_space and try and guess how much is being given back to the server for= new rows, it's roughly 20-28% and it seems OK?
- what other factors are at play here?= index bloat? disk IO? etc? something I can eyeball better in pg_stat_user_= tables, or pgstattuple?

H= ow can I make this table behave properly, with all of its updates, and with= out restoring to frequent pg_repack?

The m= uch faster version of the query plan does a bitmap heap scan with a bitmap = index scan. The slower version just does an index scan on the same index.
<= /div>

Here are the EXPLAIN ANALYZE VERBOSE results = of a routine SELECT query=C2=A0with a few columns in a WHERE clause. I'= ve abbreviated these otherwise it would just be too much text, glad to shar= e more. After a fresh pg_repack, the former is performed for a while (like = two weeks) then it reverts to the latter and everyone is unhappy.

Fast, better!

Sort =C2= =A0(cost=3D20279.85..20281.77 rows=3D768 width=3D380) (actual time=3D218.56= 3..219.431 rows=3D7892 loops=3D1)
=C2=A0 Output: p.season, p.game_date, = ... p.pitch_seq
=C2=A0 Sort Key: p.game_date DESC, p.game_bam_id, ((p.at= _bat_index + 1)), p.pitch_seq
=C2=A0 Sort Method: quicksort =C2=A0Memory= : 4215kB
=C2=A0 -> =C2=A0Bitmap Heap Scan on stats.pitches p =C2=A0(c= ost=3D58.49..20243.04 rows=3D768 width=3D380) (actual time=3D11.319..167.57= 0 rows=3D7892 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Output: p.season, p= .game_date, ... p.pitch_seq
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Recheck Cond: (p= .batter_common_key =3D 2181560)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Filter: ((p.= pitch_type IS NOT NULL) AND p.is_pitch AND (COALESCE(p.pitch_code, p.pitch_= result) IS NOT NULL) AND (p.season =3D ANY ('{2024,2023,2022}'::int= eger[])) AND (p.pitch_type <> ALL ('{UN,XX,PO,IB,AB}'::text[]= )) AND (p.game_type =3D ANY ('{R,F,D,L,W,C,S}'::text[])))
=C2=A0= =C2=A0 =C2=A0 =C2=A0 Rows Removed by Filter: 2824
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 Heap Blocks: exact=3D10327
=C2=A0 =C2=A0 =C2=A0 =C2=A0 -> =C2= =A0Bitmap Index Scan on pitches_batter_common_key_idx =C2=A0(cost=3D0.00..5= 8.26 rows=3D5026 width=3D0) (actual time=3D6.625..6.625 rows=3D10716 loops= =3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Index Cond: (p.ba= tter_common_key =3D 2181560)
Query Identifier: -9080545043453138058
P= lanning Time: 3.358 ms
Execution Time: 224.082 ms

<= br>
Slow, bad!

Sort =C2=A0(cost=3D20381.86..20383.74 rows=3D753 width=3D381) = (actual time=3D6344.784..6345.568 rows=3D7892 loops=3D1)
=C2=A0 Output: = p.season, p.game_date, ... p.pitch_seq
=C2=A0 Sort Key: p.game_date DESC= , p.game_bam_id, ((p.at_bat_index + 1)), p.pitch_seq
=C2=A0 Sort Method:= quicksort =C2=A0Memory: 4215kB
=C2=A0 -> =C2=A0Index Scan using pitc= hes_batter_common_key_idx on stats.pitches p =C2=A0(cost=3D0.60..20345.88 r= ows=3D753 width=3D381) (actual time=3D6.765..6294.745 rows=3D7892 loops=3D1= )
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Output: p.season, p.game_date, ... p.pitch= _seq
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Index Cond: (p.batter_common_key =3D 21= 81560)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Filter: ((p.pitch_type IS NOT NULL) A= ND p.is_pitch AND (COALESCE(p.pitch_code, p.pitch_result) IS NOT NULL) AND = (p.season =3D ANY ('{2024,2023,2022}'::integer[])) AND (p.pitch_typ= e <> ALL ('{UN,XX,PO,IB,AB}'::text[])) AND (p.game_type =3D A= NY ('{R,F,D,L,W,C,S}'::text[])))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Row= s Removed by Filter: 2824
Query Identifier: -9080545043453138058
Plan= ning Time: 3.197 ms
Execution Time: 6349.608 ms
<= br>
--
--000000000000f9c0590621f3f507--