public inbox for [email protected]  
help / color / mirror / Atom feed
Query plan getting less efficient over time with frequent updates and deletes..
5+ messages / 2 participants
[nested] [flat]

* Query plan getting less efficient over time with frequent updates and deletes..
@ 2024-09-12 22:51 Wells Oliver <[email protected]>
  2024-09-12 23:47 ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Ron Johnson <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Wells Oliver @ 2024-09-12 22:51 UTC (permalink / raw)
  To: pgsql-admin

Hi all: we have a table which receives frequent daily updates and deletes
on the order of 100-600k. The overall row length is approximately 80m. This
table has 50 indexes and 303 columns and is quite frequently queried by
humans and applications.

I've been in the habit of using pg_repack maybe once a month on this table
because I can't quite figure out why querying gets bogged down. The vacuum
and analyze thresholds are set such that the table is both auto vacuumed
and analyzed every night.

My questions/concerns:

- this gets vacuumed and analyzed due to auto thresholds every night, so
it's surprising that the planner's choices would change and choose a worse
plan right?
- I look at pg_stat_user_tables to asses n_dead_tup along with n_tup_del
and n_tup_upd to try and assess if there is table bloat and this seems to
be under control
- I look at pgstattuple to check free_space and try and guess how much is
being given back to the server for new rows, it's roughly 20-28% and it
seems OK?
- what other factors are at play here? index bloat? disk IO? etc? something
I can eyeball better in pg_stat_user_tables, or pgstattuple?

How can I make this table behave properly, with all of its updates, and
without restoring to frequent pg_repack?

The much faster version of the query plan does a bitmap heap scan with a
bitmap index scan. The slower version just does an index scan on the same
index.

Here are the EXPLAIN ANALYZE VERBOSE results of a routine SELECT query with
a few columns in a WHERE clause. I've abbreviated these otherwise it would
just be too much text, glad to share more. After a fresh pg_repack, the
former is performed for a while (like two weeks) then it reverts to the
latter and everyone is unhappy.

Fast, better!

Sort  (cost=20279.85..20281.77 rows=768 width=380) (actual
time=218.563..219.431 rows=7892 loops=1)
  Output: p.season, p.game_date, ... p.pitch_seq
  Sort Key: p.game_date DESC, p.game_bam_id, ((p.at_bat_index + 1)),
p.pitch_seq
  Sort Method: quicksort  Memory: 4215kB
  ->  Bitmap Heap Scan on stats.pitches p  (cost=58.49..20243.04 rows=768
width=380) (actual time=11.319..167.570 rows=7892 loops=1)
        Output: p.season, p.game_date, ... p.pitch_seq
        Recheck Cond: (p.batter_common_key = 2181560)
        Filter: ((p.pitch_type IS NOT NULL) AND p.is_pitch AND
(COALESCE(p.pitch_code, p.pitch_result) IS NOT NULL) AND (p.season = ANY
('{2024,2023,2022}'::integer[])) AND (p.pitch_type <> ALL
('{UN,XX,PO,IB,AB}'::text[])) AND (p.game_type = ANY
('{R,F,D,L,W,C,S}'::text[])))
        Rows Removed by Filter: 2824
        Heap Blocks: exact=10327
        ->  Bitmap Index Scan on pitches_batter_common_key_idx
 (cost=0.00..58.26 rows=5026 width=0) (actual time=6.625..6.625 rows=10716
loops=1)
              Index Cond: (p.batter_common_key = 2181560)
Query Identifier: -9080545043453138058
Planning Time: 3.358 ms
Execution Time: 224.082 ms

Slow, bad!

Sort  (cost=20381.86..20383.74 rows=753 width=381) (actual
time=6344.784..6345.568 rows=7892 loops=1)
  Output: p.season, p.game_date, ... p.pitch_seq
  Sort Key: p.game_date DESC, p.game_bam_id, ((p.at_bat_index + 1)),
p.pitch_seq
  Sort Method: quicksort  Memory: 4215kB
  ->  Index Scan using pitches_batter_common_key_idx on stats.pitches p
 (cost=0.60..20345.88 rows=753 width=381) (actual time=6.765..6294.745
rows=7892 loops=1)
        Output: p.season, p.game_date, ... p.pitch_seq
        Index Cond: (p.batter_common_key = 2181560)
        Filter: ((p.pitch_type IS NOT NULL) AND p.is_pitch AND
(COALESCE(p.pitch_code, p.pitch_result) IS NOT NULL) AND (p.season = ANY
('{2024,2023,2022}'::integer[])) AND (p.pitch_type <> ALL
('{UN,XX,PO,IB,AB}'::text[])) AND (p.game_type = ANY
('{R,F,D,L,W,C,S}'::text[])))
        Rows Removed by Filter: 2824
Query Identifier: -9080545043453138058
Planning Time: 3.197 ms
Execution Time: 6349.608 ms

-- 
Wells Oliver
[email protected] <[email protected]>


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Query plan getting less efficient over time with frequent updates and deletes..
  2024-09-12 22:51 Query plan getting less efficient over time with frequent updates and deletes.. Wells Oliver <[email protected]>
@ 2024-09-12 23:47 ` Ron Johnson <[email protected]>
  2024-09-12 23:56   ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Wells Oliver <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Ron Johnson @ 2024-09-12 23:47 UTC (permalink / raw)
  To: pgsql-admin

On Thu, Sep 12, 2024 at 6:52 PM Wells Oliver <[email protected]> wrote:

> Hi all: we have a table which receives frequent daily updates and deletes
> on the order of 100-600k. The overall row length is approximately 80m. This
> table has 50 indexes and 303 columns and is quite frequently queried by
> humans and applications.
>
> I've been in the habit of using pg_repack maybe once a month on this table
> because I can't quite figure out why querying gets bogged down. The vacuum
> and analyze thresholds are set such that the table is both auto vacuumed
> and analyzed every night.
>

1. You're absolutely positive that the VACUUM and ANALYZE complete every
night?
2. Nightly may not be often enough.


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Query plan getting less efficient over time with frequent updates and deletes..
  2024-09-12 22:51 Query plan getting less efficient over time with frequent updates and deletes.. Wells Oliver <[email protected]>
  2024-09-12 23:47 ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Ron Johnson <[email protected]>
@ 2024-09-12 23:56   ` Wells Oliver <[email protected]>
  2024-09-13 02:41     ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Ron Johnson <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Wells Oliver @ 2024-09-12 23:56 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-admin

Yes, I regularly look at pg_stat_user_tables and in particular
last_autovacuum and last_autoanalyze and these are always the current date
(or within two days) after our nightly processes soon finish.

I wondered if the similar low planning time but the dissimilar longer
execution time might indicate rows are spread out over disk, thereby
negating a bitmap heap scan and the slower query taking longer due to
having to read a lot more disk? Is that a possibility?



On Thu, Sep 12, 2024 at 4:47 PM Ron Johnson <[email protected]> wrote:

> On Thu, Sep 12, 2024 at 6:52 PM Wells Oliver <[email protected]>
> wrote:
>
>> Hi all: we have a table which receives frequent daily updates and deletes
>> on the order of 100-600k. The overall row length is approximately 80m. This
>> table has 50 indexes and 303 columns and is quite frequently queried by
>> humans and applications.
>>
>> I've been in the habit of using pg_repack maybe once a month on this
>> table because I can't quite figure out why querying gets bogged down. The
>> vacuum and analyze thresholds are set such that the table is both auto
>> vacuumed and analyzed every night.
>>
>
> 1. You're absolutely positive that the VACUUM and ANALYZE complete every
> night?
> 2. Nightly may not be often enough.
>
>

-- 
Wells Oliver
[email protected] <[email protected]>


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Query plan getting less efficient over time with frequent updates and deletes..
  2024-09-12 22:51 Query plan getting less efficient over time with frequent updates and deletes.. Wells Oliver <[email protected]>
  2024-09-12 23:47 ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Ron Johnson <[email protected]>
  2024-09-12 23:56   ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Wells Oliver <[email protected]>
@ 2024-09-13 02:41     ` Ron Johnson <[email protected]>
  2024-09-13 16:55       ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Wells Oliver <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Ron Johnson @ 2024-09-13 02:41 UTC (permalink / raw)
  To: pgsql-admin

On Thu, Sep 12, 2024 at 7:56 PM Wells Oliver <[email protected]> wrote:

> Yes, I regularly look at pg_stat_user_tables and in particular
> last_autovacuum and last_autoanalyze and these are always the current date
> (or within two days) after our nightly processes soon finish.
>

"Or within two days".  I used to think that was adequate, but now I vacuum
and analyze some tables multiple times a day.

1.5% autovacuum_X_scale_factor and 200 autovacuum_X_threshold is required
on some tables.

Because there's sooo many indices on that table, you might have to manually
vacuum it with a pretty high PARALLEL value.


> I wondered if the similar low planning time but the dissimilar longer
> execution time might indicate rows are spread out over disk, thereby
> negating a bitmap heap scan and the slower query taking longer due to
> having to read a lot more disk? Is that a possibility?
>

It was 30 years ago.  Modern (like ext2 and newer) filesystems purposefully
spread files across devices.


> On Thu, Sep 12, 2024 at 4:47 PM Ron Johnson <[email protected]>
> wrote:
>
>> On Thu, Sep 12, 2024 at 6:52 PM Wells Oliver <[email protected]>
>> wrote:
>>
>>> Hi all: we have a table which receives frequent daily updates and
>>> deletes on the order of 100-600k. The overall row length is approximately
>>> 80m. This table has 50 indexes and 303 columns and is quite frequently
>>> queried by humans and applications.
>>>
>>> I've been in the habit of using pg_repack maybe once a month on this
>>> table because I can't quite figure out why querying gets bogged down. The
>>> vacuum and analyze thresholds are set such that the table is both auto
>>> vacuumed and analyzed every night.
>>>
>>
>> 1. You're absolutely positive that the VACUUM and ANALYZE complete every
>> night?
>> 2. Nightly may not be often enough.
>>
>>
>
> --
> Wells Oliver
> [email protected] <[email protected]>
>


-- 
Death to America, and butter sauce.
Iraq lobster!


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Query plan getting less efficient over time with frequent updates and deletes..
  2024-09-12 22:51 Query plan getting less efficient over time with frequent updates and deletes.. Wells Oliver <[email protected]>
  2024-09-12 23:47 ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Ron Johnson <[email protected]>
  2024-09-12 23:56   ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Wells Oliver <[email protected]>
  2024-09-13 02:41     ` Re: Query plan getting less efficient over time with frequent updates and deletes.. Ron Johnson <[email protected]>
@ 2024-09-13 16:55       ` Wells Oliver <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: Wells Oliver @ 2024-09-13 16:55 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-admin

Well, hmm, does it stand to reason that if it's just a stale plan where
vacuum and/or analyze is needed, I could execute a VACUUM ANALYZE rather
than doing a full pg_repack and it should solve the issue, right? I could
try that next time and see if the query plan reverts to the faster plan.

On Thu, Sep 12, 2024 at 7:42 PM Ron Johnson <[email protected]> wrote:

> On Thu, Sep 12, 2024 at 7:56 PM Wells Oliver <[email protected]>
> wrote:
>
>> Yes, I regularly look at pg_stat_user_tables and in particular
>> last_autovacuum and last_autoanalyze and these are always the current date
>> (or within two days) after our nightly processes soon finish.
>>
>
> "Or within two days".  I used to think that was adequate, but now I vacuum
> and analyze some tables multiple times a day.
>
> 1.5% autovacuum_X_scale_factor and 200 autovacuum_X_threshold is required
> on some tables.
>
> Because there's sooo many indices on that table, you might have to
> manually vacuum it with a pretty high PARALLEL value.
>
>
>> I wondered if the similar low planning time but the dissimilar longer
>> execution time might indicate rows are spread out over disk, thereby
>> negating a bitmap heap scan and the slower query taking longer due to
>> having to read a lot more disk? Is that a possibility?
>>
>
> It was 30 years ago.  Modern (like ext2 and newer) filesystems
> purposefully spread files across devices.
>
>
>> On Thu, Sep 12, 2024 at 4:47 PM Ron Johnson <[email protected]>
>> wrote:
>>
>>> On Thu, Sep 12, 2024 at 6:52 PM Wells Oliver <[email protected]>
>>> wrote:
>>>
>>>> Hi all: we have a table which receives frequent daily updates and
>>>> deletes on the order of 100-600k. The overall row length is approximately
>>>> 80m. This table has 50 indexes and 303 columns and is quite frequently
>>>> queried by humans and applications.
>>>>
>>>> I've been in the habit of using pg_repack maybe once a month on this
>>>> table because I can't quite figure out why querying gets bogged down. The
>>>> vacuum and analyze thresholds are set such that the table is both auto
>>>> vacuumed and analyzed every night.
>>>>
>>>
>>> 1. You're absolutely positive that the VACUUM and ANALYZE complete every
>>> night?
>>> 2. Nightly may not be often enough.
>>>
>>>
>>
>> --
>> Wells Oliver
>> [email protected] <[email protected]>
>>
>
>
> --
> Death to America, and butter sauce.
> Iraq lobster!
>


-- 
Wells Oliver
[email protected] <[email protected]>


^ permalink  raw  reply  [nested|flat] 5+ messages in thread


end of thread, other threads:[~2024-09-13 16:55 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-09-12 22:51 Query plan getting less efficient over time with frequent updates and deletes.. Wells Oliver <[email protected]>
2024-09-12 23:47 ` Ron Johnson <[email protected]>
2024-09-12 23:56   ` Wells Oliver <[email protected]>
2024-09-13 02:41     ` Ron Johnson <[email protected]>
2024-09-13 16:55       ` Wells Oliver <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox