public inbox for [email protected]help / color / mirror / Atom feed
A way to optimize sql about the last temporary-related row 3+ messages / 2 participants [nested] [flat]
* A way to optimize sql about the last temporary-related row @ 2024-06-27 15:20 [email protected] <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: [email protected] @ 2024-06-27 15:20 UTC (permalink / raw) To: [email protected] Hello everyone, Sorry to bother you but I have a query that is driving me crazy. I need to have the last valid record at a temporal level according to a specific parameter. First some data: Linux Rocky 8.10 environment, minimal installation (on VM KVM with Fedora 40). Postgresql 16.3, installed by official Postgresql guide. effective_cache_size = '1000 MB'; shared_buffers = '500 MB'; work_mem = '16MB'; The changes are deliberately minimal to be able to all to simulate the problem. Table script: CREATE TABLE test_table ( pk_id int NOT NULL, integer_field_1 int , integer_field_2 int, datetime_field_1 timestamp, primary key (pk_id) ) -- insert 4M records insert into test_table(pk_id) select generate_series(1,4000000,1); -- now set some random data, distribuited between specific ranges (as in my production table) update test_table set datetime_field_1 = timestamp '2000-01-01 00:00:00' + random() * (timestamp '2024-05-31 23:59:59' - timestamp '2000-01-01 00:00:00'), integer_field_1 = floor(random() * (6-1+1) + 1)::int, integer_field_2 = floor(random() * (200000-1+1) + 1)::int; -- indexes CREATE INDEX idx_test_table_integer_field_1 ON test_table(integer_field_1); CREATE INDEX xtest_table_datetime_field_1 ON test_table(datetime_field_1 desc); CREATE INDEX idx_test_table_integer_field_2 ON test_table(integer_field_2); --vacuum vacuum full test_table; Now the query: explain (verbose, buffers, analyze) with last_table_ids as materialized( select xx from ( select LAST_VALUE(pk_id) over (partition by integer_field_2 order by datetime_field_1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) xx from test_table where integer_field_1 = 1 and datetime_field_1 <= CURRENT_TIMESTAMP ) ww group by ww.xx ), last_row_per_ids as ( select tt.* from last_table_ids lt inner join test_table tt on (tt.pk_id = lt.xx) ) select * /* or count(*) */ from last_row_per_ids; This query, on my PC, takes 46 seconds!!! I was expecting about 2-3 seconds (according with my other queries in this table) but it seems that the xtest_table_datetime_field_1 index is not being used. Do you think there is a way to optimize the query? Thanks so much for the support, Agharta ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: A way to optimize sql about the last temporary-related row @ 2024-06-27 16:16 David Rowley <[email protected]> parent: [email protected] <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: David Rowley @ 2024-06-27 16:16 UTC (permalink / raw) To: [email protected]; +Cc: PostgreSQL General <[email protected]> On Fri, 28 Jun 2024, 3:20 am [email protected], <[email protected]> wrote: > > Now the query: > explain (verbose, buffers, analyze) > with last_table_ids as materialized( > select xx from ( > select LAST_VALUE(pk_id) over (partition by integer_field_2 order by > datetime_field_1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED > FOLLOWING) xx > from test_table > where integer_field_1 = 1 > and datetime_field_1 <= CURRENT_TIMESTAMP > ) ww group by ww.xx > > ), > last_row_per_ids as ( > select tt.* from last_table_ids lt > inner join test_table tt on (tt.pk_id = lt.xx) > > ) > > select * /* or count(*) */ from last_row_per_ids; > > > This query, on my PC, takes 46 seconds!!! > (Away from laptop and using my phone) Something like: select distinct on (integer_field_2) * from test_table where integer_field_1 = 1 and datetime_field_1 <= CURRENT_TIMESTAMP order by integer_field_2,datetime_field_1 desc; Might run a bit faster. However if it's slow due to I/O then maybe not much faster. Your version took about 5 seconds on my phone and my version ran in 1.5 seconds. It's difficult for me to check the results match with each query from my phone. A quick scan of the first 10 or so records looked good. If the updated query is still too slow on cold cache then faster disks might be needed. David > ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: A way to optimize sql about the last temporary-related row @ 2024-06-28 07:20 [email protected] <[email protected]> parent: David Rowley <[email protected]> 0 siblings, 0 replies; 3+ messages in thread From: [email protected] @ 2024-06-28 07:20 UTC (permalink / raw) To: David Rowley <[email protected]>; +Cc: PostgreSQL General <[email protected]> HOO-HA! This is HUGE! Only 2.2 seconds on my data!!!! Amazing! distinct on (field) *followed by "*" *is a hidden gem! Thank you so much and thanks to everyone who helped me! Thank you very much!! Cheers, Agharta Il 27/06/24 6:16 PM, David Rowley ha scritto: > > > On Fri, 28 Jun 2024, 3:20 am [email protected], > <[email protected]> wrote: > > > Now the query: > explain (verbose, buffers, analyze) > with last_table_ids as materialized( > select xx from ( > select LAST_VALUE(pk_id) over (partition by integer_field_2 > order by > datetime_field_1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED > FOLLOWING) xx > from test_table > where integer_field_1 = 1 > and datetime_field_1 <= CURRENT_TIMESTAMP > ) ww group by ww.xx > > ), > last_row_per_ids as ( > select tt.* from last_table_ids lt > inner join test_table tt on (tt.pk_id = lt.xx) > > ) > > select * /* or count(*) */ from last_row_per_ids; > > > This query, on my PC, takes 46 seconds!!! > > > (Away from laptop and using my phone) > > Something like: > > select distinct on (integer_field_2) * from test_table where > integer_field_1 = 1 and datetime_field_1 <= CURRENT_TIMESTAMP order by > integer_field_2,datetime_field_1 desc; > > Might run a bit faster. However if it's slow due to I/O then maybe > not much faster. Your version took about 5 seconds on my phone and my > version ran in 1.5 seconds. > > It's difficult for me to check the results match with each query from > my phone. A quick scan of the first 10 or so records looked good. > > If the updated query is still too slow on cold cache then faster disks > might be needed. > > David > ^ permalink raw reply [nested|flat] 3+ messages in thread
end of thread, other threads:[~2024-06-28 07:20 UTC | newest] Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-06-27 15:20 A way to optimize sql about the last temporary-related row [email protected] <[email protected]> 2024-06-27 16:16 ` David Rowley <[email protected]> 2024-06-28 07:20 ` [email protected] <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox