public inbox for [email protected]  
help / color / mirror / Atom feed
Best way to check if a table is empty
6+ messages / 3 participants
[nested] [flat]

* Best way to check if a table is empty
@ 2025-03-24 03:42 Marcelo Fernandes <[email protected]>
  2025-03-24 04:03 ` Re: Best way to check if a table is empty David G. Johnston <[email protected]>
  2025-03-24 04:14 ` Re: Best way to check if a table is empty Christophe Pettus <[email protected]>
  0 siblings, 2 replies; 6+ messages in thread

From: Marcelo Fernandes @ 2025-03-24 03:42 UTC (permalink / raw)
  To: [email protected]

Hi folks,

I came up with three strategies to verify whether a table is empty.

I wanted to sound the community to check whether my assumptions are correct for
each of these strategies, and to also discuss which strategy is best.

## Strategy 1 [possibly best?]

SELECT EXISTS (SELECT 1 FROM foo LIMIT 1);

Pros:
  1. Works with any table.
  2. Relatively fast (if the table is well organised).

Cons:
  1. Sequential Scan
  2. If the table is bloated, it reads more buffers.

## Strategy 2

SELECT min(id) FROM foo;

Pros:
  1. Does an index-only scan on a field that presumably has a PK index.
  2. Works well even if the table is bloated.

Cons:
  1. Sequential Scan if the table does not have a PK index.
  2. Reads a few more buffers than Strategy 1 when the table is well organised.
  3. Performs worse if the index is bloated.

## Strategy 3 [worst]

SELECT count(*) FROM foo;

Pros:
  1. Uses a widespread and intuitive operation (count)

Cons:
  1. Very slow on large tables as it performs a Sequential Scan.


How does all of that sound? Are there further strategies I should consider?
Anything I have missed in the Strategies above?

Regards,
Marcelo.






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Best way to check if a table is empty
  2025-03-24 03:42 Best way to check if a table is empty Marcelo Fernandes <[email protected]>
@ 2025-03-24 04:03 ` David G. Johnston <[email protected]>
  2025-03-24 04:15   ` Re: Best way to check if a table is empty David G. Johnston <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: David G. Johnston @ 2025-03-24 04:03 UTC (permalink / raw)
  To: Marcelo Fernandes <[email protected]>; +Cc: [email protected] <[email protected]>

On Sunday, March 23, 2025, Marcelo Fernandes <[email protected]> wrote:

> Hi folks,
>
> I came up with three strategies to verify whether a table is empty.


3 is strictly terrible worse to answer “is live row count > 0”.

Using an index likely serves no/negative benefit since it contains no tuple
liveness information and you now are doing more buffers and lookups (IOW,
your claims about pro/con for min(id) require assumptions you haven’t
stipulated apply here).  Maybe IOS helps though I do wonder whether a
sequential scan skips over known all-dead pages making that relative
benefit go away.

So, absent data and an idea of why, just start looking at heap pages until
you find a live tuple, then stop.  Exists already optimizes for early stop,
the limit is pointless.

Checking table statistics first is probably a net-positive for many use
cases.

But maybe rethink your data and processing models if doing this check on
large bloated tables is what your existing choices have led to.

David J.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Best way to check if a table is empty
  2025-03-24 03:42 Best way to check if a table is empty Marcelo Fernandes <[email protected]>
  2025-03-24 04:03 ` Re: Best way to check if a table is empty David G. Johnston <[email protected]>
@ 2025-03-24 04:15   ` David G. Johnston <[email protected]>
  2025-03-24 04:16     ` Re: Best way to check if a table is empty Christophe Pettus <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: David G. Johnston @ 2025-03-24 04:15 UTC (permalink / raw)
  To: Marcelo Fernandes <[email protected]>; +Cc: [email protected] <[email protected]>

On Sunday, March 23, 2025, David G. Johnston <[email protected]>
wrote:

>
> Maybe IOS helps though I do wonder whether a sequential scan skips over
> known all-dead pages making that relative benefit go away.
>

Well, no, since it tracks known visible, not known non-visible, though for
something like a fast exists test with no where clause if the VM had any 1
bits the answer to the exists would be yes.  No idea if we take that
shortcut.

David J.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Best way to check if a table is empty
  2025-03-24 03:42 Best way to check if a table is empty Marcelo Fernandes <[email protected]>
  2025-03-24 04:03 ` Re: Best way to check if a table is empty David G. Johnston <[email protected]>
  2025-03-24 04:15   ` Re: Best way to check if a table is empty David G. Johnston <[email protected]>
@ 2025-03-24 04:16     ` Christophe Pettus <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Christophe Pettus @ 2025-03-24 04:16 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: Marcelo Fernandes <[email protected]>; [email protected] <[email protected]>



> On Mar 23, 2025, at 21:15, David G. Johnston <[email protected]> wrote:
> 
> No idea if we take that shortcut.

I remember looking into that not too long ago, and the answer's no.







^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Best way to check if a table is empty
  2025-03-24 03:42 Best way to check if a table is empty Marcelo Fernandes <[email protected]>
@ 2025-03-24 04:14 ` Christophe Pettus <[email protected]>
  2025-03-24 04:24   ` Re: Best way to check if a table is empty David G. Johnston <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: Christophe Pettus @ 2025-03-24 04:14 UTC (permalink / raw)
  To: Marcelo Fernandes <[email protected]>; +Cc: [email protected]



> On Mar 23, 2025, at 20:42, Marcelo Fernandes <[email protected]> wrote:
> Cons:
>  1. Sequential Scan
>  2. If the table is bloated, it reads more buffers.

These concerns probably sound worse than they really are.  Assuming the table is being vacuumed reliably, and is receiving inserts, those inserts will tend to be at the start of the table, and so you'll hit a live tuple pretty fast.  There are pathological cases where it will take a long time (someone just delete a DELETE FROM t; from a huge table and the entire thing is dead tuples but no free space), but that's relatively unlikely to happen in a real-world system.

> Pros:
>  1. Does an index-only scan on a field that presumably has a PK index.
>  2. Works well even if the table is bloated.

#1 is not necessarily true.  You're never guaranteed an index-only scan; PostgreSQL may decide that the visibility map information indicates that a regular index scan (which means heap fetches to check visibility) will still happen.
 
#2 is not necessarily true either; it's pretty much false in the same situation that a sequential scan for the first tuple will be slow.  Indexes get bloated, too, and if there are a large number of dead unreclaimed tuples, there are also going to be a lot of dead index entries for it to walk over.

(And to be pedantic, #1 returns TRUE / FALSE while #2 returns <int> / NULL, so they aren't exactly equivalent.  If you are able to handle <int> / NULL, you don't need the EXISTS clause.)

> SELECT count(*) FROM foo;

There's no situation in which this will be faster than #1 or #2.





^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Best way to check if a table is empty
  2025-03-24 03:42 Best way to check if a table is empty Marcelo Fernandes <[email protected]>
  2025-03-24 04:14 ` Re: Best way to check if a table is empty Christophe Pettus <[email protected]>
@ 2025-03-24 04:24   ` David G. Johnston <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: David G. Johnston @ 2025-03-24 04:24 UTC (permalink / raw)
  To: Christophe Pettus <[email protected]>; +Cc: Marcelo Fernandes <[email protected]>; [email protected] <[email protected]>

On Sunday, March 23, 2025, Christophe Pettus <[email protected]> wrote:

>
> (And to be pedantic, #1 returns TRUE / FALSE while #2 returns <int> /
> NULL, so they aren't exactly equivalent.  If you are able to handle <int> /
> NULL, you don't need the EXISTS clause.)
>

Leaving the limit in place, without exists #1 returns either an empty set
or an <int>, never NULL.

Always returning a Boolean seems like a better API choice though; but
standardizing on that final transform doesn’t change the base comparison.

David J.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread


end of thread, other threads:[~2025-03-24 04:24 UTC | newest]

Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-03-24 03:42 Best way to check if a table is empty Marcelo Fernandes <[email protected]>
2025-03-24 04:03 ` David G. Johnston <[email protected]>
2025-03-24 04:15   ` David G. Johnston <[email protected]>
2025-03-24 04:16     ` Christophe Pettus <[email protected]>
2025-03-24 04:14 ` Christophe Pettus <[email protected]>
2025-03-24 04:24   ` David G. Johnston <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox