public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Querying one partition in a function takes locks on all partitions
3+ messages / 2 participants
[nested] [flat]

* Re: Querying one partition in a function takes locks on all partitions
@ 2025-03-23 13:35 David Rowley <[email protected]>
  2025-03-28 17:00 ` Re: Querying one partition in a function takes locks on all partitions Evgeny Morozov <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: David Rowley @ 2025-03-23 13:35 UTC (permalink / raw)
  To: Evgeny Morozov <[email protected]>; +Cc: pgsql-general

On Sat, 22 Mar 2025 at 05:27, Evgeny Morozov
<[email protected]> wrote:
> select read_partition(1); -- This takes shared locks on entity_1 AND
> entity_2
>
> -- select count(*) from entity where part_id = 1; -- but this would only
> take a shared lock only on entity_1
>
> If another session tries something that takes an exclusive lock on
> another partition, like
>
> alter table entity_2 add column new_column text;

Is this just an example command?  You can't add a column to a
partition directly.

> I would expect that to be able to run concurrently, but it blocks due to
> the shared lock on entity_2. (The way I originally found the problem was
> the opposite: once one client took an exclusive lock on a partition many
> others were blocked from reading from ANY partition.)
>
> This seems like quite the "gotcha", especially when the query plan for
> the function call (logged via autoexplain) shows it only accessing one
> partition (entity_1). Is this expected behavior? If so, is it documented
> somewhere?

It is expected behaviour and, unfortunately, not really documented
anywhere outside of the source code.

What's going on is that PostgreSQL is creating a generic plan for your
query, that is, a plan that will work with any parameter value that
you give to your function. When the generic plan is executed and the
locks are taken for the relations mentioned in the query, and since
the plan is generic, it includes all partitions that could match any
possible parameter value you could pass. When the locks are taken,
it's not yet known which partition will be needed as the partition
pruning that occurs only does so after the locks are taken.

There has been work done for PostgreSQL 18 which swaps the order of
these operations and makes it so that the executor only obtains the
locks on the partitions which will actually be scanned. Hopefully,
we'll see that feature released with PostgreSQL 18 toward the end of
2025.

As for the documentation, it might be worth adding a mention of this
at the end of the following paragraph in [1]:

"During initialization of the query plan. Partition pruning can be
performed here for parameter values which are known during the
initialization phase of execution. Partitions which are pruned during
this stage will not show up in the query's EXPLAIN or EXPLAIN ANALYZE.
It is possible to determine the number of partitions which were
removed during this phase by observing the “Subplans Removed” property
in the EXPLAIN output."

Perhaps something like. "It's important to note that any partitions
removed by the partition pruning done at this time are still locked at
the beginning of execution".

David

[1] https://www.postgresql.org/docs/17/ddl-partitioning.html






^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Querying one partition in a function takes locks on all partitions
  2025-03-23 13:35 Re: Querying one partition in a function takes locks on all partitions David Rowley <[email protected]>
@ 2025-03-28 17:00 ` Evgeny Morozov <[email protected]>
  2025-03-31 20:28   ` Re: Querying one partition in a function takes locks on all partitions David Rowley <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Evgeny Morozov @ 2025-03-28 17:00 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: pgsql-general

On 23/03/2025 2:35 pm, David Rowley wrote:
>> alter table entity_2 add column new_column text;
> Is this just an example command?  You can't add a column to a
> partition directly.

Yes, it was just the simplest way I could think of to take an exclusive
lock. But on this note: I guess it takes a lock before checking that the
table is partition because it might otherwise get attached/detached as a
partition between doing the check and taking a lock, but it would be
nice if it took a shared lock to do all checks, then upgraded to an
exclusive lock to actually do the work. Then the user would not have to
wait for an exclusive lock only for the command to fail.


> What's going on is that PostgreSQL is creating a generic plan for your
> query, that is, a plan that will work with any parameter value that
> you give to your function. When the generic plan is executed and the
> locks are taken for the relations mentioned in the query, and since
> the plan is generic, it includes all partitions that could match any
> possible parameter value you could pass. When the locks are taken,
> it's not yet known which partition will be needed as the partition
> pruning that occurs only does so after the locks are taken.

I see, thank you for the explanation. This seems like a bad plan,
though, because even at query preparation time it can be determined that
only one partition will need to be scanned, since the query filters on
the partition key, so it may need to read *any one* partition, but never
all partitions. So in this case, isn't it better to avoid caching a
generic plan at all? Even if the locking issue is fixed in PG 18, isn't
such a plan likely to be sub-optimal in other ways (for a more complex
query)? I don't know anything about the internals of the query planner,
but I have run into other performance issues with SQL functions querying
partitioned tables. One function we have *sometimes* uses very slow
sequential scans instead of index scans, and I've had to work around
that with "set enable_seqscan = off".

In this case, even "set plan_cache_mode = force_custom_plan" doesn't
help somehow. Isn't that supposed to... make PG use a custom a plan?

Here is another weird thing I forgot to menton in the original post: for
a set-returning function whether one partition table is locked or all of
them seems to depend on whether you "select func" or "select from func"!


create function read_partition_rows(which_part int) returns
table(part_id int) as
'select part_id from entity where part_id = which_part;'
language sql stable;

select * from read_partition_rows(1); -- This takes a lock only on entity_1
select read_partition_rows(1); -- but this takes locks on entity_1 and
entity_2!


> There has been work done for PostgreSQL 18 which swaps the order of
> these operations and makes it so that the executor only obtains the
> locks on the partitions which will actually be scanned. Hopefully,
> we'll see that feature released with PostgreSQL 18 toward the end of
> 2025.

Happy to hear that! I hope that makes it in, but in the meanwhile, yes,
it would be nice to at least document this gotcha and any workarounds
for it. The only one I've found is to use a pl/PgSQL function.







^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Querying one partition in a function takes locks on all partitions
  2025-03-23 13:35 Re: Querying one partition in a function takes locks on all partitions David Rowley <[email protected]>
  2025-03-28 17:00 ` Re: Querying one partition in a function takes locks on all partitions Evgeny Morozov <[email protected]>
@ 2025-03-31 20:28   ` David Rowley <[email protected]>
  0 siblings, 0 replies; 3+ messages in thread

From: David Rowley @ 2025-03-31 20:28 UTC (permalink / raw)
  To: Evgeny Morozov <[email protected]>; +Cc: pgsql-general

On Sat, 29 Mar 2025 at 06:00, Evgeny Morozov
<[email protected]> wrote:
>
> On 23/03/2025 2:35 pm, David Rowley wrote:
> >> alter table entity_2 add column new_column text;
> > Is this just an example command?  You can't add a column to a
> > partition directly.
>
> Yes, it was just the simplest way I could think of to take an exclusive
> lock. But on this note: I guess it takes a lock before checking that the
> table is partition because it might otherwise get attached/detached as a
> partition between doing the check and taking a lock, but it would be
> nice if it took a shared lock to do all checks, then upgraded to an
> exclusive lock to actually do the work. Then the user would not have to
> wait for an exclusive lock only for the command to fail.

Unfortunately, that's probably just swapping one problem for another.
Once you have 2 sessions following such a pattern of locking, you're
prone to unnecessary deadlocking.

For example:

-- session1
begin;
lock table t in access share mode; -- gets lock

-- session2
begin;
lock table t in access share mode; -- gets lock
lock table t in access exclusive mode; -- waits

-- session1
lock table t in access exclusive mode; -- deadlock

If you don't bother with the access share lock, there's no deadlock.

David






^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2025-03-31 20:28 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-03-23 13:35 Re: Querying one partition in a function takes locks on all partitions David Rowley <[email protected]>
2025-03-28 17:00 ` Evgeny Morozov <[email protected]>
2025-03-31 20:28   ` David Rowley <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox