Re: Record last SELECT on a row?

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Record last SELECT on a row?
8+ messages / 6 participants
[nested] [flat]

* Re: Record last SELECT on a row?
@ 2025-12-17 15:24  Ron Johnson <[email protected]>
  0 siblings, 2 replies; 8+ messages in thread

From: Ron Johnson @ 2025-12-17 15:24 UTC (permalink / raw)
  To: pgsql-general

On Wed, Dec 17, 2025 at 2:41 AM Matthias Leisi <[email protected]> wrote:

> An application (which we can’t change) is accessing some Postgres table,
> and we would like to record when the rows in that table were last read
> (meaning: appeared in a SELECT result). The ultimate goal would be that we
> can „age out“ rows which have not been accessed in a certain period of time.
>
> The table contains some ten thousand rows, five columns, and we already
> record created / last updated using triggers. Almost all accesses will
> result in zero, one or very few records returned. Given the modest size of
> the table, performance considerations are not top priority.
>
> If we had full control over the application, we could eg use a function to
> select the records and then update some „last read“ column. But since we
> don’t control the application, that’s not an option. On the other hand, we
> have full control over the database, so we could put some other „object“ in
> lieu of the direct table.
>
> Any other ways this could be achieved?
>

pgaudit might satisfy your needs, since it would only log SELECT statements
on that one table.  You'd still have to grep the log file, so the
information wouldn't be real-time, but that's *probably* not important.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Record last SELECT on a row?
@ 2025-12-17 16:25  Matthias Leisi <[email protected]>
  parent: Ron Johnson <[email protected]>
  1 sibling, 1 reply; 8+ messages in thread

From: Matthias Leisi @ 2025-12-17 16:25 UTC (permalink / raw)
  To: pgsql-general

> pgaudit might satisfy your needs, since it would only log SELECT statements on that one table.  You'd still have to grep the log file, so the information wouldn't be real-time, but that's probably not important.

That’s a viable suggestion, thanks a lot. Real-time is indeed not necessary, a daily (or even a weekly) cleaning of unused data is sufficient. pgaudit was anyway on the table for some other use cases, so that would fit in nicely.

Thanks,
— Matthias

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Record last SELECT on a row?
@ 2025-12-17 16:46  Joe Conway <[email protected]>
  parent: Matthias Leisi <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Joe Conway @ 2025-12-17 16:46 UTC (permalink / raw)
  To: Matthias Leisi <[email protected]>; pgsql-general

On 12/17/25 11:25, Matthias Leisi wrote:
> 
>> pgaudit might satisfy your needs, since it would only log SELECT 
>> statements on that one table.  You'd still have to grep the log file, 
>> so the information wouldn't be real-time, but that's /probably/ not 
>> important.
> 
> That’s a viable suggestion, thanks a lot. Real-time is indeed not 
> necessary, a daily (or even a weekly) cleaning of unused data is 
> sufficient. pgaudit was anyway on the table for some other use cases, so 
> that would fit in nicely.

Possibly try using/abusing RLS?

8<-----------------
psql test
psql (19devel)
Type "help" for help.

create table t1(c1 int, c2 text);
insert into t1 values(1,'a'),(2,'b'),(3,'c'),(42,'zp');
grant select on table t1 to public;

create table a1(c1 int, t1 timestamptz);
create or replace function audit(int)
returns bool as
$$
   insert into a1 values($1, now()) returning true
$$ security definer language sql;
create policy audit_t1 ON t1 for select using (audit(c1));
alter table t1 enable row level security;

create user joe;
set session authorization joe;
select * from t1 where c1=42;

  c1 | c2
----+----
  42 | zp
(1 row)

reset session authorization;
select * from a1;

  c1 |              t1
----+-------------------------------
  42 | 2025-12-17 11:42:51.871843-05
(1 row)
8<-----------------

HTH,

-- 
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com






^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Record last SELECT on a row?
@ 2025-12-17 17:13  Greg Sabino Mullane <[email protected]>
  parent: Ron Johnson <[email protected]>
  1 sibling, 0 replies; 8+ messages in thread

From: Greg Sabino Mullane @ 2025-12-17 17:13 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

On Wed, Dec 17, 2025 at 10:24 AM Ron Johnson <[email protected]>
wrote:

> pgaudit might satisfy your needs, since it would only log SELECT
> statements on that one table.  You'd still have to grep the log file, so
> the information wouldn't be real-time, but that's *probably* not
> important.
>

That would only give you table-level information, unless it was a
dirt-simple query with no joins and an easy to parse where clause.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Record last SELECT on a row?
@ 2025-12-17 18:37  Tom Lane <[email protected]>
  parent: Joe Conway <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Tom Lane @ 2025-12-17 18:37 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; +Cc: Matthias Leisi <[email protected]>; pgsql-general

Joe Conway <[email protected]> writes:
> Possibly try using/abusing RLS?

Cute idea, but I think it doesn't reliably address the problem of
wanting to identify the specific rows that were read.  In your toy
example it'd work, because the generated plan is

regression=> explain verbose select * from t1 where c1=42;
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on public.t1  (cost=0.00..343.38 rows=2 width=36)
   Output: c1, c2
   Filter: ((t1.c1 = 42) AND audit(t1.c1))
(3 rows)

so the WHERE clause is applied before the RLS filter.  But in any
slightly-more-complicated situation, like a non-leakproof WHERE
condition, the order would be reversed so the log would overstate
which rows were read.

If the application's behavior is simple and well-defined, this might
be good enough, of course.

I thought of a way that could possibly do this reliably, but it's
vastly more work than the use-case seems worth:

1. Convert the SELECTs into SELECT FOR UPDATE (you could do this
without changing the application, by interposing a view).  SELECT
FOR SHARE might be good enough, not sure.

2. Write a logical replication output plugin that parses the WAL log
well enough to identify the tuple locks taken by FOR UPDATE.

This should work to log only the rows actually read, because FOR
UPDATE is postponed to the top of the query plan, unlike RLS.

			regards, tom lane

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Record last SELECT on a row?
@ 2025-12-17 21:36  Joe Conway <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Joe Conway @ 2025-12-17 21:36 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Matthias Leisi <[email protected]>; pgsql-general

On 12/17/25 13:37, Tom Lane wrote:
> Joe Conway <[email protected]> writes:
>> Possibly try using/abusing RLS?
> 
> Cute idea, but I think it doesn't reliably address the problem of
> wanting to identify the specific rows that were read.  In your toy
> example it'd work, because the generated plan is
> 
> regression=> explain verbose select * from t1 where c1=42;
>                           QUERY PLAN
> ------------------------------------------------------------
>   Seq Scan on public.t1  (cost=0.00..343.38 rows=2 width=36)
>     Output: c1, c2
>     Filter: ((t1.c1 = 42) AND audit(t1.c1))
> (3 rows)
> 
> so the WHERE clause is applied before the RLS filter.  But in any
> slightly-more-complicated situation, like a non-leakproof WHERE
> condition, the order would be reversed so the log would overstate
> which rows were read.


Sure, of course we have had requests for a leakproofness check bypass[1] 
for some use cases, and this could be one more such case.


> If the application's behavior is simple and well-defined, this might
> be good enough, of course.


FWIW when I read the original email in the thread I got the impression 
that the application behavior was pretty simple WRT this table. But of 
course I could easily be wrong...

> I thought of a way that could possibly do this reliably, but it's
> vastly more work than the use-case seems worth:
> 
> 1. Convert the SELECTs into SELECT FOR UPDATE (you could do this
> without changing the application, by interposing a view).  SELECT
> FOR SHARE might be good enough, not sure.
> 
> 2. Write a logical replication output plugin that parses the WAL log
> well enough to identify the tuple locks taken by FOR UPDATE.


Yeah this seems like a pretty heavy lift.


> This should work to log only the rows actually read, because FOR
> UPDATE is postponed to the top of the query plan, unlike RLS.



[1] 
https://www.postgresql.org/message-id/flat/CAMxA3rsGQh9waorObOZyqrFqZ5uQ0b5D7SL6X6nh2kLhX%3D90vg%40m...

-- 
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com






^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Record last SELECT on a row?
@ 2025-12-18 07:17  Matthias Leisi <[email protected]>
  parent: Joe Conway <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Matthias Leisi @ 2025-12-18 07:17 UTC (permalink / raw)
  To: pgsql-general

>> If the application's behavior is simple and well-defined, this might
>> be good enough, of course.
> 
> 
> FWIW when I read the original email in the thread I got the impression that the application behavior was pretty simple WRT this table. But of course I could easily be wrong...

You are not wrong. The use case is in fact the `userpref` table used by Spamassassin. Left unmaintained, and given a large-enough user base, this has the tendency to grow considerably over time, so we want to gradually remove entries not actually used any more. (And we don’t want to patch Spamassassin core code to do this by itself…)

I’ll give it a try with ab/using RLS, which we use for some other use cases anyway already.

Thanks a lot for the suggestions in the thread, much appreciated!

— Matthias

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Record last SELECT on a row?
@ 2025-12-18 15:03  Kris Deugau <[email protected]>
  parent: Matthias Leisi <[email protected]>
  0 siblings, 0 replies; 8+ messages in thread

From: Kris Deugau @ 2025-12-18 15:03 UTC (permalink / raw)
  To: pgsql-general

Matthias Leisi wrote:
> 
>>> If the application's behavior is simple and well-defined, this might
>>> be good enough, of course.
>>
>>
>> FWIW when I read the original email in the thread I got the impression 
>> that the application behavior was pretty simple WRT this table. But of 
>> course I could easily be wrong...
> 
> You are not wrong. The use case is in fact the `userpref` table used by 
> Spamassassin. Left unmaintained, and given a large-enough user base, 
> this has the tendency to grow considerably over time, so we want to 
> gradually remove entries not actually used any more. (And we don’t want 
> to patch Spamassassin core code to do this by itself…)

For this particular use case, would it be easier to periodically compare 
the list of usernames with userpref data against active accounts on the 
mail system?

-kgd






^ permalink  raw  reply  [nested|flat] 8+ messages in thread

end of thread, other threads:[~2025-12-18 15:03 UTC | newest]

Thread overview: 8+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-12-17 15:24 Re: Record last SELECT on a row? Ron Johnson <[email protected]>
2025-12-17 16:25 ` Matthias Leisi <[email protected]>
2025-12-17 16:46   ` Joe Conway <[email protected]>
2025-12-17 18:37     ` Tom Lane <[email protected]>
2025-12-17 21:36       ` Joe Conway <[email protected]>
2025-12-18 07:17         ` Matthias Leisi <[email protected]>
2025-12-18 15:03           ` Kris Deugau <[email protected]>
2025-12-17 17:13 ` Greg Sabino Mullane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox