public inbox for [email protected]
help / color / mirror / Atom feedRe: Fwd: A million users
9+ messages / 7 participants
[nested] [flat]
* Re: Fwd: A million users
@ 2024-11-13 10:29 Alvaro Herrera <[email protected]>
0 siblings, 2 replies; 9+ messages in thread
From: Alvaro Herrera @ 2024-11-13 10:29 UTC (permalink / raw)
To: Vijaykumar Jain <[email protected]>; +Cc: pgsql-general; [email protected] <[email protected]>
On 2024-Nov-13, Vijaykumar Jain wrote:
> I tried to grant select permissions to 5000 different roles on one table,
> It failed with row size too big already at 2443.
But you can grant select to one "reader" role, and grant that one role
to however many other roles you want. This way you can have an
arbitrary number of roles with indirect access to the table. In
real-world usage, this is more convenient that granting access to
individual roles on individual tables; likely, you'll grant access to
sets of tables/views/functions/etc rather than exactly one, and you can
manage that more easily if you have one intermediate role to modify than
if you have to mess with 5000 individual roles.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Fwd: A million users
@ 2024-11-13 10:52 Dominique Devienne <[email protected]>
parent: Alvaro Herrera <[email protected]>
1 sibling, 2 replies; 9+ messages in thread
From: Dominique Devienne @ 2024-11-13 10:52 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Vijaykumar Jain <[email protected]>; pgsql-general; [email protected] <[email protected]>
On Wed, Nov 13, 2024 at 11:29 AM Alvaro Herrera <[email protected]> wrote:
> On 2024-Nov-13, Vijaykumar Jain wrote:
> > I tried to grant select permissions to 5000 different roles on one table,
> > It failed with row size too big already at 2443.
>
> But you can grant select to one "reader" role, and grant that one role
> to however many other roles you want. This way you can have an
> arbitrary number of roles with indirect access to the table. In
> real-world usage, this is more convenient that granting access to
> individual roles on individual tables; likely, you'll grant access to
> sets of tables/views/functions/etc rather than exactly one, and you can
> manage that more easily if you have one intermediate role to modify than
> if you have to mess with 5000 individual roles.
Hi. Sure, that's a good point, and a best practice IMHO.
But I already do that, and the original question remain,
i.e. how does PostgreSQL with thousands or millions of roles?
In my use case, 1000 LOGIN users, and 10'000 schema related ROLEs,
is possible, with can translate to millions of GRANTs.
It matters to me especially, since I'm using pg_has_role() in RLS predicates,
in addition to using ROLEs for access to schemas as usual (see above).
I'm not in a point to test that myself at this time, but if anyone looked into
how PostgreSQL scales with many roles (O(N)? O(logN)? O(NlogN)?),
I'd be interested in pointers to that research or those benchmarks.
Thanks, --DD
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Fwd: A million users
@ 2024-11-13 11:12 Achilleas Mantzios - cloud <[email protected]>
parent: Alvaro Herrera <[email protected]>
1 sibling, 1 reply; 9+ messages in thread
From: Achilleas Mantzios - cloud @ 2024-11-13 11:12 UTC (permalink / raw)
To: [email protected]
On 11/13/24 12:29, Alvaro Herrera wrote:
> On 2024-Nov-13, Vijaykumar Jain wrote:
>
>> I tried to grant select permissions to 5000 different roles on one table,
>> It failed with row size too big already at 2443.
> But you can grant select to one "reader" role, and grant that one role
> to however many other roles you want. This way you can have an
> arbitrary number of roles with indirect access to the table. In
> real-world usage, this is more convenient that granting access to
> individual roles on individual tables; likely, you'll grant access to
> sets of tables/views/functions/etc rather than exactly one, and you can
> manage that more easily if you have one intermediate role to modify than
> if you have to mess with 5000 individual roles.
Exactly! In the later versions, security gets more and more refined and
strengthened. So ppl should think about moving away from "public" , and
start implementing finer grained schemes of security, as you suggest. +
\dp shows prettier than having 1000+ users listed.
>
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Fwd: A million users
@ 2024-11-13 11:30 Vijaykumar Jain <[email protected]>
parent: Achilleas Mantzios - cloud <[email protected]>
0 siblings, 1 reply; 9+ messages in thread
From: Vijaykumar Jain @ 2024-11-13 11:30 UTC (permalink / raw)
To: Achilleas Mantzios - cloud <[email protected]>; +Cc: pgsql-general <[email protected]>
On Wed, Nov 13, 2024, 4:42 PM Achilleas Mantzios - cloud <
[email protected]> wrote:
>
> Exactly! In the later versions, security gets more and more refined and
> strengthened. So ppl should think about moving away from "public" , and
> start implementing finer grained schemes of security, as you suggest. +
> \dp shows prettier than having 1000+ users listed.
>
I wanted to just communicate the limits.
a lot of postgresql architecture can leverage the resources and scale, but
not all.
i had 100s of 1000s of tables on my setup where i worked last.
if i did \dt it would freeze all the time. i had to exit the pdwl session,
check the source code of how the partition was named and then look for what
I wanted.
if things are pretty with psql or not should not be a criteria for how many
objects you want to have.
i would expect clear exceptions so one knows what the real problem is.
the error I got did not in anyway communicate the role limits for col size
limits.
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Fwd: A million users
@ 2024-11-13 11:40 Vijaykumar Jain <[email protected]>
parent: Vijaykumar Jain <[email protected]>
0 siblings, 0 replies; 9+ messages in thread
From: Vijaykumar Jain @ 2024-11-13 11:40 UTC (permalink / raw)
To: Achilleas Mantzios - cloud <[email protected]>; +Cc: pgsql-general <[email protected]>
On Wed, Nov 13, 2024, 5:00 PM Vijaykumar Jain <
[email protected]> wrote:
>
>
> On Wed, Nov 13, 2024, 4:42 PM Achilleas Mantzios - cloud <
> [email protected]> wrote:
>
>>
>> Exactly! In the later versions, security gets more and more refined and
>> strengthened. So ppl should think about moving away from "public" , and
>> start implementing finer grained schemes of security, as you suggest. +
>> \dp shows prettier than having 1000+ users listed.
>>
>
> I wanted to just communicate the limits.
> a lot of postgresql architecture can leverage the resources and scale, but
> not all.
> i had 100s of 1000s of tables on my setup where i worked last.
> if i did \dt it would freeze all the time. i had to exit the pdwl session,
> check the source code of how the partition was named and then look for what
> I wanted.
> if things are pretty with psql or not should not be a criteria for how
> many objects you want to have.
>
> i would expect clear exceptions so one knows what the real problem is.
> the error I got did not in anyway communicate the role limits for col size
> limits.
>
https://fluca1978.github.io/2018/01/04/PostgreSQLUsers.html
so roles are not the problem.
but if you grant them individually select on the same table for ex. then
the limits are breached based of size of the col not number of permissions.
>
>
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Fwd: A million users
@ 2024-11-13 12:41 Kaare Rasmussen <[email protected]>
parent: Dominique Devienne <[email protected]>
1 sibling, 1 reply; 9+ messages in thread
From: Kaare Rasmussen @ 2024-11-13 12:41 UTC (permalink / raw)
To: Dominique Devienne <[email protected]>; Alvaro Herrera <[email protected]>; +Cc: Vijaykumar Jain <[email protected]>; pgsql-general
Hi Dominique
Hi. Sure, that's a good point, and a best practice IMHO.
But I already do that, and the original question remain,
i.e. how does PostgreSQL with thousands or millions of roles?
Sorry if my original post was unclear, but I don't expect that there will be much more than perhaps a hundred roles. Each may have from a few up to a million users in them, though.
/kaare
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Fwd: A million users
@ 2024-11-13 13:40 Alvaro Herrera <[email protected]>
parent: Kaare Rasmussen <[email protected]>
0 siblings, 0 replies; 9+ messages in thread
From: Alvaro Herrera @ 2024-11-13 13:40 UTC (permalink / raw)
To: Kaare Rasmussen <[email protected]>; +Cc: Dominique Devienne <[email protected]>; Vijaykumar Jain <[email protected]>; pgsql-general
On 2024-Nov-13, Kaare Rasmussen wrote:
> Sorry if my original post was unclear, but I don't expect that there
> will be much more than perhaps a hundred roles. Each may have from a
> few up to a million users in them, though.
In Postgres, a user is a role. So if you have a hundred roles and a
million users that these roles are granted to, that means you'll have
100100 roles. (In the worst case, where you grant all one hundred roles
to each of the million users, you would end up with 100_000_000 rows in
pg_auth_member).
I would expect such a system to work mostly fine. It'll need memory for
the caches used to store contents of system catalogs. I think you
should test it out and see what happens. I haven't seen any published
_actual_ benchmarks on this point.
That said, having a million users is a bit strange. Do you want to give
each visitor to your website a unique Postgres role, or something like
that? I think this is unusual, but it should work.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Fwd: A million users
@ 2024-11-13 18:01 [email protected]
parent: Dominique Devienne <[email protected]>
1 sibling, 1 reply; 9+ messages in thread
From: [email protected] @ 2024-11-13 18:01 UTC (permalink / raw)
To: Dominique Devienne <[email protected]>; Alvaro Herrera <[email protected]>; +Cc: Vijaykumar Jain <[email protected]>; pgsql-general; [email protected] <[email protected]>
Dominique Devienne:
> Hi. Sure, that's a good point, and a best practice IMHO.
> But I already do that, and the original question remain,
> i.e. how does PostgreSQL with thousands or millions of roles?
> In my use case, 1000 LOGIN users, and 10'000 schema related ROLEs,
> is possible, with can translate to millions of GRANTs.
>
> It matters to me especially, since I'm using pg_has_role() in RLS predicates,
> in addition to using ROLEs for access to schemas as usual (see above).
> I'm not in a point to test that myself at this time, but if anyone looked into
> how PostgreSQL scales with many roles (O(N)? O(logN)? O(NlogN)?),
> I'd be interested in pointers to that research or those benchmarks.
I don't have any benchmarks, but the following data point:
We use PostgREST [1] which connects to the database with one
"authenticator" role. For each request it handles, it does a SET ROLE to
a role defined in the Authorization Header (JWT).
Some numbers we are running with currently:
- 1 authenticator role
- ~ 127.000 user roles granted to "authenticator"
- ~ 14.000 "scope" roles granted to the user roles (tenants, groups, ..)
- ~ 15 "context" roles granted to user roles ("admin", "user", ...)
- ~ 50 "access" roles granted to context roles ("view_x", "do_y", ...)
Only the access roles have any direct privileges granted.
We currently have ~ 700 RLS policies defined. Those are created TO the
context roles. The policies check the current role's scope roles to
select "allowed" rows.
In total, we have ~370.000 roles granted to each other (pg_auth_members).
Except for one thing, we have never had any real problems with this. We
didn't observe anything getting massively worse with many roles, even
though we use them extensively. RLS policies need to be carefully
written to get any performance, though.
The one problem we found is:
The first time the authenticator role does a SET ROLE in a session it's
**terribly** slow. With fewer users back then it took 6-7 minutes to do
it. Any SET ROLE afterwards in the same session would be fast. Even more
annoying - killing the session with SET ROLE running would not work
properly and leave zombie processes. Giving the authenticator role the
SUPERUSER privilege avoids the problem and makes it instant. However..
that's not very desirable.
There were some improvements, IIRC in the 17 cycle (?), in that area,
but I had not have the time to test it with that. We are still on v15
and the last time I tested this was ~ two years ago. I still wasn't able
to put together a simple reproducer either.
You should *probably* be better off with your different LOGIN roles, I
assume the role cache builds up much quicker in that case.
Hope that helps.
Best,
Wolfgang
[1]: https://postgrest.org
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Fwd: A million users
@ 2024-11-22 12:24 Eric Hanson <[email protected]>
parent: [email protected]
0 siblings, 0 replies; 9+ messages in thread
From: Eric Hanson @ 2024-11-22 12:24 UTC (permalink / raw)
To: [email protected]; +Cc: Dominique Devienne <[email protected]>; Alvaro Herrera <[email protected]>; Vijaykumar Jain <[email protected]>; pgsql-general; [email protected] <[email protected]>
On Wed, Nov 13, 2024 at 12:02 PM <[email protected]> wrote:
> I don't have any benchmarks, but the following data point:
>
> We use PostgREST [1] which connects to the database with one
> "authenticator" role. For each request it handles, it does a SET ROLE to
> a role defined in the Authorization Header (JWT).
>
> Some numbers we are running with currently:
> - 1 authenticator role
> - ~ 127.000 user roles granted to "authenticator"
> - ~ 14.000 "scope" roles granted to the user roles (tenants, groups, ..)
> - ~ 15 "context" roles granted to user roles ("admin", "user", ...)
> - ~ 50 "access" roles granted to context roles ("view_x", "do_y", ...)
>
> Only the access roles have any direct privileges granted.
>
> We currently have ~ 700 RLS policies defined. Those are created TO the
> context roles. The policies check the current role's scope roles to
> select "allowed" rows.
>
> In total, we have ~370.000 roles granted to each other (pg_auth_members).
>
> Except for one thing, we have never had any real problems with this. We
> didn't observe anything getting massively worse with many roles, even
> though we use them extensively. RLS policies need to be carefully
> written to get any performance, though.
>
> The one problem we found is:
>
> The first time the authenticator role does a SET ROLE in a session it's
> **terribly** slow. With fewer users back then it took 6-7 minutes to do
> it. Any SET ROLE afterwards in the same session would be fast. Even more
> annoying - killing the session with SET ROLE running would not work
> properly and leave zombie processes. Giving the authenticator role the
> SUPERUSER privilege avoids the problem and makes it instant. However..
> that's not very desirable.
>
> There were some improvements, IIRC in the 17 cycle (?), in that area,
> but I had not have the time to test it with that. We are still on v15
> and the last time I tested this was ~ two years ago. I still wasn't able
> to put together a simple reproducer either.
>
> You should *probably* be better off with your different LOGIN roles, I
> assume the role cache builds up much quicker in that case.
>
I'm really interested in how this works. Role-per-user or even the ability
to have many roles (370k??) seems like a dream come true. But I always was
wary of it because:
a) A connection-per-role hits the ceiling pretty quickly because
connections can't be pooled and shared between users and take up a lot of
memory etc.
b) One could try to get around this with an authenticator role as you
describe, but isn't it then possible to do a RESET ROLE and then another
SET ROLE to get access to another user? This of course would have to be
through SQL injection or some such, but it seems like that defeats at least
some of the purpose of RLS.
Did you find some way to prevent RESET ROLE? I once advocated for a NO
RESET option on SET ROLE [1] so that RESET ROLE would be impossible for the
rest of the session. Still think it would be helpful.
Thanks,
Eric
[1]
https://www.postgresql.org/message-id/flat/CACA6kxgdzt-oForijaxfXHHhnZ1WBoVGMXVwFrJqUu-Hg3C-jA%40mai...
^ permalink raw reply [nested|flat] 9+ messages in thread
end of thread, other threads:[~2024-11-22 12:24 UTC | newest]
Thread overview: 9+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-11-13 10:29 Re: Fwd: A million users Alvaro Herrera <[email protected]>
2024-11-13 10:52 ` Dominique Devienne <[email protected]>
2024-11-13 12:41 ` Kaare Rasmussen <[email protected]>
2024-11-13 13:40 ` Alvaro Herrera <[email protected]>
2024-11-13 18:01 ` [email protected]
2024-11-22 12:24 ` Eric Hanson <[email protected]>
2024-11-13 11:12 ` Achilleas Mantzios - cloud <[email protected]>
2024-11-13 11:30 ` Vijaykumar Jain <[email protected]>
2024-11-13 11:40 ` Vijaykumar Jain <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox