Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tBHgs-002srH-Ls for pgsql-general@arkaria.postgresql.org; Wed, 13 Nov 2024 18:02:06 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tBHgq-00Fpr8-1s for pgsql-general@arkaria.postgresql.org; Wed, 13 Nov 2024 18:02:04 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tBHgp-00Fpqy-IM for pgsql-general@lists.postgresql.org; Wed, 13 Nov 2024 18:02:04 +0000 Received: from dd25110.kasserver.com ([85.13.146.49]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tBHgm-001kul-1Y for pgsql-general@postgresql.org; Wed, 13 Nov 2024 18:02:03 +0000 Received: from [192.168.0.102] (ip-037-201-153-175.um10.pools.vodafone-ip.de [37.201.153.175]) by dd25110.kasserver.com (Postfix) with ESMTPSA id 9CD0DE1A1973; Wed, 13 Nov 2024 19:01:59 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=technowledgy.de; s=kas202308201259; t=1731520919; bh=yxB5kk556EM80XodluOmg8qBMhWR5T5bjWbZLwhuz9Q=; h=Date:From:Subject:To:Cc:References:In-Reply-To:From; b=bMGyn0ADPNbSe828woAMBlKRdZme9M2N/XdVlpsQ9tpQIPRCfGpxl2v1nrc2N+NRn PGhPWQq6tL6PW9kZRI6Z8rjH6UZfYo8TkDHnbhX2tfsRTp3+b4qSqDe0B2IwDwG2as lJp8L1C/YGLlF3cpr5WJfCzFm6tAx/krbr1pf9KMlghkz9ZVzCYLH8La9JG744BErn dIiCzyEUt+Ooz6YBFqKrY8FBX7mAe3eGWiaVIljMqGLuH51X6yT/i6vI2KrtgKruPK RdlX+QrKip8Yk3cZT99RwcXZR4VgRVo29sYMRFfSZIqFdgrlkdRAy3tCeZ1noKDLCL cdtlsBa81Yh8Q== Message-ID: <7d32e088-34a7-421a-9398-80958acb3f64@technowledgy.de> Date: Wed, 13 Nov 2024 19:01:58 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird From: walther@technowledgy.de Subject: Re: Fwd: A million users To: Dominique Devienne , Alvaro Herrera Cc: Vijaykumar Jain , pgsql-general , "kaare@jasonic.dk" References: <202411131029.qchduffwgzhm@alvherre.pgsql> Content-Language: en-US In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spamd-Bar: +++++ X-Spam: Yes List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Dominique Devienne: > Hi. Sure, that's a good point, and a best practice IMHO. > But I already do that, and the original question remain, > i.e. how does PostgreSQL with thousands or millions of roles? > In my use case, 1000 LOGIN users, and 10'000 schema related ROLEs, > is possible, with can translate to millions of GRANTs. > > It matters to me especially, since I'm using pg_has_role() in RLS predicates, > in addition to using ROLEs for access to schemas as usual (see above). > I'm not in a point to test that myself at this time, but if anyone looked into > how PostgreSQL scales with many roles (O(N)? O(logN)? O(NlogN)?), > I'd be interested in pointers to that research or those benchmarks. I don't have any benchmarks, but the following data point: We use PostgREST [1] which connects to the database with one "authenticator" role. For each request it handles, it does a SET ROLE to a role defined in the Authorization Header (JWT). Some numbers we are running with currently: - 1 authenticator role - ~ 127.000 user roles granted to "authenticator" - ~ 14.000 "scope" roles granted to the user roles (tenants, groups, ..) - ~ 15 "context" roles granted to user roles ("admin", "user", ...) - ~ 50 "access" roles granted to context roles ("view_x", "do_y", ...) Only the access roles have any direct privileges granted. We currently have ~ 700 RLS policies defined. Those are created TO the context roles. The policies check the current role's scope roles to select "allowed" rows. In total, we have ~370.000 roles granted to each other (pg_auth_members). Except for one thing, we have never had any real problems with this. We didn't observe anything getting massively worse with many roles, even though we use them extensively. RLS policies need to be carefully written to get any performance, though. The one problem we found is: The first time the authenticator role does a SET ROLE in a session it's **terribly** slow. With fewer users back then it took 6-7 minutes to do it. Any SET ROLE afterwards in the same session would be fast. Even more annoying - killing the session with SET ROLE running would not work properly and leave zombie processes. Giving the authenticator role the SUPERUSER privilege avoids the problem and makes it instant. However.. that's not very desirable. There were some improvements, IIRC in the 17 cycle (?), in that area, but I had not have the time to test it with that. We are still on v15 and the last time I tested this was ~ two years ago. I still wasn't able to put together a simple reproducer either. You should *probably* be better off with your different LOGIN roles, I assume the role cache builds up much quicker in that case. Hope that helps. Best, Wolfgang [1]: https://postgrest.org