From: Vladimir Churyukin
Date: Fri, 25 Jul 2025 23:57:59 -0700
Subject: Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance
To: Pierre Barre
Cc: pgsql-general@lists.postgresql.org

Shared storage would require a lot of extra work. That's essentially what AWS Aurora does.
You will need functionality to sync in-memory state between nodes, because every instance will have cached data that can easily become stale on any write operation.
That alone is not simple. You will have to modify some locking logic, and most likely make a lot of other changes in a lot of places; Postgres was simply not built with the assumption that the storage can be shared.
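
To make the stale-cache point concrete, here is a minimal Python sketch. It is not Postgres code: the `SharedStorage` and `Node` classes are hypothetical, and each node's private dict stands in for its buffer cache.

```python
class SharedStorage:
    """Stands in for a block device that every node can see."""
    def __init__(self):
        self.pages = {}


class Node:
    """A database instance with a private page cache."""
    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.cache = {}

    def read(self, page_id):
        # Serve from the local cache when possible. Nothing tells this
        # node that another node may have changed the page on storage.
        if page_id not in self.cache:
            self.cache[page_id] = self.storage.pages[page_id]
        return self.cache[page_id]

    def write(self, page_id, value):
        self.cache[page_id] = value
        self.storage.pages[page_id] = value


storage = SharedStorage()
storage.pages[1] = "balance=100"
a, b = Node("a", storage), Node("b", storage)

b.read(1)                  # b caches the page
a.write(1, "balance=50")   # a updates shared storage
stale = b.read(1)          # b still answers from its own cache
fresh = storage.pages[1]
print(stale, fresh)
```

Node `b` keeps returning the old value until something invalidates its cache, which is the cross-node synchronization a stock Postgres does not have.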

-Vladimir

On Fri, Jul 18, 2025 at 5:31 AM Pierre Barre <pierre@barre.sh> wrote:
Now, I'm trying to understand how the CAP theorem applies here. Traditional PostgreSQL replication has clear CAP trade-offs - you choose between consistency and availability during partitions.

But when PostgreSQL instances share storage rather than replicate:
- Consistency seems maintained (same data)
- Availability seems maintained (client can always promote an accessible node)
- Partitions between PostgreSQL nodes don't prevent the system from functioning

It seems that CAP assumes specific implementation details (like nodes maintaining independent state) without explicitly stating them.

How should we think about the CAP theorem when distributed nodes share storage rather than coordinate state? Are the trade-offs simply moved to a different layer, or does shared storage fundamentally change the analysis?
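
One way to read the "moved to a different layer" question: with shared storage, the storage layer has to decide who is allowed to write. Below is a sketch of a fencing-token check in Python. The `FencedStorage` class is hypothetical and does not describe how ZFS, NBD, or ZeroFS behave; ZFS in particular is not a clustered filesystem.

```python
class FencedStorage:
    """Shared storage that accepts writes only from the newest epoch."""
    def __init__(self):
        self.epoch = 0
        self.blocks = {}

    def promote(self):
        # Promotion hands out a new, higher epoch (the fencing token).
        self.epoch += 1
        return self.epoch

    def write(self, token, block_id, data):
        if token != self.epoch:
            raise PermissionError(
                f"stale token {token}, current epoch {self.epoch}"
            )
        self.blocks[block_id] = data


storage = FencedStorage()
primary_token = storage.promote()    # old primary holds epoch 1
storage.write(primary_token, 0, "from old primary")

# A partition separates the nodes; the client promotes the standby.
standby_token = storage.promote()    # standby holds epoch 2
storage.write(standby_token, 0, "from new primary")

# The old primary is still running and does not know it was replaced.
try:
    storage.write(primary_token, 0, "late write from old primary")
    rejected = False
except PermissionError:
    rejected = True
print(rejected, storage.blocks[0])
```

In this sketch the old primary stops being available for writes the moment it is fenced, so the consistency-versus-availability choice is still there; it is enforced by the storage layer instead of by replication.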

Client with awareness of both PostgreSQL nodes
    |                               |
    ↓ (partition here)              ↓
PostgreSQL Primary              PostgreSQL Standby
    |                               |
    └───────────┬───────────────────┘
                ↓
         Shared ZFS Pool
                |
         6 Global ZeroFS instances

Best,
Pierre

On Fri, Jul 18, 2025, at 12:57, Pierre Barre wrote:
> Hi Seref,
>
> For the benchmarks, I used Hetzner's cloud service with the following setup:
>
> - A Hetzner S3 bucket in the FSN1 region
> - A virtual machine of type ccx63 (48 vCPU, 192 GB memory)
> - 3 ZeroFS NBD devices (same S3 bucket)
> - A ZFS striped pool with the 3 devices
> - 200 GB ZFS L2ARC
> - Postgres configured accordingly memory-wise, as well as with synchronous_commit = off, wal_init_zero = off and wal_recycle = off.
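
For reference, the three settings named in the list, rendered as a `postgresql.conf` fragment by a small Python sketch. The parameter names are real PostgreSQL settings; the `render_conf` helper is hypothetical. Keep in mind that `synchronous_commit = off` allows a crash to lose the most recently committed transactions (without corrupting the database), which matters when reading the write numbers.

```python
SETTINGS = {
    "synchronous_commit": "off",
    "wal_init_zero": "off",
    "wal_recycle": "off",
}


def render_conf(settings):
    """Return the settings as postgresql.conf lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


conf = render_conf(SETTINGS)
print(conf)
```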
>
> Best,
> Pierre
>
> On Fri, Jul 18, 2025, at 12:42, Seref Arikan wrote:
>> Sorry, this was meant to go to the whole group:
>>
>> Very interesting! Great work. Can you clarify how exactly you're running Postgres in your tests? A specific AWS service? What's the test infrastructure that sits above the file system?
>>
>> On Thu, Jul 17, 2025 at 11:59 PM Pierre Barre <pierre@barre.sh> wrote:
>>> Hi everyone,
>>>
>>> I wanted to share a project I've been working on that enables PostgreSQL to run on S3 storage while maintaining performance comparable to local NVMe. The approach uses block-level access rather than trying to map filesystem operations to S3 objects.
>>>
>>> ZeroFS: https://github.com/Barre/ZeroFS
>>>
>>> # The Architecture
>>>
>>> ZeroFS provides NBD (Network Block Device) servers that expose S3 storage as raw block devices. PostgreSQL runs unmodified on ZFS pools built on these block devices:
>>>
>>> PostgreSQL -> ZFS -> NBD -> ZeroFS -> S3
>>>
>>> By providing block-level access and leveraging ZFS's caching capabilities (L2ARC), we can achieve microsecond latencies despite the underlying storage being in S3.
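
A rough way to see how much that claim depends on the cache hit ratio: the sketch below uses a single-level model, average latency = hit ratio × cache latency + miss ratio × backend latency. The latency figures are assumptions chosen for illustration, not measurements from this setup.

```python
def effective_latency_us(hit_ratio, cache_us, backend_us):
    """Average read latency for one cache in front of a slow backend."""
    return hit_ratio * cache_us + (1.0 - hit_ratio) * backend_us


# Assumed, illustrative numbers: 100 us for a cached read, 50 ms for S3.
CACHE_US = 100.0
S3_US = 50_000.0

for hit_ratio in (0.90, 0.99, 0.999):
    avg = effective_latency_us(hit_ratio, CACHE_US, S3_US)
    print(f"hit ratio {hit_ratio}: about {avg:.1f} us on average")
```

Under these assumed figures, a 99.9% hit ratio averages about 150 us, while a 90% hit ratio averages about 5 ms, so the working set has to fit in the cache layers for the average to stay in the microsecond range.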
>>>
>>> ## Performance Results
>>>
>>> Here are pgbench results from PostgreSQL running on this setup:
>>>
>>> ### Read/Write Workload
>>>
>>> ```
>>> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 example
>>> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
>>> starting vacuum...end.
>>> transaction type: <builtin: TPC-B (sort of)>
>>> scaling factor: 50
>>> query mode: simple
>>> number of clients: 50
>>> number of threads: 15
>>> maximum number of tries: 1
>>> number of transactions per client: 100000
>>> number of transactions actually processed: 5000000/5000000
>>> number of failed transactions: 0 (0.000%)
>>> latency average = 0.943 ms
>>> initial connection time = 48.043 ms
>>> tps = 53041.006947 (without initial connection time)
>>> ```
>>>
>>> ### Read-Only Workload
>>>
>>> ```
>>> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 -S example
>>> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
>>> starting vacuum...end.
>>> transaction type: <builtin: select only>
>>> scaling factor: 50
>>> query mode: simple
>>> number of clients: 50
>>> number of threads: 15
>>> maximum number of tries: 1
>>> number of transactions per client: 100000
>>> number of transactions actually processed: 5000000/5000000
>>> number of failed transactions: 0 (0.000%)
>>> latency average = 0.121 ms
>>> initial connection time = 53.358 ms
>>> tps = 413436.248089 (without initial connection time)
>>> ```
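
A quick consistency check on the two runs: with a fixed number of clients that are always waiting on a transaction, throughput should be roughly clients divided by average latency (Little's law). The sketch below only re-uses the figures printed by pgbench; the function name is made up.

```python
def expected_tps(clients, latency_ms):
    """Throughput implied by N busy clients and a mean latency."""
    return clients / (latency_ms / 1000.0)


runs = {
    "read/write": {"clients": 50, "latency_ms": 0.943, "reported_tps": 53041.006947},
    "read-only": {"clients": 50, "latency_ms": 0.121, "reported_tps": 413436.248089},
}

for name, run in runs.items():
    implied = expected_tps(run["clients"], run["latency_ms"])
    drift = abs(implied - run["reported_tps"]) / run["reported_tps"]
    print(f"{name}: implied {implied:.0f} tps, "
          f"reported {run['reported_tps']:.0f} tps, drift {drift:.2%}")
```

Both runs agree with the reported tps to well under 1%, so the latency and throughput figures are internally consistent.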
>>>
>>> These numbers are with 50 concurrent clients and the actual data stored in S3. Hot data is served from ZFS L2ARC and ZeroFS's memory caches, while cold data comes from S3.
>>>
>>> ## How It Works
>>>
>>> 1. ZeroFS exposes NBD devices (e.g., /dev/nbd0) that PostgreSQL/ZFS can use like any other block device
>>> 2. Multiple cache layers hide S3 latency:
>>>    a. ZFS ARC/L2ARC for frequently accessed blocks
>>>    b. ZeroFS memory cache for metadata and hot data
>>>    c. Optional local disk cache
>>> 3. All data is encrypted (ChaCha20-Poly1305) before hitting S3
>>> 4. Files are split into 128 KB chunks for insertion into ZeroFS' LSM-tree
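
To illustrate item 4, here is a sketch of how a byte range maps onto fixed-size chunks. It assumes 128 KB means 128 KiB, and the function name and the `(chunk_index, offset_in_chunk, length)` tuple layout are made up for illustration; this is not ZeroFS's actual key scheme.

```python
CHUNK_SIZE = 128 * 1024  # assumed: 128 KiB


def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Yield (chunk_index, offset_in_chunk, length_in_chunk) covering the range."""
    end = offset + length
    while offset < end:
        index, within = divmod(offset, chunk_size)
        take = min(chunk_size - within, end - offset)
        yield index, within, take
        offset += take


# An 8 KiB read that straddles the boundary between chunk 0 and chunk 1.
spans = list(chunks_for_range(offset=CHUNK_SIZE - 4096, length=8192))
print(spans)
```

A read or write that crosses a chunk boundary touches two chunks, which is why block alignment matters for layouts like this.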
>>>
>>> ## Geo-Distributed PostgreSQL
>>>
>>> Since each region can run its own ZeroFS instance, you can create geographically distributed PostgreSQL setups.
>>>
>>> Example architectures:
>>>
>>> Architecture 1:
>>>
>>>
>>>                          PostgreSQL Client
>>>                                    |
>>>                                    | SQL queries
>>>                                    |
>>>                             +--------------+
>>>                             |  PG Proxy    |
>>>                             | (HAProxy/    |
>>>                             |  PgBouncer)  |
>>>                             +--------------+
>>>                                /        \
>>>                               /          \
>>>                    Synchronous            Synchronous
>>>                    Replication            Replication
>>>                             /              \
>>>                            /                \
>>>               +---------------+        +---------------+
>>>               | PostgreSQL 1  |        | PostgreSQL 2  |
>>>               | (Primary)     |◄------►| (Standby)     |
>>>               +---------------+        +---------------+
>>>                       |                        |
>>>                       |  POSIX filesystem ops  |
>>>                       |                        |
>>>               +---------------+        +---------------+
>>>               |   ZFS Pool 1  |        |   ZFS Pool 2  |
>>>               | (3-way mirror)|        | (3-way mirror)|
>>>               +---------------+        +---------------+
>>>                /      |      \          /      |      \
>>>               /       |       \        /       |       \
>>>         NBD:10809 NBD:10810 NBD:10811  NBD:10812 NBD:10813 NBD:10814
>>>              |        |        |           |        |        |
>>>         +--------++--------++--------++--------++--------++--------+
>>>         |ZeroFS 1||ZeroFS 2||ZeroFS 3||ZeroFS 4||ZeroFS 5||ZeroFS 6|
>>>         +--------++--------++--------++--------++--------++--------+
>>>              |         |         |         |         |         |
>>>              |         |         |         |         |         |
>>>         S3-Region1 S3-Region2 S3-Region3 S3-Region4 S3-Region5 S3-Region6
>>>         (us-east) (eu-west) (ap-south) (us-west) (eu-north) (ap-east)
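
One consequence of a 3-way mirror whose legs sit in different regions: as a simplification, a mirrored write is complete only once every leg has acknowledged it, so write latency tracks the slowest region, while a read needs only one healthy leg. The round-trip times below are made up for illustration, and the model ignores caching and write batching.

```python
# Assumed round-trip times in milliseconds from the primary to each mirror leg.
LEG_RTT_MS = {"us-east": 2.0, "eu-west": 80.0, "ap-south": 190.0}


def mirror_write_latency_ms(legs):
    """A mirrored write waits for all legs, so the slowest one dominates."""
    return max(legs.values())


def mirror_read_latency_ms(legs):
    """A read needs only one healthy leg; the best case is the fastest one."""
    return min(legs.values())


write_ms = mirror_write_latency_ms(LEG_RTT_MS)
read_ms = mirror_read_latency_ms(LEG_RTT_MS)
print(f"write waits about {write_ms} ms, best-case read about {read_ms} ms")
```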
>>>
>>> Architecture 2:
>>>
>>> PostgreSQL Primary (Region 1) ←→ PostgreSQL Standby (Region 2)
>>>                 \                    /
>>>                  \                  /
>>>                   Same ZFS Pool (NBD)
>>>                          |
>>>                   6 Global ZeroFS
>>>                          |
>>>                       S3 Regions
>>>
>>>
>>> The main advantages I see are:
>>> 1. Dramatic cost reduction for large datasets
>>> 2. Simplified geo-distribution
>>> 3. Effectively unlimited storage capacity
>>> 4. Built-in encryption and compression
>>>
>>> Looking forward to your feedback and questions!
>>>
>>> Best,
>>> Pierre
>>>
>>> P.S. The full project includes a custom NFS filesystem too.
>

