Feedback-ID: i97614980:Fastmail
MIME-Version: 1.0
Date: Sat, 26 Jul 2025 10:44:41 +0200
From: "Pierre Barre" <pierre@barre.sh>
To: "Vladimir Churyukin" <vladimir@churyukin.com>
Cc: pgsql-general@lists.postgresql.org
Message-Id: <8f750558-17b5-4d87-a03f-0dcfdbf4899c@app.fastmail.com>
In-Reply-To: <44dafe90-9ad6-41ae-b9fe-bea4aaf49a59@app.fastmail.com>
References: <a9fe5ddb-9685-4139-bc1f-88161a7a4da3@app.fastmail.com>
 <CAG1bHGOzCNtDeW0W8gRO7mpW=t7BqWh-iz4kX5VRCPgt_6Tr6Q@mail.gmail.com>
 <8188513c-e089-4273-b2be-16dd0a5a0a80@app.fastmail.com>
 <c5a52444-80cd-4e50-8fc4-a3a9bc09feb4@app.fastmail.com>
 <CAFSGpE2j29C0MntAe59oay0sm1W_htQFyAbfuJeEViJ0BN1Wyg@mail.gmail.com>
 <96edd171-9cbe-466d-b3d6-04e069cee419@app.fastmail.com>
 <CAFSGpE2xzAz4zefZa8sQLkNajp0hT7LiONQDGSAxigwGG3ii8w@mail.gmail.com>
 <44dafe90-9ad6-41ae-b9fe-bea4aaf49a59@app.fastmail.com>
Subject: Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance
Content-Type: multipart/alternative;
 boundary=4b20097cb77a440d9fe2b0f0882b7a21
Archived-At: <https://www.postgresql.org/message-id/8f750558-17b5-4d87-a03f-0dcfdbf4899c%40app.fastmail.com>
Precedence: bulk

--4b20097cb77a440d9fe2b0f0882b7a21
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Also, Neon [0] and Aurora [1] pricing is so high that it seems to make m=
ost use-cases impractical (well, if you want a managed offering...). Neo=
n's top public tier is not even what a single modern dedicated server (o=
r virtual machine) can provide. I would have thought decoupling compute =
and storage would make the offerings cheaper, if anything.

Taking my own Merklemap [2] use-case, where I run a 30TB database, and a=
pplying Neon pricing (and I don't doubt that the non-public pricing woul=
d be even more expensive than that):

Storage Scaling:

- Business plan: 500 GB -> $700
- You need: 30,000 GB (30 TB)
- Scaling factor: 60x
- Linear estimate: $700 =C3=97 60 =3D $42,000/month
- Total 12 months cost: $504,000
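
For anyone who wants to redo this with their own numbers, the linear
estimate above can be sketched in a few lines of Python. The function
names are made up for illustration, and this is only a straight-line
extrapolation of the public Business plan price, not Neon's actual
billing model:

```python
def linear_monthly_usd(needed_gb, plan_gb, plan_usd):
    """Scale a plan's monthly price linearly with the storage needed."""
    return plan_usd * needed_gb / plan_gb


def yearly_usd(monthly_usd):
    """Twelve months at a flat monthly price."""
    return monthly_usd * 12


# 500 GB for $700 on the Business plan, scaled to 30,000 GB.
print(linear_monthly_usd(30_000, 500, 700))
print(yearly_usd(linear_monthly_usd(30_000, 500, 700)))
```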

Aurora calculation [3]:

- Instance type: db.r5.24xlarge
- Monthly cost: $21,887.28
- Total 12 months cost: $262,647.36

Now, calculating the same 30TB with the same instance type and S3 storag=
e [4]:

- Instance Type: r5.24xlarge
- Monthly cost: $5,555.04
- Total 12 months cost: $66,660.48

But more interestingly, you don't need to use AWS at all anymore, becaus=
e you can just move your setup anywhere at this point, as you get a simi=
lar level of reliability - and simplicity - but with very cheap services.

Hetzner ccx63 + Cloudflare R2:

- Hetzner ccx63: =E2=82=AC287.99/month =E2=89=88 $338/month
- R2 storage (30TB): 30,000 GB =C3=97 $0.015 =3D $450/month
- R2 operations: Should be measured to be calculated properly, but will =
probably be negligible.
- Total monthly: ~$788
- Total 12 months cost: $9,456/year
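
Putting the monthly figures from this message side by side, here is a
rough sketch. The helper names are hypothetical, the Hetzner + R2 entry
is simply the sum of the two line items above, and R2 operations are
left out because they have not been measured:

```python
def yearly_usd(monthly_usd):
    """Twelve months at a flat monthly price, rounded to cents."""
    return round(monthly_usd * 12, 2)


def monthly_figures():
    """Monthly USD figures as quoted in this message."""
    return {
        "Neon (linear estimate)": 42_000.00,
        "Aurora db.r5.24xlarge": 21_887.28,
        "EC2 r5.24xlarge + S3": 5_555.04,
        # Assumption: instance plus storage only, R2 operations excluded.
        "Hetzner ccx63 + R2": 338.00 + 450.00,
    }


def yearly_table(monthly):
    """Return (12-month cost, option) pairs, cheapest first."""
    return sorted((yearly_usd(usd), name) for name, usd in monthly.items())


for cost, name in yearly_table(monthly_figures()):
    print(f"{name:<24} ${cost:>12,.2f}")
```

Dividing any two rows gives the cost ratio between those options.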

Best,
Pierre

[0] https://neon.com/pricing
[1] https://aws.amazon.com/rds/aurora/pricing/
[2] https://www.merklemap.com/
[3] https://calculator.aws/#/estimate?id=3D3f0ce6a91eed9a666d54bb8852ea0=
0b042c3cd6e
[4] https://calculator.aws/#/estimate?id=3D1a77d8da3489bafc8681c6fd738a3=
186fb749ea3

On Sat, Jul 26, 2025, at 09:51, Pierre Barre wrote:
> Ah, by "shared storage" I mean that each node can acquire exclusivity,=
 not that they can both R/W to it at the same time.
>=20
> > Some pretty well-known cases of storage / compute separation (Aurora=
, Neon) also share the storage between instances,
>=20
> That model is cool, but I think it's more of a solution for outliers a=
s I was suggesting, not something that most would or should want.
>=20
> Best,
> Pierre
>=20
> On Sat, Jul 26, 2025, at 09:42, Vladimir Churyukin wrote:
>> Sorry, I was referring to this:
>>=20
>> >  But when PostgreSQL instances share storage rather than replicate:
>> > - Consistency seems maintained (same data)
>> > - Availability seems maintained (client can always promote an acces=
sible node)
>> > - Partitions between PostgreSQL nodes don't prevent the system from=
 functioning
>>=20
>> Some pretty well-known cases of storage / compute separation (Aurora,=
 Neon) also share the storage between instances,
>> that's why I'm a bit confused by your reply. I thought you were think=
ing about this approach too, which is why I mentioned what kind of chall=
enges one may have on that path.
>>=20
>>=20
>> On Sat, Jul 26, 2025 at 12:36=E2=80=AFAM Pierre Barre <pierre@barre.s=
h> wrote:
>>> __
>>> What you describe doesn=E2=80=99t look like something very useful fo=
r the vast majority of projects that need a database. Why would you even=
 want that if you can avoid it?=20
>>>=20
>>> If your =E2=80=9Csingle node=E2=80=9D can handle tens or hundreds of=
 thousands of requests per second and still has very durable and highly =
available storage, as well as fast recovery mechanisms, what=E2=80=99s t=
he point?
>>>=20
>>> I am not trying to cater to extreme outliers that may want something=
 very weird like this; those are just not the use-cases I want to addres=
s, because I believe they are few and far between.
>>>=20
>>> Best,
>>> Pierre=20
>>>=20
>>> On Sat, Jul 26, 2025, at 08:57, Vladimir Churyukin wrote:
>>>> A shared storage would require a lot of extra work. That's essentia=
lly what AWS Aurora does.
>>>> You will have to have functionality to sync in-memory states betwee=
n nodes, because all the instances will have cached data that can easily=
 become stale on any write operation.
>>>> That alone is not that simple. You will have to modify some locking=
 logic and most likely make a lot of other changes in a lot of places: P=
ostgres was just not built with the assumption that the storage can be s=
hared.
>>>>=20
>>>> -Vladimir
>>>>=20
>>>> On Fri, Jul 18, 2025 at 5:31=E2=80=AFAM Pierre Barre <pierre@barre.=
sh> wrote:
>>>>> Now, I'm trying to understand how CAP theorem applies here. Tradit=
ional PostgreSQL replication has clear CAP trade-offs - you choose betwe=
en consistency and availability during partitions.
>>>>>=20
>>>>> But when PostgreSQL instances share storage rather than replicate:
>>>>> - Consistency seems maintained (same data)
>>>>> - Availability seems maintained (client can always promote an acce=
ssible node)
>>>>> - Partitions between PostgreSQL nodes don't prevent the system fro=
m functioning
>>>>>=20
>>>>> It seems that CAP assumes specific implementation details (like no=
des maintaining independent state) without explicitly stating them.
>>>>>=20
>>>>> How should we think about CAP theorem when distributed nodes share=
 storage rather than coordinate state? Are the trade-offs simply moved t=
o a different layer, or does shared storage fundamentally change the ana=
lysis?
>>>>>=20
>>>>> Client with awareness of both PostgreSQL nodes
>>>>>     |                               |
>>>>>     =E2=86=93 (partition here)              =E2=86=93
>>>>> PostgreSQL Primary              PostgreSQL Standby
>>>>>     |                               |
>>>>>     =E2=94=94=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=
=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=AC=E2=94=80=E2=94=80=
=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=
=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=
=E2=94=80=E2=94=98
>>>>>                 =E2=86=93
>>>>>          Shared ZFS Pool
>>>>>                 |
>>>>>          6 Global ZeroFS instances
>>>>>=20
>>>>> Best,
>>>>> Pierre
>>>>>=20
>>>>> On Fri, Jul 18, 2025, at 12:57, Pierre Barre wrote:
>>>>> > Hi Seref,
>>>>> >
>>>>> > For the benchmarks, I used Hetzner's cloud service with the foll=
owing setup:
>>>>> >
>>>>> > - A Hetzner s3 bucket in the FSN1 region
>>>>> > - A virtual machine of type ccx63 48 vCPU 192 GB memory
>>>>> > - 3 ZeroFS nbd devices (same s3 bucket)
>>>>> > - A ZFS striped pool with the 3 devices
>>>>> > - 200GB zfs L2ARC
>>>>> > - Postgres configured accordingly memory-wise as well as with sy=
nchronous_commit =3D off, wal_init_zero =3D off and wal_recycle =3D off.
>>>>> >
>>>>> > Best,
>>>>> > Pierre
>>>>> >
>>>>> > On Fri, Jul 18, 2025, at 12:42, Seref Arikan wrote:
>>>>> >> Sorry, this was meant to go to the whole group:
>>>>> >>
>>>>> >> Very interesting! Great work. Can you clarify how exactly you'=
re running postgres in your tests? A specific AWS service? What's the te=
st infrastructure that sits above the file system?
>>>>> >>
>>>>> >> On Thu, Jul 17, 2025 at 11:59=E2=80=AFPM Pierre Barre <pierre@b=
arre.sh> wrote:
>>>>> >>> Hi everyone,
>>>>> >>>
>>>>> >>> I wanted to share a project I've been working on that enables =
PostgreSQL to run on S3 storage while maintaining performance comparable=
 to local NVMe. The approach uses block-level access rather than trying =
to map filesystem operations to S3 objects.
>>>>> >>>
>>>>> >>> ZeroFS: https://github.com/Barre/ZeroFS
>>>>> >>>
>>>>> >>> # The Architecture
>>>>> >>>
>>>>> >>> ZeroFS provides NBD (Network Block Device) servers that expose=
 S3 storage as raw block devices. PostgreSQL runs unmodified on ZFS pool=
s built on these block devices:
>>>>> >>>
>>>>> >>> PostgreSQL -> ZFS -> NBD -> ZeroFS -> S3
>>>>> >>>
>>>>> >>> By providing block-level access and leveraging ZFS's caching c=
apabilities (L2ARC), we can achieve microsecond latencies despite the un=
derlying storage being in S3.
>>>>> >>>
>>>>> >>> ## Performance Results
>>>>> >>>
>>>>> >>> Here are pgbench results from PostgreSQL running on this setup:
>>>>> >>>
>>>>> >>> ### Read/Write Workload
>>>>> >>>
>>>>> >>> ```
>>>>> >>> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 1000=
00 example
>>>>> >>> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
>>>>> >>> starting vacuum...end.
>>>>> >>> transaction type: <builtin: TPC-B (sort of)>
>>>>> >>> scaling factor: 50
>>>>> >>> query mode: simple
>>>>> >>> number of clients: 50
>>>>> >>> number of threads: 15
>>>>> >>> maximum number of tries: 1
>>>>> >>> number of transactions per client: 100000
>>>>> >>> number of transactions actually processed: 5000000/5000000
>>>>> >>> number of failed transactions: 0 (0.000%)
>>>>> >>> latency average =3D 0.943 ms
>>>>> >>> initial connection time =3D 48.043 ms
>>>>> >>> tps =3D 53041.006947 (without initial connection time)
>>>>> >>> ```
>>>>> >>>
>>>>> >>> ### Read-Only Workload
>>>>> >>>
>>>>> >>> ```
>>>>> >>> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 1000=
00 -S example
>>>>> >>> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
>>>>> >>> starting vacuum...end.
>>>>> >>> transaction type: <builtin: select only>
>>>>> >>> scaling factor: 50
>>>>> >>> query mode: simple
>>>>> >>> number of clients: 50
>>>>> >>> number of threads: 15
>>>>> >>> maximum number of tries: 1
>>>>> >>> number of transactions per client: 100000
>>>>> >>> number of transactions actually processed: 5000000/5000000
>>>>> >>> number of failed transactions: 0 (0.000%)
>>>>> >>> latency average =3D 0.121 ms
>>>>> >>> initial connection time =3D 53.358 ms
>>>>> >>> tps =3D 413436.248089 (without initial connection time)
>>>>> >>> ```
>>>>> >>>
>>>>> >>> These numbers are with 50 concurrent clients and the actual da=
ta stored in S3. Hot data is served from ZFS L2ARC and ZeroFS's memory c=
aches, while cold data comes from S3.
>>>>> >>>
>>>>> >>> ## How It Works
>>>>> >>>
>>>>> >>> 1. ZeroFS exposes NBD devices (e.g., /dev/nbd0) that PostgreSQ=
L/ZFS can use like any other block device
>>>>> >>> 2. Multiple cache layers hide S3 latency:
>>>>> >>>    a. ZFS ARC/L2ARC for frequently accessed blocks
>>>>> >>>    b. ZeroFS memory cache for metadata and hot data
>>>>> >>>    c. Optional local disk cache
>>>>> >>> 3. All data is encrypted (ChaCha20-Poly1305) before hitting S3
>>>>> >>> 4. Files are split into 128KB chunks for insertion into ZeroFS=
' LSM-tree
>>>>> >>>
>>>>> >>> ## Geo-Distributed PostgreSQL
>>>>> >>>
>>>>> >>> Since each region can run its own ZeroFS instance, you can cre=
ate geographically distributed PostgreSQL setups.
>>>>> >>>
>>>>> >>> Example architectures:
>>>>> >>>
>>>>> >>> Architecture 1:
>>>>> >>>
>>>>> >>>
>>>>> >>>                          PostgreSQL Client
>>>>> >>>                                    |
>>>>> >>>                                    | SQL queries
>>>>> >>>                                    |
>>>>> >>>                             +--------------+
>>>>> >>>                             |  PG Proxy    |
>>>>> >>>                             | (HAProxy/    |
>>>>> >>>                             |  PgBouncer)  |
>>>>> >>>                             +--------------+
>>>>> >>>                                /        \
>>>>> >>>                               /          \
>>>>> >>>                    Synchronous            Synchronous
>>>>> >>>                    Replication            Replication
>>>>> >>>                             /              \
>>>>> >>>                            /                \
>>>>> >>>               +---------------+        +---------------+
>>>>> >>>               | PostgreSQL 1  |        | PostgreSQL 2  |
>>>>> >>>               | (Primary)     |=E2=97=84------=E2=96=BA| (Stan=
dby)     |
>>>>> >>>               +---------------+        +---------------+
>>>>> >>>                       |                        |
>>>>> >>>                       |  POSIX filesystem ops  |
>>>>> >>>                       |                        |
>>>>> >>>               +---------------+        +---------------+
>>>>> >>>               |   ZFS Pool 1  |        |   ZFS Pool 2  |
>>>>> >>>               | (3-way mirror)|        | (3-way mirror)|
>>>>> >>>               +---------------+        +---------------+
>>>>> >>>                /      |      \          /      |      \
>>>>> >>>               /       |       \        /       |       \
>>>>> >>>         NBD:10809 NBD:10810 NBD:10811  NBD:10812 NBD:10813 NBD=
:10814
>>>>> >>>              |        |        |           |        |        |
>>>>> >>>         +--------++--------++--------++--------++--------++---=
-----+
>>>>> >>>         |ZeroFS 1||ZeroFS 2||ZeroFS 3||ZeroFS 4||ZeroFS 5||Zer=
oFS 6|
>>>>> >>>         +--------++--------++--------++--------++--------++---=
-----+
>>>>> >>>              |         |         |         |         |        =
 |
>>>>> >>>              |         |         |         |         |        =
 |
>>>>> >>>         S3-Region1 S3-Region2 S3-Region3 S3-Region4 S3-Region5=
 S3-Region6
>>>>> >>>         (us-east) (eu-west) (ap-south) (us-west) (eu-north) (a=
p-east)
>>>>> >>>
>>>>> >>> Architecture 2:
>>>>> >>>
>>>>> >>> PostgreSQL Primary (Region 1) =E2=86=90=E2=86=92 PostgreSQL St=
andby (Region 2)
>>>>> >>>                 \                    /
>>>>> >>>                  \                  /
>>>>> >>>                   Same ZFS Pool (NBD)
>>>>> >>>                          |
>>>>> >>>                   6 Global ZeroFS
>>>>> >>>                          |
>>>>> >>>                       S3 Regions
>>>>> >>>
>>>>> >>>
>>>>> >>> The main advantages I see are:
>>>>> >>> 1. Dramatic cost reduction for large datasets
>>>>> >>> 2. Simplified geo-distribution
>>>>> >>> 3. Infinite storage capacity
>>>>> >>> 4. Built-in encryption and compression
>>>>> >>>
>>>>> >>> Looking forward to your feedback and questions!
>>>>> >>>
>>>>> >>> Best,
>>>>> >>> Pierre
>>>>> >>>
>>>>> >>> P.S. The full project includes a custom NFS filesystem too.
>>>>> >>>
>>>>> >
>>>>>=20
>>>=20
>=20

--4b20097cb77a440d9fe2b0f0882b7a21
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE html><html><head><title></title></head><body><div>Also, Neon [=
0] and Aurora [1] pricing is so high that it seems to make most use-case=
s impractical (well, if you want a managed offering...). Neon's top publ=
ic tier is not even what a single modern dedicated server (or virtual ma=
chine) can provide. I would have thought decoupling compute and storage =
would make the offerings cheaper, if anything.</div><div><br></div><div>=
Taking my own Merklemap [2] use-case, where I run a 30TB database, and a=
pplying Neon pricing (and I don't doubt that the non-public pricing woul=
d be even =
more expensive than that):</div><div><br></div><div>Storage Scaling:</di=
v><div><br></div><div>- Business plan: 500 GB -&gt; $700</div><div>- You=
 need: 30,000 GB (30 TB)</div><div>- Scaling factor: 60x</div><div>- Lin=
ear estimate: $700 =C3=97 60 =3D $42,000/month</div><div>- Total 12 mont=
hs cost: $504,000</div><div><br></div><div>Aurora calculation [3]:</div>=
<div><br></div><div>- Instance type: db.r5.24xlarge</div><div>- Monthly =
cost: $21,887.28</div><div>- Total 12 months cost: $262,647.36</div><div=
><br></div><div>Now, calculating the same 30TB with the same instance ty=
pe and S3 storage [4]:</div><div><br></div><div>- Instance Type: r5.24xl=
arge</div><div>- Monthly cost: $5,555.04</div><div>- Total 12 months cos=
t: $66,660.48</div><div><br></div><div>But more interestingly, you don't=
 need to use AWS at all anymore, because you can just move your setup an=
ywhere at this point, as you get a similar level of reliability - and si=
mplicity - but with very cheap services.</div><div><br></div><div>Hetzne=
r ccx63 + Cloudflare R2:</div><div><br></div><div>- Hetzner ccx63: =E2=82=
=AC287.99/month =E2=89=88 $338/month</div><div>- R2 storage (30TB): 30,0=
00 GB =C3=97 $0.015 =3D $450/month</div><div>- R2 operations: Should be =
measured to be calculated properly, but will probably be negligible.</di=
v><div>- Total monthly: ~$788</div><div>- Total 12 months cost: $9,456/y=
ear</div><div><br></div><div>Best,</div><div>Pierre</div><div><br></div>=
<div>[0]&nbsp;<a href=3D"https://neon.com/pricing">https://neon.com/pric=
ing</a></div><div>[1]&nbsp;<a href=3D"https://aws.amazon.com/rds/aurora/=
pricing/">https://aws.amazon.com/rds/aurora/pricing/</a><br></div><div>[=
2]&nbsp;<a href=3D"https://www.merklemap.com/">https://www.merklemap.com=
/</a></div><div>[3]&nbsp;<span class=3D"color" style=3D"color:rgb(0, 0, =
0);"><a href=3D"https://calculator.aws/#/estimate?id=3D3f0ce6a91eed9a666=
d54bb8852ea00b042c3cd6e">https://calculator.aws/#/estimate?id=3D3f0ce6a9=
1eed9a666d54bb8852ea00b042c3cd6e</a></span><br></div><div>[4]&nbsp;<span=
 class=3D"color" style=3D"color:rgb(0, 0, 0);"><a href=3D"https://calcul=
ator.aws/#/estimate?id=3D1a77d8da3489bafc8681c6fd738a3186fb749ea3">https=
://calculator.aws/#/estimate?id=3D1a77d8da3489bafc8681c6fd738a3186fb749e=
a3</a></span></div><div><span class=3D"color" style=3D"color:rgb(0, 0, 0=
);"></span><br></div><div>On Sat, Jul 26, 2025, at 09:51, Pierre Barre w=
rote:</div><blockquote type=3D"cite" id=3D"qt" style=3D""><div>Ah, by "s=
hared storage" I mean that each node can acquire exclusivity, not that t=
hey can both R/W to it at the same time.</div><div><br></div><div>&gt;&n=
bsp;Some pretty well-known cases of storage / compute separation (Aurora=
, Neon) also share the storage between instances,</div><div><br></div><d=
iv>That model is cool, but I think it's more of a solution for outliers =
as I was suggesting, not something that most would or should want.</div>=
<div><br></div><div>Best,</div><div>Pierre</div><div><br></div><div>On S=
at, Jul 26, 2025, at 09:42, Vladimir Churyukin wrote:</div><blockquote t=
ype=3D"cite" id=3D"qt-qt" style=3D""><div dir=3D"ltr"><div>Sorry, I was =
referring to this:</div><div><br></div><div><div>&gt;&nbsp;&nbsp;<span s=
tyle=3D"color:rgb(80, 0, 80);">But when PostgreSQL instances share stora=
ge rather than replicate:</span></div><div><span style=3D"color:rgb(80, =
0, 80);">&gt; - Consistency seems maintained (same data)</span></div><di=
v><span style=3D"color:rgb(80, 0, 80);">&gt; - Availability seems mainta=
ined (client can always promote an accessible node)</span></div><div><sp=
an style=3D"color:rgb(80, 0, 80);">&gt; - Partitions between PostgreSQL =
nodes don't prevent the system from functioning</span></div></div><div><=
span style=3D"color:rgb(80, 0, 80);"></span><br></div><div>Some pretty w=
ell-known cases of storage / compute separation (Aurora, Neon) also shar=
e the storage between instances,</div><div>that's why I'm a bit confused=
 by your reply. I thought you were thinking about this approach too, whi=
ch is why I mentioned what kind of challenges one may have on that path.=
</di=
v><div><span class=3D"qt-color" style=3D"color:rgb(80, 0, 80);"></span><=
br></div></div><div><br></div><div class=3D"qt-qt-gmail_quote qt-qt-gmai=
l_quote_container"><div dir=3D"ltr" class=3D"qt-qt-gmail_attr">On Sat, J=
ul 26, 2025 at 12:36=E2=80=AFAM Pierre Barre &lt;<a href=3D"mailto:pierr=
e@barre.sh">pierre@barre.sh</a>&gt; wrote:</div><blockquote class=3D"qt-=
qt-gmail_quote" style=3D"margin-top:0px;margin-right:0px;margin-bottom:0=
px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;borde=
r-left-color:rgb(204, 204, 204);padding-left:1ex;"><div><u></u><br></div=
><div><div>What you describe doesn=E2=80=99t look like something very us=
eful for the vast majority of projects that need a database. Why would y=
ou even want that if you can avoid it?&nbsp;</div><div><br></div><div>If=
 your =E2=80=9Csingle node=E2=80=9D can handle tens or hundreds of thous=
ands of requests per second and still has very durable and highly availa=
ble storage, as well as fast recovery mechanisms, what=E2=80=99s the poi=
nt?</div><div><br></div><div>I am not trying to cater to extreme outlier=
s that may want something very weird like this; those are just not the u=
se-cases I want to address, because I believe they are few and far betwe=
en.</div><=
div><br></div><div>Best,</div><div>Pierre&nbsp;</div><div><br></div><div=
>On Sat, Jul 26, 2025, at 08:57, Vladimir Churyukin wrote:</div><blockqu=
ote type=3D"cite" id=3D"qt-qt-m_7592450530125555523qt"><div dir=3D"ltr">=
<div>A shared storage would require a lot of extra work. That's essentia=
lly what AWS Aurora does.</div><div>You will have to have functionality =
to sync in-memory states between nodes, because all the instances will h=
ave cached data that can easily become stale on any write operation.</di=
v><div>That alone is not that simple. You will have to modify some locki=
ng logic and most likely make a lot of other changes in a lot of places:=
 Postgres was just not built with the assumption that the storage can be=
 shar=
ed.</div><div><br></div><div>-Vladimir</div></div><div><br></div><div><d=
iv dir=3D"ltr">On Fri, Jul 18, 2025 at 5:31=E2=80=AFAM Pierre Barre &lt;=
<a href=3D"mailto:pierre@barre.sh" target=3D"_blank">pierre@barre.sh</a>=
&gt; wrote:</div><blockquote style=3D"margin-top:0px;margin-right:0px;ma=
rgin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-styl=
e:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div>Now=
, I'm trying to understand how CAP theorem applies here. Traditional Pos=
tgreSQL replication has clear CAP trade-offs - you choose between consis=
tency and availability during partitions.</div><div><br></div><div>But w=
hen PostgreSQL instances share storage rather than replicate:</div><div>=
- Consistency seems maintained (same data)</div><div>- Availability seem=
s maintained (client can always promote an accessible node)</div><div>- =
Partitions between PostgreSQL nodes don't prevent the system from functi=
oning</div><div><br></div><div>It seems that CAP assumes specific implem=
entation details (like nodes maintaining independent state) without expl=
icitly stating them.</div><div><br></div><div>How should we think about =
CAP theorem when distributed nodes share storage rather than coordinate =
state? Are the trade-offs simply moved to a different layer, or does sha=
red storage fundamentally change the analysis?</div><div><br></div><div>=
Client with awareness of both PostgreSQL nodes</div><div>&nbsp; &nbsp; |=
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &n=
bsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|</div><div>&nbsp; &nbsp; =E2=86=93=
 (partition here)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =E2=86=
=93</div><div>PostgreSQL Primary&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp=
; &nbsp; PostgreSQL Standby</div><div>&nbsp; &nbsp; |&nbsp; &nbsp; &nbsp=
; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;|</div><div>&nbsp; &nbsp; =E2=94=94=E2=94=80=E2=94=80=
=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=
=E2=94=80=E2=94=AC=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=
=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=
=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=98</div><div>&nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =E2=86=93</div><div>&nb=
sp; &nbsp; &nbsp; &nbsp; &nbsp;Shared ZFS Pool</div><div>&nbsp; &nbsp; &=
nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |</div><div>&nbsp; &nbsp; &nbsp=
; &nbsp; &nbsp;6 Global ZeroFS instances</div><div><br></div><div>Best,<=
/div><div>Pierre</div><div><br></div><div>On Fri, Jul 18, 2025, at 12:57=
, Pierre Barre wrote:</div><div>&gt; Hi Seref,</div><div>&gt;</div><div>=
&gt; For the benchmarks, I used Hetzner's cloud service with the followi=
ng setup:</div><div>&gt;</div><div>&gt; - A Hetzner s3 bucket in the FSN=
1 region</div><div>&gt; - A virtual machine of type ccx63 48 vCPU 192 GB=
 memory</div><div>&gt; - 3 ZeroFS nbd devices (same s3 bucket)</div><div=
>&gt; - A ZFS striped pool with the 3 devices</div><div>&gt; - 200GB zf=
s L2ARC</div><div>&gt; - Postgres configured accordingly memory-wise as =
well as with synchronous_commit =3D off, wal_init_zero =3D off and wal_r=
ecycle =3D off.</div><div>&gt;</div><div>&gt; Best,</div><div>&gt; Pierr=
e</div><div>&gt;</div><div>&gt; On Fri, Jul 18, 2025, at 12:42, Seref Ar=
ikan wrote:</div><div>&gt;&gt; Sorry, this was meant to go to the whole =
group:</div><div>&gt;&gt;</div><div>&gt;&gt; Very interesting! Great wo=
rk. Can you clarify how exactly you're running postgres in your tests? A=
 specific AWS service? What's the test infrastructure that sits above th=
e file system?</div><div>&gt;&gt;</div><div>&gt;&gt; On Thu, Jul 17, 202=
5 at 11:59=E2=80=AFPM Pierre Barre &lt;<a href=3D"mailto:pierre@barre.sh=
" target=3D"_blank">pierre@barre.sh</a>&gt; wrote:</div><div>&gt;&gt;&gt=
; Hi everyone,</div><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; I wanted to=
 share a project I've been working on that enables PostgreSQL to run on =
S3 storage while maintaining performance comparable to local NVMe. The a=
pproach uses block-level access rather than trying to map filesystem ope=
rations to S3 objects.</div><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; Zer=
oFS: <a href=3D"https://github.com/Barre/ZeroFS" rel=3D"noreferrer" targ=
et=3D"_blank">https://github.com/Barre/ZeroFS</a></div><div>&gt;&gt;&gt;=
</div><div>&gt;&gt;&gt; # The Architecture</div><div>&gt;&gt;&gt;</div><=
div>&gt;&gt;&gt; ZeroFS provides NBD (Network Block Device) servers that=
 expose S3 storage as raw block devices. PostgreSQL runs unmodified on Z=
FS pools built on these block devices:</div><div>&gt;&gt;&gt;</div><div>=
&gt;&gt;&gt; PostgreSQL -&gt; ZFS -&gt; NBD -&gt; ZeroFS -&gt; S3</div><=
div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; By providing block-level access =
and leveraging ZFS's caching capabilities (L2ARC), we can achieve micros=
econd latencies despite the underlying storage being in S3.</div><div>&g=
t;&gt;&gt;</div><div>&gt;&gt;&gt; ## Performance Results</div><div>&gt;&=
gt;&gt;</div><div>&gt;&gt;&gt; Here are pgbench results from PostgreSQL =
running on this setup:</div><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; ###=
 Read/Write Workload</div><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; ```</=
div><div>&gt;&gt;&gt; postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -=
j 15 -t 100000 example</div><div>&gt;&gt;&gt; pgbench (16.9 (Ubuntu 16.9=
-0ubuntu0.24.04.1))</div><div>&gt;&gt;&gt; starting vacuum...end.</div><=
div>&gt;&gt;&gt; transaction type: &lt;builtin: TPC-B (sort of)&gt;</div=
><div>&gt;&gt;&gt; scaling factor: 50</div><div>&gt;&gt;&gt; query mode:=
 simple</div><div>&gt;&gt;&gt; number of clients: 50</div><div>&gt;&gt;&=
gt; number of threads: 15</div><div>&gt;&gt;&gt; maximum number of tries=
: 1</div><div>&gt;&gt;&gt; number of transactions per client: 100000</di=
v><div>&gt;&gt;&gt; number of transactions actually processed: 5000000/5=
000000</div><div>&gt;&gt;&gt; number of failed transactions: 0 (0.000%)<=
/div><div>&gt;&gt;&gt; latency average =3D 0.943 ms</div><div>&gt;&gt;&g=
t; initial connection time =3D 48.043 ms</div><div>&gt;&gt;&gt; tps =3D =
53041.006947 (without initial connection time)</div><div>&gt;&gt;&gt; ``=
`</div><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; ### Read-Only Workload</=
div><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; ```</div><div>&gt;&gt;&gt; =
postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 -S exam=
ple</div><div>&gt;&gt;&gt; pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))=
</div><div>&gt;&gt;&gt; starting vacuum...end.</div><div>&gt;&gt;&gt; tr=
ansaction type: &lt;builtin: select only&gt;</div><div>&gt;&gt;&gt; scal=
ing factor: 50</div><div>&gt;&gt;&gt; query mode: simple</div><div>&gt;&=
gt;&gt; number of clients: 50</div><div>&gt;&gt;&gt; number of threads: =
15</div><div>&gt;&gt;&gt; maximum number of tries: 1</div><div>&gt;&gt;&=
gt; number of transactions per client: 100000</div><div>&gt;&gt;&gt; num=
ber of transactions actually processed: 5000000/5000000</div><div>&gt;&g=
t;&gt; number of failed transactions: 0 (0.000%)</div><div>&gt;&gt;&gt; =
latency average =3D 0.121 ms</div><div>&gt;&gt;&gt; initial connection t=
ime =3D 53.358 ms</div><div>&gt;&gt;&gt; tps =3D 413436.248089 (without =
initial connection time)</div><div>&gt;&gt;&gt; ```</div><div>&gt;&gt;&g=
t;</div><div>&gt;&gt;&gt; These numbers are with 50 concurrent clients a=
nd the actual data stored in S3. Hot data is served from ZFS L2ARC and Z=
eroFS's memory caches, while cold data comes from S3.</div><div>&gt;&gt;=
&gt;</div><div>&gt;&gt;&gt; ## How It Works</div><div>&gt;&gt;&gt;</div>=
<div>&gt;&gt;&gt; 1. ZeroFS exposes NBD devices (e.g., /dev/nbd0) that P=
ostgreSQL/ZFS can use like any other block device</div><div>&gt;&gt;&gt;=
 2. Multiple cache layers hide S3 latency:</div><div>&gt;&gt;&gt;&nbsp; =
&nbsp; a. ZFS ARC/L2ARC for frequently accessed blocks</div><div>&gt;&gt=
;&gt;&nbsp; &nbsp; b. ZeroFS memory cache for metadata and hot data</div=
><div>&gt;&gt;&gt;&nbsp; &nbsp; c. Optional =
local disk cache</div><div>&gt;&gt;&gt; 3. All data is encrypted (ChaCha=
20-Poly1305) before hitting S3</div><div>&gt;&gt;&gt; 4. Files are split=
 into 128KB chunks for insertion into ZeroFS' LSM-tree</div><div>&gt;&gt=
;&gt;</div><div>&gt;&gt;&gt; ## Geo-Distributed PostgreSQL</div><div>&gt=
;&gt;&gt;</div><div>&gt;&gt;&gt; Since each region can run its own ZeroF=
S instance, you can create geographically distributed PostgreSQL setups.=
</div><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; Example architectures:</d=
iv><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; Architecture 1:</div><div>&gt=
;&gt;&gt;</div><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nb=
sp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp=
; PostgreSQL Client</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &n=
bsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbs=
p; &nbsp; &nbsp; &nbsp; |</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nb=
sp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp=
; &nbsp; &nbsp; &nbsp; &nbsp; | SQL queries</div><div>&gt;&gt;&gt;&nbsp;=
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &=
nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |</div><div>&gt;&gt;&gt;=
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &n=
bsp; &nbsp; &nbsp; &nbsp; &nbsp;+--------------+</div><div>&gt;&gt;&gt;&=
nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nb=
sp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; PG Proxy&nbsp; &nbsp; |</div><div=
>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nb=
sp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;| (HAProxy/&nbsp; &nbsp; |<=
/div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &=
nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; PgBouncer)=
&nbsp; |</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;=
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;+--------=
------+</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /&=
nbsp; &nbsp; &nbsp; &nbsp; \</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &n=
bsp; &nbsp; &nbsp;/&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \</div><div>&gt;&g=
t;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nb=
sp; Synchronous&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Synchronous</di=
v><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbs=
p; &nbsp; &nbsp; Replication&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Re=
plication</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp=
; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/&nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \</div><div>&gt;&gt;&gt;&nbsp;=
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &=
nbsp; &nbsp; &nbsp; /&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &n=
bsp; \</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &=
nbsp; &nbsp;+---------------+&nbsp; &nbsp; &nbsp; &nbsp; +--------------=
-+</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp=
; &nbsp;| PostgreSQL 1&nbsp; |&nbsp; &nbsp; &nbsp; &nbsp; | PostgreSQL 2=
&nbsp; |</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;=
 &nbsp; &nbsp;| (Primary)&nbsp; &nbsp; &nbsp;|=E2=97=84------=E2=96=BA| =
(Standby)&nbsp; &nbsp; &nbsp;|</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp=
; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;+---------------+&nbsp; &nbsp; &nbsp=
; &nbsp; +---------------+</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &n=
bsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbs=
p; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;=
 |</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp=
; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; POSIX filesystem ops&nbsp; |=
</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &n=
bsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |</div><div>&gt;&gt;&gt;&=
nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;+---------------+&=
nbsp; &nbsp; &nbsp; &nbsp; +---------------+</div><div>&gt;&gt;&gt;&nbsp=
; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp;ZFS Pool=
 1&nbsp; |&nbsp; &nbsp; &nbsp; &nbsp; |&nbsp; &nbsp;ZFS Pool 2&nbsp; |</=
div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &n=
bsp;| (3-way mirror)|&nbsp; &nbsp; &nbsp; &nbsp; | (3-way mirror)|</div>=
<div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;=
+---------------+&nbsp; &nbsp; &nbsp; &nbsp; +---------------+</div><div=
>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /&n=
bsp; &nbsp; &nbsp; |&nbsp; &nbsp; &nbsp; \&nbsp; &nbsp; &nbsp; &nbsp; &n=
bsp; /&nbsp; &nbsp; &nbsp; |&nbsp; &nbsp; &nbsp; \</div><div>&gt;&gt;&gt=
;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/&nbsp; &nbsp; &=
nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp;\&nbsp; &nbsp; &nbsp; &nbsp; /&n=
bsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp;\</div><div>&gt;&gt=
;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NBD:10809 NBD:10810 NBD:10811&nbs=
p; NBD:10812 NBD:10813 NBD:10814</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nb=
sp; &nbsp; &nbsp; &nbsp; &nbsp; |&nbsp; &nbsp; &nbsp; &nbsp; |&nbsp; &nb=
sp; &nbsp; &nbsp; |&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbs=
p; &nbsp; &nbsp; |&nbsp; &nbsp; &nbsp; &nbsp; |</div><div>&gt;&gt;&gt;&n=
bsp; &nbsp; &nbsp; &nbsp; &nbsp;+--------++--------++--------++--------+=
+--------++--------+</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &=
nbsp;|ZeroFS 1||ZeroFS 2||ZeroFS 3||ZeroFS 4||ZeroFS 5||ZeroFS 6|</div><=
div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;+--------++--------++-=
-------++--------++--------++--------+</div><div>&gt;&gt;&gt;&nbsp; &nbs=
p; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |&nbsp; &nbsp; &nbsp; &nbsp; &nbsp=
;|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|=
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|</=
div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |&=
nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nb=
sp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp=
; &nbsp; &nbsp; &nbsp; &nbsp;|</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp=
; &nbsp; &nbsp;S3-Region1 S3-Region2 S3-Region3 S3-Region4 S3-Region5 S3=
-Region6</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;(us-eas=
t) (eu-west) (ap-south) (us-west) (eu-north) (ap-east)</div><div>&gt;&gt=
;&gt;</div><div>&gt;&gt;&gt; Architecture 2:</div><div>&gt;&gt;&gt;</div=
><div>&gt;&gt;&gt; PostgreSQL Primary (Region 1) =E2=86=90=E2=86=92 Post=
greSQL Standby (Region 2)</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nb=
sp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;\&nbsp; &nbsp; &nbsp; &nbsp; &nbsp=
; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /</div><div>&gt;&gt;&gt;&nbsp; &nbs=
p; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \&nbsp; &nbsp; &nbsp=
; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /</div><div>&gt;&gt;&gt;&nbs=
p; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Same ZF=
S Pool (NBD)</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &n=
bsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |</div><div>&gt;&g=
t;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nb=
sp;6 Global ZeroFS</div><div>&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nb=
sp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |</div><div>=
&gt;&gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbs=
p; &nbsp; &nbsp; &nbsp;S3 Regions</div><div>&gt;&gt;&gt;</div><div>&gt;&=
gt;&gt;</div><div>&gt;&gt;&gt; The main advantages I see are:</div><div>=
&gt;&gt;&gt; 1. Dramatic cost reduction for large datasets</div><div>&gt=
;&gt;&gt; 2. Simplified geo-distribution</div><div>&gt;&gt;&gt; 3. =
Effectively unlimited storage capacity</div><div>&gt;&gt;&gt; 4. =
Built-in encryption and c=
ompression</div><div>&gt;&gt;&gt;</div><div>&gt;&gt;&gt; Looking forward=
 to your feedback and questions!</div><div>&gt;&gt;&gt;</div><div>&gt;&g=
t;&gt; Best,</div><div>&gt;&gt;&gt; Pierre</div><div>&gt;&gt;&gt;</div><=
div>&gt;&gt;&gt; P.S. The full project includes a custom NFS filesystem =
too.</div><div>&gt;&gt;&gt;</div><div>&gt;</div><div><br></div></blockqu=
ote></div></blockquote><div><br></div></div></blockquote></div></blockqu=
ote><div><br></div></blockquote><div><br></div></body></html>
--4b20097cb77a440d9fe2b0f0882b7a21--