public inbox for [email protected]  
help / color / mirror / Atom feed
Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
11+ messages / 6 participants
[nested] [flat]

* Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
@ 2025-06-25 08:35 Motog Plus <[email protected]>
  2025-06-25 08:48 ` Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 05:11 ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 12:49 ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Ron Johnson <[email protected]>
  0 siblings, 3 replies; 11+ messages in thread

From: Motog Plus @ 2025-06-25 08:35 UTC (permalink / raw)
  To: Pgsql-admin <[email protected]>

Dear PostgreSQL Community,

We are implementing a new feature in our application that is expected to
generate a significant amount of data, and we are seeking your expert
guidance on how to best handle this growth within our existing PostgreSQL
setup.



Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu
instance with the following specifications:

   - *RAM:* 32 GB
   - *Disk:* 1.2 TB
   - *vCPUs:* 16



Our database architecture utilizes a primary-standby streaming replication
setup. Application modules (running in Kubernetes pods) connect to the
database through Pgpool-II, using HikariCP for connection pooling.



We have multiple databases on our primary server, with their approximate
current sizes as follows:

   - *C:* 620 GB
   - *M:* 225 GB
   - *P:* 59 GB
   - *K:* 13 MB



The total current size of our databases is around *1 TB*. With the new
feature, we anticipate a substantial increase in data, potentially reaching *10
TB* over the next 5-7 years.



Below is the table for current size and expected growth in size:



*S.No.*

*DB*

*Current DB size*

*Future DB size*

*Schema Name*

*Current Schema size*

*Future Schema size *

1

C

1 TB

8 TB - 10 TB

acc

297 GB

3 TB - 4 TB

po

270 GB

2.6 TB - 3.5 TB

pa

27 GB

270 GB

pra

13 GB

130 GB

fu

13 GB

130 GB

te

167 MB

2 GB

pro

30 MB

300 MB

2

M

225 GB

2.2 TB - 3 TB

bi

82 GB

820 GB

co

80 GB

800 GB

ps

17 GB

170 GB

qo

16 GB

160 GB

to

7 GB

70 GB

in

7 GB

70 GB

di

6 GB

60 GB

no

4 GB

40 GB

do

4 GB

40 GB

cl

3 GB

30 GB

3

P

60 GB

600 GB

au

45 GB

450 GB

fi

8 GB

80 GB

con

4 GB

40 GB

ba

1 GB

10 GB

li

2 MB

20 GB





We would greatly appreciate your insights on the following points:

   1. *Scalability for Large Datasets:* Conceptually, PostgreSQL is known
   to handle large datasets. However, we'd like to confirm if a single
   PostgreSQL instance can realistically and efficiently manage 10-12 TB of
   data in a production environment, considering typical transaction loads.
   2. *Database Split Strategy:* Our largest database, "C," currently
   occupies 620 GB. It contains multiple schemas. We are considering splitting
   database "C" into two new databases: "C1" to exclusively house the "acc"
   schema, and "C2" for the remaining schemas. Is this a recommended approach
   for managing growth, and what are the potential pros and cons?
   3. *Server Allocation for Split Databases:* If we proceed with splitting
   "C" into "C1" and "C2," would it be advisable to assign a new, separate
   database server for "C2," or could both "C1" and "C2" reside on the same
   database server? What factors should we consider in making this decision?
   4. *Performance Limits per Database and Database Server:* From a
   performance perspective, is there a general "limit" or best practice for
   the maximum amount of data a single database server should handle (e.g., 10
   TB) and similarly general limit per database? How does this influence the
   decision to add more database servers?
   5. *Best Practices for Large-Scale Data Management:* Beyond standard
   practices like indexing and partitioning, what other best practices should
   we consider implementing to ensure optimal performance and manageability
   with such a large dataset? This could include configurations, maintenance
   strategies, etc.
   6. *Hardware Configuration Recommendations:* Based on our projected data
   growth and desired performance, what hardware configurations (e.g., RAM,
   CPU, storage I/O, storage type like NVMe) would you recommend for future
   database servers to efficiently handle 10-12 TB?
   7. *Open-Source Horizontal Scaling Solutions:* Are there any open-source
   horizontal scaling solutions for PostgreSQL (other than Citus Data) that
   the community recommends or has experience with for managing extremely
   large datasets? Any pointers or guidance on this would be highly valuable.



Thank you in advance for your time and expertise. We look forward to your
valuable insights.

Thanks & Regards,

Ramzy


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
@ 2025-06-25 08:48 ` Motog Plus <[email protected]>
  2025-07-04 13:45   ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Greg Sabino Mullane <[email protected]>
  2 siblings, 1 reply; 11+ messages in thread

From: Motog Plus @ 2025-06-25 08:48 UTC (permalink / raw)
  To: [email protected]

Dear PostgreSQL Community,

We are implementing a new feature in our application that is expected to
generate a significant amount of data, and we are seeking your expert
guidance on how to best handle this growth within our existing PostgreSQL
setup.



Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu
instance with the following specifications:

   - *RAM:* 32 GB
   - *Disk:* 1.2 TB
   - *vCPUs:* 16



Our database architecture utilizes a primary-standby streaming replication
setup. Application modules (running in Kubernetes pods) connect to the
database through Pgpool-II, using HikariCP for connection pooling.



We have multiple databases on our primary server, with their approximate
current sizes as follows:

   - *C:* 620 GB
   - *M:* 225 GB
   - *P:* 59 GB
   - *K:* 13 MB



The total current size of our databases is around *1 TB*. With the new
feature, we anticipate a substantial increase in data, potentially reaching *10
TB* over the next 5-7 years.



Below is the table for current size and expected growth in size:



*S.No.*

*DB*

*Current DB size*

*Future DB size*

*Schema Name*

*Current Schema size*

*Future Schema size *

1

C

1 TB

8 TB - 10 TB

acc

297 GB

3 TB - 4 TB

po

270 GB

2.6 TB - 3.5 TB

pa

27 GB

270 GB

pra

13 GB

130 GB

fu

13 GB

130 GB

te

167 MB

2 GB

pro

30 MB

300 MB

2

M

225 GB

2.2 TB - 3 TB

bi

82 GB

820 GB

co

80 GB

800 GB

ps

17 GB

170 GB

qo

16 GB

160 GB

to

7 GB

70 GB

in

7 GB

70 GB

di

6 GB

60 GB

no

4 GB

40 GB

do

4 GB

40 GB

cl

3 GB

30 GB

3

P

60 GB

600 GB

au

45 GB

450 GB

fi

8 GB

80 GB

con

4 GB

40 GB

ba

1 GB

10 GB

li

2 MB

20 GB





We would greatly appreciate your insights on the following points:

   1. *Scalability for Large Datasets:* Conceptually, PostgreSQL is known
   to handle large datasets. However, we'd like to confirm if a single
   PostgreSQL instance can realistically and efficiently manage 10-12 TB of
   data in a production environment, considering typical transaction loads.
   2. *Database Split Strategy:* Our largest database, "C," currently
   occupies 620 GB. It contains multiple schemas. We are considering splitting
   database "C" into two new databases: "C1" to exclusively house the "acc"
   schema, and "C2" for the remaining schemas. Is this a recommended approach
   for managing growth, and what are the potential pros and cons?
   3. *Server Allocation for Split Databases:* If we proceed with splitting
   "C" into "C1" and "C2," would it be advisable to assign a new, separate
   database server for "C2," or could both "C1" and "C2" reside on the same
   database server? What factors should we consider in making this decision?
   4. *Performance Limits per Database and Database Server:* From a
   performance perspective, is there a general "limit" or best practice for
   the maximum amount of data a single database server should handle (e.g., 10
   TB) and similarly general limit per database? How does this influence the
   decision to add more database servers?
   5. *Best Practices for Large-Scale Data Management:* Beyond standard
   practices like indexing and partitioning, what other best practices should
   we consider implementing to ensure optimal performance and manageability
   with such a large dataset? This could include configurations, maintenance
   strategies, etc.
   6. *Hardware Configuration Recommendations:* Based on our projected data
   growth and desired performance, what hardware configurations (e.g., RAM,
   CPU, storage I/O, storage type like NVMe) would you recommend for future
   database servers to efficiently handle 10-12 TB?
   7. *Open-Source Horizontal Scaling Solutions:* Are there any open-source
   horizontal scaling solutions for PostgreSQL (other than Citus Data) that
   the community recommends or has experience with for managing extremely
   large datasets? Any pointers or guidance on this would be highly valuable.



Thank you in advance for your time and expertise. We look forward to your
valuable insights.

Thanks & Regards,

Ramzy


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-25 08:48 ` Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
@ 2025-07-04 13:45   ` Greg Sabino Mullane <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: Greg Sabino Mullane @ 2025-07-04 13:45 UTC (permalink / raw)
  To: Motog Plus <[email protected]>; +Cc: [email protected]

It's hard to give generic recommendations for what really depends on your
specific needs, but here is one attempt:

using HikariCP for connection pooling.


For better scaling, look into PGBouncer, which has very fast "transaction"
and "statement" modes.

... manage 10-12 TB of data in a production environment, considering
> typical transaction loads.


Yes, 10 TB is very doable.

We are considering splitting database "C" into two new databases: "C1" to
> exclusively house the "acc" schema, and "C2" for the remaining schemas. Is
> this a recommended approach for managing growth, and what are the potential
> pros and cons?


If they are logically connected, then keep them the same database. Having
to go across databases (or across clusters) is a lot of added complexity
for little gain.


> ...or could both "C1" and "C2" reside on the same database server?


They could, but you would be sharing all the resources anyway, so you don't
gain much.


> is there a general "limit" or best practice for the maximum amount of data
> a single database server should handle (e.g., 10 TB) and similarly general
> limit per database?


No limit per se, it really depends if you start seeing effects on your
measured performance. Lots of indirect things to keep in mind as well: time
it takes to make backups, autovacuum efforts, time to spin up replicas.
These days 10TB is not considered particularly huge, but it really depends
on your workload. Don't worry about limits per database - it's all about
the total cluster size; which database things are in can be considered a
housekeeping record.

Beyond standard practices like indexing and partitioning, what other best
> practices should we consider implementing to ensure optimal performance and
> manageability with such a large dataset?


This is probably the vaguest question in the email. Obvious things are to
make sure you are doing heavy monitoring, both at the OS level and PG
level, particularly via log_min_duration_statement and pg_stat_statements.
Keep a close eye on bloat. Keep indexes to a minimum and make them all
justify their worth. Use partial and functional indexes. Make sure your
backups are solid (use pgbackrest). Test your restores regularly. Use
pgbouncer. Send simple selects to the read replicas. Automate everything
you can. Be paranoid. Assume the application is going to do everything
wrong and try to destroy your database. Get a seasoned PG DBA who will know
how to do all this and what else to look for (the mailing lists are good,
but mostly reactive and asynchronous, as you are now discovering)

Hardware Configuration Recommendations: Based on our projected data growth
> and desired performance, what hardware configurations (e.g., RAM, CPU,
> storage I/O, storage type like NVMe) would you recommend for future
> database servers to efficiently handle 10-12 TB?


Maybe someone else can attempt specifics, but it's too open-ended of a
question for me. Storage should be fast but above all, stable and reliable.
More RAM is always good. More cores is always good. Postgres scales well
vertically. Offload as much work as possible (including backups) to the
replicas. 10-12 TB is a little meaningless except in regards to backups:
what matters is how much of that 10TB is being actively used.

Open-Source Horizontal Scaling Solutions: Are there any open-source
> horizontal scaling solutions for PostgreSQL (other than Citus Data) that
> the community recommends or has experience with for managing extremely
> large datasets?


Citus is very good for certain datasets. Can be overkill for many
situations. Don't overlook streaming rep + pgpool/haproxy as a good start
for basic horizontal scaling.

The more specific future questions are, the better a reply you will get.
Showing us database sizes is not a very good metric. Some more useful
things to measure would be WAL rate, txn rate, active data size (i.e.
shared_buffers analysis), number or active connections, and which queries
are the most expensive.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
@ 2025-06-26 05:11 ` Motog Plus <[email protected]>
  2025-06-26 05:40   ` Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation David G. Johnston <[email protected]>
  2 siblings, 1 reply; 11+ messages in thread

From: Motog Plus @ 2025-06-26 05:11 UTC (permalink / raw)
  To: Pgsql-admin <[email protected]>

Dear Community,

We are still actively seeking your valuable insights and guidance on
managing our anticipated data growth.

Any thoughts or recommendations on the points raised would be greatly
appreciated.

Thanks & Regards,
Ramzy

On Wed, Jun 25, 2025, 14:05 Motog Plus <[email protected]> wrote:

> Dear PostgreSQL Community,
>
> We are implementing a new feature in our application that is expected to
> generate a significant amount of data, and we are seeking your expert
> guidance on how to best handle this growth within our existing PostgreSQL
> setup.
>
>
>
> Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu
> instance with the following specifications:
>
>    - *RAM:* 32 GB
>    - *Disk:* 1.2 TB
>    - *vCPUs:* 16
>
>
>
> Our database architecture utilizes a primary-standby streaming replication
> setup. Application modules (running in Kubernetes pods) connect to the
> database through Pgpool-II, using HikariCP for connection pooling.
>
>
>
> We have multiple databases on our primary server, with their approximate
> current sizes as follows:
>
>    - *C:* 620 GB
>    - *M:* 225 GB
>    - *P:* 59 GB
>    - *K:* 13 MB
>
>
>
> The total current size of our databases is around *1 TB*. With the new
> feature, we anticipate a substantial increase in data, potentially reaching *10
> TB* over the next 5-7 years.
>
>
>
> Below is the table for current size and expected growth in size:
>
>
>
> *S.No.*
>
> *DB*
>
> *Current DB size*
>
> *Future DB size*
>
> *Schema Name*
>
> *Current Schema size*
>
> *Future Schema size *
>
> 1
>
> C
>
> 1 TB
>
> 8 TB - 10 TB
>
> acc
>
> 297 GB
>
> 3 TB - 4 TB
>
> po
>
> 270 GB
>
> 2.6 TB - 3.5 TB
>
> pa
>
> 27 GB
>
> 270 GB
>
> pra
>
> 13 GB
>
> 130 GB
>
> fu
>
> 13 GB
>
> 130 GB
>
> te
>
> 167 MB
>
> 2 GB
>
> pro
>
> 30 MB
>
> 300 MB
>
> 2
>
> M
>
> 225 GB
>
> 2.2 TB - 3 TB
>
> bi
>
> 82 GB
>
> 820 GB
>
> co
>
> 80 GB
>
> 800 GB
>
> ps
>
> 17 GB
>
> 170 GB
>
> qo
>
> 16 GB
>
> 160 GB
>
> to
>
> 7 GB
>
> 70 GB
>
> in
>
> 7 GB
>
> 70 GB
>
> di
>
> 6 GB
>
> 60 GB
>
> no
>
> 4 GB
>
> 40 GB
>
> do
>
> 4 GB
>
> 40 GB
>
> cl
>
> 3 GB
>
> 30 GB
>
> 3
>
> P
>
> 60 GB
>
> 600 GB
>
> au
>
> 45 GB
>
> 450 GB
>
> fi
>
> 8 GB
>
> 80 GB
>
> con
>
> 4 GB
>
> 40 GB
>
> ba
>
> 1 GB
>
> 10 GB
>
> li
>
> 2 MB
>
> 20 GB
>
>
>
>
>
> We would greatly appreciate your insights on the following points:
>
>    1. *Scalability for Large Datasets:* Conceptually, PostgreSQL is known
>    to handle large datasets. However, we'd like to confirm if a single
>    PostgreSQL instance can realistically and efficiently manage 10-12 TB of
>    data in a production environment, considering typical transaction loads.
>    2. *Database Split Strategy:* Our largest database, "C," currently
>    occupies 620 GB. It contains multiple schemas. We are considering splitting
>    database "C" into two new databases: "C1" to exclusively house the "acc"
>    schema, and "C2" for the remaining schemas. Is this a recommended approach
>    for managing growth, and what are the potential pros and cons?
>    3. *Server Allocation for Split Databases:* If we proceed with
>    splitting "C" into "C1" and "C2," would it be advisable to assign a new,
>    separate database server for "C2," or could both "C1" and "C2" reside on
>    the same database server? What factors should we consider in making this
>    decision?
>    4. *Performance Limits per Database and Database Server:* From a
>    performance perspective, is there a general "limit" or best practice for
>    the maximum amount of data a single database server should handle (e.g., 10
>    TB) and similarly general limit per database? How does this influence the
>    decision to add more database servers?
>    5. *Best Practices for Large-Scale Data Management:* Beyond standard
>    practices like indexing and partitioning, what other best practices should
>    we consider implementing to ensure optimal performance and manageability
>    with such a large dataset? This could include configurations, maintenance
>    strategies, etc.
>    6. *Hardware Configuration Recommendations:* Based on our projected
>    data growth and desired performance, what hardware configurations (e.g.,
>    RAM, CPU, storage I/O, storage type like NVMe) would you recommend for
>    future database servers to efficiently handle 10-12 TB?
>    7. *Open-Source Horizontal Scaling Solutions:* Are there any
>    open-source horizontal scaling solutions for PostgreSQL (other than Citus
>    Data) that the community recommends or has experience with for managing
>    extremely large datasets? Any pointers or guidance on this would be highly
>    valuable.
>
>
>
> Thank you in advance for your time and expertise. We look forward to your
> valuable insights.
>
> Thanks & Regards,
>
> Ramzy
>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 05:11 ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
@ 2025-06-26 05:40   ` David G. Johnston <[email protected]>
  2025-06-26 06:34     ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: David G. Johnston @ 2025-06-26 05:40 UTC (permalink / raw)
  To: Motog Plus <[email protected]>; +Cc: Pgsql-admin <[email protected]>

On Wednesday, June 25, 2025, Motog Plus <[email protected]> wrote:
>
>
> We are still actively seeking your valuable insights and guidance on
> managing our anticipated data growth.
>

If you can’t even wait a full day without pinging your request for input
you might consider paying professionals instead of using volunteers.

David J.


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 05:11 ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 05:40   ` Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation David G. Johnston <[email protected]>
@ 2025-06-26 06:34     ` Motog Plus <[email protected]>
  2025-06-26 13:11       ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Claus Koch <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Motog Plus @ 2025-06-26 06:34 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: Pgsql-admin <[email protected]>

Dear David,

My apologies for the quick follow-up. You're absolutely right, and I
completely understand that community members contribute their valuable time
as volunteers. There was no intention to rush or disrespect anyone's
efforts.

I'm simply very keen to address this significant scaling challenge for our
application and was perhaps a bit too eager for guidance. I will certainly
be more patient moving forward.

Thank you for your feedback

Regards,
Ramzy

On Thu, Jun 26, 2025, 11:10 David G. Johnston <[email protected]>
wrote:

> On Wednesday, June 25, 2025, Motog Plus <[email protected]> wrote:
>>
>>
>> We are still actively seeking your valuable insights and guidance on
>> managing our anticipated data growth.
>>
>
> If you can’t even wait a full day without pinging your request for input
> you might consider paying professionals instead of using volunteers.
>
> David J.
>
>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 05:11 ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 05:40   ` Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation David G. Johnston <[email protected]>
  2025-06-26 06:34     ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
@ 2025-06-26 13:11       ` Claus Koch <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: Claus Koch @ 2025-06-26 13:11 UTC (permalink / raw)
  To: [email protected]

Dear Ramzy,

I have some experience with this type of process and can see that the 
information you have provided is far from sufficient. The experts in 
this channel can certainly show you various techniques for updating and 
give you examples. However, to carry out such a project successfully, 
you will need much more detailed information about your company, your 
intentions, and your operating processes. I would consider it highly 
questionable to disclose this information about my company in a public list.

I recommend that you hire a consultant who specializes in Postgres, at 
least for the planning phase.

Claus K.

On 2025-06-26 08:34, Motog Plus wrote:
> Dear David,
>
> My apologies for the quick follow-up. You're absolutely right, and I 
> completely understand that community members contribute their valuable 
> time as volunteers. There was no intention to rush or disrespect 
> anyone's efforts.
>
> I'm simply very keen to address this significant scaling challenge for 
> our application and was perhaps a bit too eager for guidance. I will 
> certainly be more patient moving forward.
>
> Thank you for your feedback
>
> Regards,
> Ramzy
>
> On Thu, Jun 26, 2025, 11:10 David G. Johnston 
> <[email protected]> wrote:
>
>     On Wednesday, June 25, 2025, Motog Plus <[email protected]> wrote:
>
>
>         We are still actively seeking your valuable insights and
>         guidance on managing our anticipated data growth.
>
>
>     If you can’t even wait a full day without pinging your request for
>     input you might consider paying professionals instead of using
>     volunteers.
>
>     David J.
>

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
@ 2025-06-26 12:49 ` Ron Johnson <[email protected]>
  2025-06-26 13:43   ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2 siblings, 1 reply; 11+ messages in thread

From: Ron Johnson @ 2025-06-26 12:49 UTC (permalink / raw)
  To: Pgsql-admin <[email protected]>

PG easily handles our 6TB database, as well as 3 and 5TB databases (all on
different VMs), and has done so since at least v8.4.

Ours are on single LVM mount points, as are the disks that hold the
PgBackRest savesets.

"considering typical transaction loads."

Pfft...there are no typical transaction loads.  Is this db OLTP, Reporting
or DW?

On Wed, Jun 25, 2025 at 4:35 AM Motog Plus <[email protected]> wrote:

> Dear PostgreSQL Community,
>
> We are implementing a new feature in our application that is expected to
> generate a significant amount of data, and we are seeking your expert
> guidance on how to best handle this growth within our existing PostgreSQL
> setup.
>
>
>
> Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu
> instance with the following specifications:
>
>    - *RAM:* 32 GB
>    - *Disk:* 1.2 TB
>    - *vCPUs:* 16
>
>
>
> Our database architecture utilizes a primary-standby streaming replication
> setup. Application modules (running in Kubernetes pods) connect to the
> database through Pgpool-II, using HikariCP for connection pooling.
>
>
>
> We have multiple databases on our primary server, with their approximate
> current sizes as follows:
>
>    - *C:* 620 GB
>    - *M:* 225 GB
>    - *P:* 59 GB
>    - *K:* 13 MB
>
>
>
> The total current size of our databases is around *1 TB*. With the new
> feature, we anticipate a substantial increase in data, potentially reaching *10
> TB* over the next 5-7 years.
>
>
>
> Below is the table for current size and expected growth in size:
>
>
>
> *S.No.*
>
> *DB*
>
> *Current DB size*
>
> *Future DB size*
>
> *Schema Name*
>
> *Current Schema size*
>
> *Future Schema size *
>
> 1
>
> C
>
> 1 TB
>
> 8 TB - 10 TB
>
> acc
>
> 297 GB
>
> 3 TB - 4 TB
>
> po
>
> 270 GB
>
> 2.6 TB - 3.5 TB
>
> pa
>
> 27 GB
>
> 270 GB
>
> pra
>
> 13 GB
>
> 130 GB
>
> fu
>
> 13 GB
>
> 130 GB
>
> te
>
> 167 MB
>
> 2 GB
>
> pro
>
> 30 MB
>
> 300 MB
>
> 2
>
> M
>
> 225 GB
>
> 2.2 TB - 3 TB
>
> bi
>
> 82 GB
>
> 820 GB
>
> co
>
> 80 GB
>
> 800 GB
>
> ps
>
> 17 GB
>
> 170 GB
>
> qo
>
> 16 GB
>
> 160 GB
>
> to
>
> 7 GB
>
> 70 GB
>
> in
>
> 7 GB
>
> 70 GB
>
> di
>
> 6 GB
>
> 60 GB
>
> no
>
> 4 GB
>
> 40 GB
>
> do
>
> 4 GB
>
> 40 GB
>
> cl
>
> 3 GB
>
> 30 GB
>
> 3
>
> P
>
> 60 GB
>
> 600 GB
>
> au
>
> 45 GB
>
> 450 GB
>
> fi
>
> 8 GB
>
> 80 GB
>
> con
>
> 4 GB
>
> 40 GB
>
> ba
>
> 1 GB
>
> 10 GB
>
> li
>
> 2 MB
>
> 20 GB
>
>
>
>
>
> We would greatly appreciate your insights on the following points:
>
>    1. *Scalability for Large Datasets:* Conceptually, PostgreSQL is known
>    to handle large datasets. However, we'd like to confirm if a single
>    PostgreSQL instance can realistically and efficiently manage 10-12 TB of
>    data in a production environment, considering typical transaction loads.
>    2. *Database Split Strategy:* Our largest database, "C," currently
>    occupies 620 GB. It contains multiple schemas. We are considering splitting
>    database "C" into two new databases: "C1" to exclusively house the "acc"
>    schema, and "C2" for the remaining schemas. Is this a recommended approach
>    for managing growth, and what are the potential pros and cons?
>    3. *Server Allocation for Split Databases:* If we proceed with
>    splitting "C" into "C1" and "C2," would it be advisable to assign a new,
>    separate database server for "C2," or could both "C1" and "C2" reside on
>    the same database server? What factors should we consider in making this
>    decision?
>    4. *Performance Limits per Database and Database Server:* From a
>    performance perspective, is there a general "limit" or best practice for
>    the maximum amount of data a single database server should handle (e.g., 10
>    TB) and similarly general limit per database? How does this influence the
>    decision to add more database servers?
>    5. *Best Practices for Large-Scale Data Management:* Beyond standard
>    practices like indexing and partitioning, what other best practices should
>    we consider implementing to ensure optimal performance and manageability
>    with such a large dataset? This could include configurations, maintenance
>    strategies, etc.
>    6. *Hardware Configuration Recommendations:* Based on our projected
>    data growth and desired performance, what hardware configurations (e.g.,
>    RAM, CPU, storage I/O, storage type like NVMe) would you recommend for
>    future database servers to efficiently handle 10-12 TB?
>    7. *Open-Source Horizontal Scaling Solutions:* Are there any
>    open-source horizontal scaling solutions for PostgreSQL (other than Citus
>    Data) that the community recommends or has experience with for managing
>    extremely large datasets? Any pointers or guidance on this would be highly
>    valuable.
>
>
>
> Thank you in advance for your time and expertise. We look forward to your
> valuable insights.
>
> Thanks & Regards,
>
> Ramzy
>


-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 12:49 ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Ron Johnson <[email protected]>
@ 2025-06-26 13:43   ` Motog Plus <[email protected]>
  2025-06-26 14:30     ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Paul Smith* <[email protected]>
  2025-06-26 14:54     ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Ron Johnson <[email protected]>
  0 siblings, 2 replies; 11+ messages in thread

From: Motog Plus @ 2025-06-26 13:43 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; Pgsql-admin <[email protected]>

Thanks Ron, for the feedback and for sharing your experience with
PostgreSQL handling such large databases – that's very encouraging to hear.
We are using postgres version 15.12.

You're absolutely right about "typical transaction loads" not being a
useful term without more context. My apologies for the vagueness. We
actually have two distinct workloads on separate servers:

OLTP: This is our primary transactional workload and has replication setup,
pgpool - II
Reporting/DW: This is for reporting purposes.


The growth figures I initially shared (8-9 TB) were a more conservative
estimate for OLTP.

However, after a more focused rough estimate for our OLTP workload alone,
we anticipate it could reach 35-40 TB of data over the next 5-7 years.


Specifically for our OLTP databases (which I listed in my initial email):

Database C could reach 30-32 TB, with the acc schema within it potentially
growing to 13-15 TB.
Database M might reach 5-7 TB.
Database P could reach 1-2 TB.


Given these revised, more detailed projections for the OLTP side, we would
be extremely grateful for your and the community's guidance on all the
questions we originally posed, specifically considering these new volume
expectations for our OLTP workload:


1. Will PostgreSQL be able to handle this much load (35-40 TB, with one DB
potentially at 30-32 TB and a schema at 13-15 TB) for an OLTP environment?

2. Should we still consider splitting our database "C" into two DBs (C1 for
"acc" schema and C2 for the rest), given the projected 13-15 TB for acc
alone?

3. Should we assign a new DB server to C2, or keep it on the same server,
particularly now with these larger OLTP volumes?

4. Will a single DB server be able to handle 30+ TB of OLTP data, or is
there a particular limit per DB server from a performance point of view for
OLTP?

5. What are the best practices, apart from indexing and partitioning, to
keep in mind for such large-scale OLTP data management?

6. What hardware configuration (RAM, CPU, storage I/O, storage type like
NVMe) would you recommend for future OLTP database servers to efficiently
handle these new projected sizes?

7. Is a horizontal scaling solution (open source, apart from Citus)
possible in PostgreSQL for these OLTP volumes, and do you have any pointers
on that?


Thanks again for your time and invaluable guidance.

We truly appreciate the community's expertise.

Regards,
Ramzy

On Thu, Jun 26, 2025, 18:19 Ron Johnson <[email protected]> wrote:

> PG easily handles our 6TB database, as well as 3 and 5TB databases (all on
> different VMs), and has done so since at least v8.4.
>
> Ours are on single LVM mount points, as are the disks that hold the
> PgBackRest savesets.
>
> "considering typical transaction loads."
>
> Pfft...there are no typical transaction loads.  Is this db OLTP, Reporting
> or DW?
>
> On Wed, Jun 25, 2025 at 4:35 AM Motog Plus <[email protected]> wrote:
>
>> Dear PostgreSQL Community,
>>
>> We are implementing a new feature in our application that is expected to
>> generate a significant amount of data, and we are seeking your expert
>> guidance on how to best handle this growth within our existing PostgreSQL
>> setup.
>>
>>
>>
>> Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu
>> instance with the following specifications:
>>
>>    - *RAM:* 32 GB
>>    - *Disk:* 1.2 TB
>>    - *vCPUs:* 16
>>
>>
>>
>> Our database architecture utilizes a primary-standby streaming
>> replication setup. Application modules (running in Kubernetes pods) connect
>> to the database through Pgpool-II, using HikariCP for connection pooling.
>>
>>
>>
>> We have multiple databases on our primary server, with their approximate
>> current sizes as follows:
>>
>>    - *C:* 620 GB
>>    - *M:* 225 GB
>>    - *P:* 59 GB
>>    - *K:* 13 MB
>>
>>
>>
>> The total current size of our databases is around *1 TB*. With the new
>> feature, we anticipate a substantial increase in data, potentially reaching *10
>> TB* over the next 5-7 years.
>>
>>
>>
>> Below is the table for current size and expected growth in size:
>>
>>
>>
>> *S.No.*
>>
>> *DB*
>>
>> *Current DB size*
>>
>> *Future DB size*
>>
>> *Schema Name*
>>
>> *Current Schema size*
>>
>> *Future Schema size *
>>
>> 1
>>
>> C
>>
>> 1 TB
>>
>> 8 TB - 10 TB
>>
>> acc
>>
>> 297 GB
>>
>> 3 TB - 4 TB
>>
>> po
>>
>> 270 GB
>>
>> 2.6 TB - 3.5 TB
>>
>> pa
>>
>> 27 GB
>>
>> 270 GB
>>
>> pra
>>
>> 13 GB
>>
>> 130 GB
>>
>> fu
>>
>> 13 GB
>>
>> 130 GB
>>
>> te
>>
>> 167 MB
>>
>> 2 GB
>>
>> pro
>>
>> 30 MB
>>
>> 300 MB
>>
>> 2
>>
>> M
>>
>> 225 GB
>>
>> 2.2 TB - 3 TB
>>
>> bi
>>
>> 82 GB
>>
>> 820 GB
>>
>> co
>>
>> 80 GB
>>
>> 800 GB
>>
>> ps
>>
>> 17 GB
>>
>> 170 GB
>>
>> qo
>>
>> 16 GB
>>
>> 160 GB
>>
>> to
>>
>> 7 GB
>>
>> 70 GB
>>
>> in
>>
>> 7 GB
>>
>> 70 GB
>>
>> di
>>
>> 6 GB
>>
>> 60 GB
>>
>> no
>>
>> 4 GB
>>
>> 40 GB
>>
>> do
>>
>> 4 GB
>>
>> 40 GB
>>
>> cl
>>
>> 3 GB
>>
>> 30 GB
>>
>> 3
>>
>> P
>>
>> 60 GB
>>
>> 600 GB
>>
>> au
>>
>> 45 GB
>>
>> 450 GB
>>
>> fi
>>
>> 8 GB
>>
>> 80 GB
>>
>> con
>>
>> 4 GB
>>
>> 40 GB
>>
>> ba
>>
>> 1 GB
>>
>> 10 GB
>>
>> li
>>
>> 2 MB
>>
>> 20 GB
>>
>>
>>
>>
>>
>> We would greatly appreciate your insights on the following points:
>>
>>    1. *Scalability for Large Datasets:* Conceptually, PostgreSQL is
>>    known to handle large datasets. However, we'd like to confirm if a single
>>    PostgreSQL instance can realistically and efficiently manage 10-12 TB of
>>    data in a production environment, considering typical transaction loads.
>>    2. *Database Split Strategy:* Our largest database, "C," currently
>>    occupies 620 GB. It contains multiple schemas. We are considering splitting
>>    database "C" into two new databases: "C1" to exclusively house the "acc"
>>    schema, and "C2" for the remaining schemas. Is this a recommended approach
>>    for managing growth, and what are the potential pros and cons?
>>    3. *Server Allocation for Split Databases:* If we proceed with
>>    splitting "C" into "C1" and "C2," would it be advisable to assign a new,
>>    separate database server for "C2," or could both "C1" and "C2" reside on
>>    the same database server? What factors should we consider in making this
>>    decision?
>>    4. *Performance Limits per Database and Database Server:* From a
>>    performance perspective, is there a general "limit" or best practice for
>>    the maximum amount of data a single database server should handle (e.g., 10
>>    TB) and similarly general limit per database? How does this influence the
>>    decision to add more database servers?
>>    5. *Best Practices for Large-Scale Data Management:* Beyond standard
>>    practices like indexing and partitioning, what other best practices should
>>    we consider implementing to ensure optimal performance and manageability
>>    with such a large dataset? This could include configurations, maintenance
>>    strategies, etc.
>>    6. *Hardware Configuration Recommendations:* Based on our projected
>>    data growth and desired performance, what hardware configurations (e.g.,
>>    RAM, CPU, storage I/O, storage type like NVMe) would you recommend for
>>    future database servers to efficiently handle 10-12 TB?
>>    7. *Open-Source Horizontal Scaling Solutions:* Are there any
>>    open-source horizontal scaling solutions for PostgreSQL (other than Citus
>>    Data) that the community recommends or has experience with for managing
>>    extremely large datasets? Any pointers or guidance on this would be highly
>>    valuable.
>>
>>
>>
>> Thank you in advance for your time and expertise. We look forward to your
>> valuable insights.
>>
>> Thanks & Regards,
>>
>> Ramzy
>>
>
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 12:49 ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Ron Johnson <[email protected]>
  2025-06-26 13:43   ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
@ 2025-06-26 14:30     ` Paul Smith* <[email protected]>
  1 sibling, 0 replies; 11+ messages in thread

From: Paul Smith* @ 2025-06-26 14:30 UTC (permalink / raw)
  To: [email protected]

On 26/06/2025 14:43, Motog Plus wrote:
> OLTP: This is our primary transactional workload and has replication 
> setup, pgpool - II
> Reporting/DW: This is for reporting purposes.
>
> The growth figures I initially shared (8-9 TB) were a more 
> conservative estimate for OLTP.
>
> However, after a more focused rough estimate for our OLTP workload 
> alone, we anticipate it could reach 35-40 TB of data over the next 5-7 
> years.
>
>
> Specifically for our OLTP databases (which I listed in my initial email):
>
> Database C could reach 30-32 TB, with the acc schema within it 
> potentially growing to 13-15 TB.

The database size is irrelevant (once it's significantly bigger than the 
available RAM)

A Raspberry Pi could easily handle a 30TB database with 50 transactions 
an hour

A 64 core Xeon with 64GB RAM couldn't handle a 500GB database with 
50,000 random insert/update transactions a second.

How many transactions per <time unit> is more important than size, as is 
what sort of transactions are they - eg an indexed SELECT of a single 
row is a lot less effort than an INSERT which has triggers to update 
multiple other tables.

That is what people mean by 'transactional workload'


If the quantity/quality of transactions is staying the same, but, for 
instance, you are simply keeping historical data longer and not querying 
it, then increasing the database size might not be as significant as you 
fear

Paul






^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
  2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
  2025-06-26 12:49 ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Ron Johnson <[email protected]>
  2025-06-26 13:43   ` Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
@ 2025-06-26 14:54     ` Ron Johnson <[email protected]>
  1 sibling, 0 replies; 11+ messages in thread

From: Ron Johnson @ 2025-06-26 14:54 UTC (permalink / raw)
  To: Pgsql-admin <[email protected]>

Our application does "manual" horizontal partitioning, with a "reference"
database holding info on which customers are in which child database (where
each of the three children is on a separate VM).

Note: largest DB is 4.8TB, not 6TB.  Total size is 11.3TB.

I can't/won't answer your specific questions, but will give some guidance.

Seven years of OLTP in one database is pretty severe.  Part of the design
should be an archive process to (quarterly, monthly, semiannually,
whatever) move data from _the_ OLTP database to the "archive" database
(which _can_ be in the same instance, but is preferably on a different
instance).  Either your application can do the joining of current and
archived data, or you can use postgres_fdw to combine the current and
archived data.

(Having said that, we have 10 years of image/bytea data in our databases.
Tables are partitioned via inheritance.)

"Small" OLTP databases can have table partitions split by account number or
"true PK" (which means you don't have to add a date field to the synthetic
PK), while giant Reporting and DW tables can be partitioned by date.

Two other reason to regularly archive:
1) small instances backup *AND RESTORE* faster than giant instances.
2) pg_basebackup creates replication faster on a small instance.

On Thu, Jun 26, 2025 at 9:43 AM Motog Plus <[email protected]> wrote:

> Thanks Ron, for the feedback and for sharing your experience with
> PostgreSQL handling such large databases – that's very encouraging to hear.
> We are using postgres version 15.12.
>
> You're absolutely right about "typical transaction loads" not being a
> useful term without more context. My apologies for the vagueness. We
> actually have two distinct workloads on separate servers:
>
> OLTP: This is our primary transactional workload and has replication
> setup, pgpool - II
> Reporting/DW: This is for reporting purposes.
>
>
> The growth figures I initially shared (8-9 TB) were a more conservative
> estimate for OLTP.
>
> However, after a more focused rough estimate for our OLTP workload alone,
> we anticipate it could reach 35-40 TB of data over the next 5-7 years.
>
>
> Specifically for our OLTP databases (which I listed in my initial email):
>
> Database C could reach 30-32 TB, with the acc schema within it potentially
> growing to 13-15 TB.
> Database M might reach 5-7 TB.
> Database P could reach 1-2 TB.
>
>
> Given these revised, more detailed projections for the OLTP side, we would
> be extremely grateful for your and the community's guidance on all the
> questions we originally posed, specifically considering these new volume
> expectations for our OLTP workload:
>
>
> 1. Will PostgreSQL be able to handle this much load (35-40 TB, with one DB
> potentially at 30-32 TB and a schema at 13-15 TB) for an OLTP environment?
>
> 2. Should we still consider splitting our database "C" into two DBs (C1
> for "acc" schema and C2 for the rest), given the projected 13-15 TB for acc
> alone?
>
> 3. Should we assign a new DB server to C2, or keep it on the same server,
> particularly now with these larger OLTP volumes?
>
> 4. Will a single DB server be able to handle 30+ TB of OLTP data, or is
> there a particular limit per DB server from a performance point of view for
> OLTP?
>
> 5. What are the best practices, apart from indexing and partitioning, to
> keep in mind for such large-scale OLTP data management?
>
> 6. What hardware configuration (RAM, CPU, storage I/O, storage type like
> NVMe) would you recommend for future OLTP database servers to efficiently
> handle these new projected sizes?
>
> 7. Is a horizontal scaling solution (open source, apart from Citus)
> possible in PostgreSQL for these OLTP volumes, and do you have any pointers
> on that?
>
>
> Thanks again for your time and invaluable guidance.
>
> We truly appreciate the community's expertise.
>
> Regards,
> Ramzy
>
> On Thu, Jun 26, 2025, 18:19 Ron Johnson <[email protected]> wrote:
>
>> PG easily handles our 6TB database, as well as 3 and 5TB databases (all
>> on different VMs), and has done so since at least v8.4.
>>
>> Ours are on single LVM mount points, as are the disks that hold the
>> PgBackRest savesets.
>>
>> "considering typical transaction loads."
>>
>> Pfft...there are no typical transaction loads.  Is this db OLTP,
>> Reporting or DW?
>>
>> On Wed, Jun 25, 2025 at 4:35 AM Motog Plus <[email protected]> wrote:
>>
>>> Dear PostgreSQL Community,
>>>
>>> We are implementing a new feature in our application that is expected to
>>> generate a significant amount of data, and we are seeking your expert
>>> guidance on how to best handle this growth within our existing PostgreSQL
>>> setup.
>>>
>>>
>>>
>>> Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu
>>> instance with the following specifications:
>>>
>>>    - *RAM:* 32 GB
>>>    - *Disk:* 1.2 TB
>>>    - *vCPUs:* 16
>>>
>>>
>>>
>>> Our database architecture utilizes a primary-standby streaming
>>> replication setup. Application modules (running in Kubernetes pods) connect
>>> to the database through Pgpool-II, using HikariCP for connection pooling.
>>>
>>>
>>>
>>> We have multiple databases on our primary server, with their approximate
>>> current sizes as follows:
>>>
>>>    - *C:* 620 GB
>>>    - *M:* 225 GB
>>>    - *P:* 59 GB
>>>    - *K:* 13 MB
>>>
>>>
>>>
>>> The total current size of our databases is around *1 TB*. With the new
>>> feature, we anticipate a substantial increase in data, potentially reaching *10
>>> TB* over the next 5-7 years.
>>>
>>>
>>>
>>> Below is the table for current size and expected growth in size:
>>>
>>>
>>>
>>> *S.No.*
>>>
>>> *DB*
>>>
>>> *Current DB size*
>>>
>>> *Future DB size*
>>>
>>> *Schema Name*
>>>
>>> *Current Schema size*
>>>
>>> *Future Schema size *
>>>
>>> 1
>>>
>>> C
>>>
>>> 1 TB
>>>
>>> 8 TB - 10 TB
>>>
>>> acc
>>>
>>> 297 GB
>>>
>>> 3 TB - 4 TB
>>>
>>> po
>>>
>>> 270 GB
>>>
>>> 2.6 TB - 3.5 TB
>>>
>>> pa
>>>
>>> 27 GB
>>>
>>> 270 GB
>>>
>>> pra
>>>
>>> 13 GB
>>>
>>> 130 GB
>>>
>>> fu
>>>
>>> 13 GB
>>>
>>> 130 GB
>>>
>>> te
>>>
>>> 167 MB
>>>
>>> 2 GB
>>>
>>> pro
>>>
>>> 30 MB
>>>
>>> 300 MB
>>>
>>> 2
>>>
>>> M
>>>
>>> 225 GB
>>>
>>> 2.2 TB - 3 TB
>>>
>>> bi
>>>
>>> 82 GB
>>>
>>> 820 GB
>>>
>>> co
>>>
>>> 80 GB
>>>
>>> 800 GB
>>>
>>> ps
>>>
>>> 17 GB
>>>
>>> 170 GB
>>>
>>> qo
>>>
>>> 16 GB
>>>
>>> 160 GB
>>>
>>> to
>>>
>>> 7 GB
>>>
>>> 70 GB
>>>
>>> in
>>>
>>> 7 GB
>>>
>>> 70 GB
>>>
>>> di
>>>
>>> 6 GB
>>>
>>> 60 GB
>>>
>>> no
>>>
>>> 4 GB
>>>
>>> 40 GB
>>>
>>> do
>>>
>>> 4 GB
>>>
>>> 40 GB
>>>
>>> cl
>>>
>>> 3 GB
>>>
>>> 30 GB
>>>
>>> 3
>>>
>>> P
>>>
>>> 60 GB
>>>
>>> 600 GB
>>>
>>> au
>>>
>>> 45 GB
>>>
>>> 450 GB
>>>
>>> fi
>>>
>>> 8 GB
>>>
>>> 80 GB
>>>
>>> con
>>>
>>> 4 GB
>>>
>>> 40 GB
>>>
>>> ba
>>>
>>> 1 GB
>>>
>>> 10 GB
>>>
>>> li
>>>
>>> 2 MB
>>>
>>> 20 GB
>>>
>>>
>>>
>>>
>>>
>>> We would greatly appreciate your insights on the following points:
>>>
>>>    1. *Scalability for Large Datasets:* Conceptually, PostgreSQL is
>>>    known to handle large datasets. However, we'd like to confirm if a single
>>>    PostgreSQL instance can realistically and efficiently manage 10-12 TB of
>>>    data in a production environment, considering typical transaction loads.
>>>    2. *Database Split Strategy:* Our largest database, "C," currently
>>>    occupies 620 GB. It contains multiple schemas. We are considering splitting
>>>    database "C" into two new databases: "C1" to exclusively house the "acc"
>>>    schema, and "C2" for the remaining schemas. Is this a recommended approach
>>>    for managing growth, and what are the potential pros and cons?
>>>    3. *Server Allocation for Split Databases:* If we proceed with
>>>    splitting "C" into "C1" and "C2," would it be advisable to assign a new,
>>>    separate database server for "C2," or could both "C1" and "C2" reside on
>>>    the same database server? What factors should we consider in making this
>>>    decision?
>>>    4. *Performance Limits per Database and Database Server:* From a
>>>    performance perspective, is there a general "limit" or best practice for
>>>    the maximum amount of data a single database server should handle (e.g., 10
>>>    TB) and similarly general limit per database? How does this influence the
>>>    decision to add more database servers?
>>>    5. *Best Practices for Large-Scale Data Management:* Beyond standard
>>>    practices like indexing and partitioning, what other best practices should
>>>    we consider implementing to ensure optimal performance and manageability
>>>    with such a large dataset? This could include configurations, maintenance
>>>    strategies, etc.
>>>    6. *Hardware Configuration Recommendations:* Based on our projected
>>>    data growth and desired performance, what hardware configurations (e.g.,
>>>    RAM, CPU, storage I/O, storage type like NVMe) would you recommend for
>>>    future database servers to efficiently handle 10-12 TB?
>>>    7. *Open-Source Horizontal Scaling Solutions:* Are there any
>>>    open-source horizontal scaling solutions for PostgreSQL (other than Citus
>>>    Data) that the community recommends or has experience with for managing
>>>    extremely large datasets? Any pointers or guidance on this would be highly
>>>    valuable.
>>>
>>>
>>>
>>> Thank you in advance for your time and expertise. We look forward to
>>> your valuable insights.
>>>
>>> Thanks & Regards,
>>>
>>> Ramzy
>>>
>>
>>
>> --
>> Death to <Redacted>, and butter sauce.
>> Don't boil me, I'm still alive.
>> <Redacted> lobster!
>>
>

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 11+ messages in thread


end of thread, other threads:[~2025-07-04 13:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-06-25 08:35 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation Motog Plus <[email protected]>
2025-06-25 08:48 ` Motog Plus <[email protected]>
2025-07-04 13:45   ` Greg Sabino Mullane <[email protected]>
2025-06-26 05:11 ` Motog Plus <[email protected]>
2025-06-26 05:40   ` David G. Johnston <[email protected]>
2025-06-26 06:34     ` Motog Plus <[email protected]>
2025-06-26 13:11       ` Claus Koch <[email protected]>
2025-06-26 12:49 ` Ron Johnson <[email protected]>
2025-06-26 13:43   ` Motog Plus <[email protected]>
2025-06-26 14:30     ` Paul Smith* <[email protected]>
2025-06-26 14:54     ` Ron Johnson <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox