Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uUm2w-008KXE-0I for pgsql-admin@arkaria.postgresql.org; Thu, 26 Jun 2025 12:49:42 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uUm2t-00CPNS-R3 for pgsql-admin@arkaria.postgresql.org; Thu, 26 Jun 2025 12:49:40 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uUm2t-00CPNK-CO for pgsql-admin@lists.postgresql.org; Thu, 26 Jun 2025 12:49:40 +0000 Received: from mail-oi1-x233.google.com ([2607:f8b0:4864:20::233]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uUm2r-004Dp2-2T for pgsql-admin@lists.postgresql.org; Thu, 26 Jun 2025 12:49:39 +0000 Received: by mail-oi1-x233.google.com with SMTP id 5614622812f47-40a6692b75cso780017b6e.1 for ; Thu, 26 Jun 2025 05:49:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1750942175; x=1751546975; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=No97xfbJR5svAME5cErUToqrOH3q9oTuRmO4bDvDe1I=; b=Q2Ui/ZwYR3PjcGQn4nRgOTkDCXYZg1TK5syERIu5XGmEdN65frHtpjCEhfEYjWGDkW Lrl1lOqYpUUCvqoT3KXPbWXWLWpDRLudyrehCGrhG/BVf9Zkb03tHXKGQJmqrh85ZduO vdzCLknBIKUxJfwwDf56EM4YmDtUqwoTMRhMDp1Jf0ARDPJG0Tp812Vi3FjrEhKE2sT0 v8VL3twEcpVb+5a6HplwQUeOPpZYL/irbitUMK78DGV+GfQQwNFvo0OoAWr835U31eJS NCNxFllpGPpC5HW3R3BoM3eA07Y7D4sU2eVW0+P4oU1pn3oM8QB88nlu/KY3kINCllqq RElw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1750942175; x=1751546975; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=No97xfbJR5svAME5cErUToqrOH3q9oTuRmO4bDvDe1I=; b=mN+q/jNmusGJYpfGr+7OVHx6K4FXjHUB9MrX34Y8M6onMPe3xmcJBBrY4usGKr5Ign B4Nk+AmBbVcJk/wvgxEVw4p7OAzUdIloDVotqPhhPQh4FMPxEGqJ5bQtgLTgURb3NXa7 GvNbBgNwJSOQm0UEeJIgybuy7JUrYNxW6C9sDLGRS50S8njFqwRNsLR3How/3/0E8+AO 7ltd50BZkx3jCZS+5s5w7Yp91VPr9+c4Hdbp6RKIja6tOWzBuc34paqbhDgN1wRBOX7Q NkdkdePBCAM6AmNF3TTUmgugluvMLMWG1BoPRYHGkFIsu8DmbXxbyP1QWE1JHfVuXvTZ sjew== X-Gm-Message-State: AOJu0YyUitIiS5yACUqDOEGxS1lurz/7ZRDcmZO1XvOjZja8Gv2KP/Q3 afEWPY2dSUUSEEH66RbWCIySjoisJYBCT/zc6KiWG8Y+x163oftaZzus9m81VuU3fMBGasrgBQT bOLKm6qF9G7NyVbBtfu8AcbrUUyVPg3sdPwDu X-Gm-Gg: ASbGncus6YJ5IRFmHyT2HMx1iJp1mS5ga81iLjzvE1wchnamv54RCR6hdy82Dv8titX CoAs+uhIqZqtByVj32NuiOID94zATApBVrKu+k023JwZ2a9hc3aANa3P3+D2F78eABZpemPx88+ ASnGJX38Joif1x4yRoYHdeUt/+2dTGSiKt0EBgdI5D9GLK X-Google-Smtp-Source: AGHT+IF+xLu4Wm3y2GFvgkPgZnQx6K/YwVEgfdPi5QLVqpl4KPejz0ZNQqCv4HW4fkbQAjhL6bt+TetCb5ZRvIYomMM= X-Received: by 2002:a05:6808:1302:b0:406:4e0c:9aaa with SMTP id 5614622812f47-40b05a937f4mr4964366b6e.14.1750942175440; Thu, 26 Jun 2025 05:49:35 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Thu, 26 Jun 2025 08:49:23 -0400 X-Gm-Features: Ac12FXyjyeooS5iW6r5Y3ZnM6Efg5oZ4pttgssra8GkDFV08emGz6p80F8FYPHs Message-ID: Subject: Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation To: Pgsql-admin Content-Type: multipart/alternative; boundary="00000000000063b53c063878feaf" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000063b53c063878feaf Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable PG easily handles our 6TB database, as well as 3 and 5TB databases (all on different VMs), and has done so since at least v8.4. Ours are on single LVM mount points, as are the disks that hold the PgBackRest savesets. "considering typical transaction loads." Pfft...there are no typical transaction loads. Is this db OLTP, Reporting or DW? On Wed, Jun 25, 2025 at 4:35=E2=80=AFAM Motog Plus wr= ote: > Dear PostgreSQL Community, > > We are implementing a new feature in our application that is expected to > generate a significant amount of data, and we are seeking your expert > guidance on how to best handle this growth within our existing PostgreSQL > setup. > > > > Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu > instance with the following specifications: > > - *RAM:* 32 GB > - *Disk:* 1.2 TB > - *vCPUs:* 16 > > > > Our database architecture utilizes a primary-standby streaming replicatio= n > setup. Application modules (running in Kubernetes pods) connect to the > database through Pgpool-II, using HikariCP for connection pooling. > > > > We have multiple databases on our primary server, with their approximate > current sizes as follows: > > - *C:* 620 GB > - *M:* 225 GB > - *P:* 59 GB > - *K:* 13 MB > > > > The total current size of our databases is around *1 TB*. With the new > feature, we anticipate a substantial increase in data, potentially reachi= ng *10 > TB* over the next 5-7 years. > > > > Below is the table for current size and expected growth in size: > > > > *S.No.* > > *DB* > > *Current DB size* > > *Future DB size* > > *Schema Name* > > *Current Schema size* > > *Future Schema size * > > 1 > > C > > 1 TB > > 8 TB - 10 TB > > acc > > 297 GB > > 3 TB - 4 TB > > po > > 270 GB > > 2.6 TB - 3.5 TB > > pa > > 27 GB > > 270 GB > > pra > > 13 GB > > 130 GB > > fu > > 13 GB > > 130 GB > > te > > 167 MB > > 2 GB > > pro > > 30 MB > > 300 MB > > 2 > > M > > 225 GB > > 2.2 TB - 3 TB > > bi > > 82 GB > > 820 GB > > co > > 80 GB > > 800 GB > > ps > > 17 GB > > 170 GB > > qo > > 16 GB > > 160 GB > > to > > 7 GB > > 70 GB > > in > > 7 GB > > 70 GB > > di > > 6 GB > > 60 GB > > no > > 4 GB > > 40 GB > > do > > 4 GB > > 40 GB > > cl > > 3 GB > > 30 GB > > 3 > > P > > 60 GB > > 600 GB > > au > > 45 GB > > 450 GB > > fi > > 8 GB > > 80 GB > > con > > 4 GB > > 40 GB > > ba > > 1 GB > > 10 GB > > li > > 2 MB > > 20 GB > > > > > > We would greatly appreciate your insights on the following points: > > 1. *Scalability for Large Datasets:* Conceptually, PostgreSQL is known > to handle large datasets. However, we'd like to confirm if a single > PostgreSQL instance can realistically and efficiently manage 10-12 TB = of > data in a production environment, considering typical transaction load= s. > 2. *Database Split Strategy:* Our largest database, "C," currently > occupies 620 GB. It contains multiple schemas. We are considering spli= tting > database "C" into two new databases: "C1" to exclusively house the "ac= c" > schema, and "C2" for the remaining schemas. Is this a recommended appr= oach > for managing growth, and what are the potential pros and cons? > 3. *Server Allocation for Split Databases:* If we proceed with > splitting "C" into "C1" and "C2," would it be advisable to assign a ne= w, > separate database server for "C2," or could both "C1" and "C2" reside = on > the same database server? What factors should we consider in making th= is > decision? > 4. *Performance Limits per Database and Database Server:* From a > performance perspective, is there a general "limit" or best practice f= or > the maximum amount of data a single database server should handle (e.g= ., 10 > TB) and similarly general limit per database? How does this influence = the > decision to add more database servers? > 5. *Best Practices for Large-Scale Data Management:* Beyond standard > practices like indexing and partitioning, what other best practices sh= ould > we consider implementing to ensure optimal performance and manageabili= ty > with such a large dataset? This could include configurations, maintena= nce > strategies, etc. > 6. *Hardware Configuration Recommendations:* Based on our projected > data growth and desired performance, what hardware configurations (e.g= ., > RAM, CPU, storage I/O, storage type like NVMe) would you recommend for > future database servers to efficiently handle 10-12 TB? > 7. *Open-Source Horizontal Scaling Solutions:* Are there any > open-source horizontal scaling solutions for PostgreSQL (other than Ci= tus > Data) that the community recommends or has experience with for managin= g > extremely large datasets? Any pointers or guidance on this would be hi= ghly > valuable. > > > > Thank you in advance for your time and expertise. We look forward to your > valuable insights. > > Thanks & Regards, > > Ramzy > --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --00000000000063b53c063878feaf Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
PG easily handles our 6TB database, as we= ll as 3 and 5TB databases (all on different=C2=A0VMs), and has done so sinc= e at least v8.4.

Ours are on single LVM mount points, as= are the disks that hold the PgBackRest savesets.

= "considering typical transaction loads."

Pfft...there are no typical transaction loads.=C2=A0 Is this db OLTP, Repo= rting or DW?

On Wed, Jun 25, 2025 at 4:35=E2=80= =AFAM Motog Plus <mplus7535@gmail= .com> wrote:
Dear PostgreSQL Community,

We are implementing a new feature in our application that is expe= cted to generate a significant amount of data, and we are seeking your expe= rt guidance on how to best handle this growth within our existing PostgreSQL setup.

=C2=A0

Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubun= tu instance with the following specifications:

  • RAM: 32 GB
  • = Disk:<= /b> 1.2 TB=
  • vCPUs: 16

=C2=A0

Our database architecture utilizes a primary-standby streaming re= plication setup. Application modules (running in Kubernetes pods) connect t= o the database through Pgpool-II, using HikariCP for connection pooling.

=C2=A0

We have multiple databases on our primary server, with their appr= oximate current sizes as follows:

  • C: 620 GB
  • <= span style=3D"font-size:11pt;font-family:Calibri,sans-serif">M:<= span style=3D"font-size:11pt;font-family:Calibri,sans-serif"> 225 GB=
  • P: 59 GB
  • K: 13 MB

=C2=A0

The total current size of our databases is around 1 TB. With the new feature, we anticipate a substantial increase in = data, potentially reaching 10 TB over the next 5-7 years.

=C2=A0

Below is the table for current size and expected growth in size:<= u>

=C2=A0

S.No.

DB

Current DB size

Future DB size

Schema Name<= /b>

Current Schema size=

Future Schema size

1

C

1 TB

8 TB - 10 TB

acc

297 GB

3 TB - 4 TB

po

270 GB

2.6 TB - 3.5 TB=

pa

27 GB

270 GB

pra

13 GB

130 GB

fu

13 GB

130 GB

te

167 MB

2 GB

pro

30 MB

300 MB

2

M

225 GB

2.2 TB - 3 TB

bi

82 GB

820 GB

co

80 GB

800 GB

ps

17 GB

170 GB

qo

16 GB

160 GB

to

7 GB

70 GB

in

7 GB

70 GB

di

6 GB

60 GB

no

4 GB

40 GB

do

4 GB

40 GB

cl

3 GB

30 GB

3

P

60 GB

600 GB

au

45 GB

450 GB

fi

8 GB

80 GB

con

4 GB

40 GB

ba

1 GB

10 GB

li

2 MB

20 GB

=C2=A0

=C2=A0

We would greatly appreciate your insights on the following points= :

  1. Scalability for Large Datasets: Conceptually, PostgreSQL is kno= wn to handle large datasets. However, we'd like to confirm if a single Po= stgreSQL instance can realistically and efficiently manage 10-12 TB of data= in a production environment, considering typical transaction loads.=
  2. Database Split Strategy: Our largest databas= e, "C," currently occupies 620 GB. It contains multiple schemas. We are considering splittin= g database "C" into two new databases: "C1" to exclusiv= ely house the "acc" schema, and "C2" for the remaining = schemas. Is this a recommended approach for managing growth, and what are the potential pros and cons?
  3. Server= Allocation for Split Databases: If we proceed with splitting "C" into "C1" and "C2," would it be advisabl= e to assign a new, separate database server for "C2," or could bo= th "C1" and "C2" reside on the same database server? Wh= at factors should we consider in making this decision?=
  4. Performance Limits per Database and Database Server: From = a performance perspective, is there a general "limit" or best practice for the maximum amou= nt of data a single database server should handle (e.g., 10 TB) and similar= ly general limit per database? How does this influence the decision to add = more database servers?
  5. = Best Practice= s for Large-Scale Data Management: Beyond standard practices like indexing and partitioning, what other best practices should= we consider implementing to ensure optimal performance and manageability w= ith such a large dataset? This could include configurations, maintenance st= rategies, etc.
  6. Hardware Configuratio= n Recommendations: Based on our projected data growth and desired performance, what hardware configurations (e.g., R= AM, CPU, storage I/O, storage type like NVMe) would you recommend for futur= e database servers to efficiently handle 10-12 TB?
  7. Open-Source Horizontal Scaling Solutions: Are there any open-s= ource horizontal scaling solutions for PostgreSQL (other than Citus Data) that t= he community recommends or has experience with for managing extremely large= datasets? Any pointers or guidance on this would be highly valuable.

=C2=A0

Thank you in advance for your time and expertise. We look forward= to your valuable insights.=C2=A0

Thanks & Regards,=

Ramzy=C2=A0



--
Death to <Redacted>, and butter sauce.Don't boil me, I'm still alive.
<Redacted> lobs= ter!
--00000000000063b53c063878feaf--