Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uUetS-006b7J-Pq for pgsql-admin@arkaria.postgresql.org; Thu, 26 Jun 2025 05:11:27 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uUetQ-007iBZ-Kp for pgsql-admin@arkaria.postgresql.org; Thu, 26 Jun 2025 05:11:25 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uUetP-007iBR-Ra for pgsql-admin@lists.postgresql.org; Thu, 26 Jun 2025 05:11:24 +0000 Received: from mail-pg1-x531.google.com ([2607:f8b0:4864:20::531]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uUetO-0042q3-0Y for pgsql-admin@lists.postgresql.org; Thu, 26 Jun 2025 05:11:23 +0000 Received: by mail-pg1-x531.google.com with SMTP id 41be03b00d2f7-b2c4331c50eso487020a12.3 for ; Wed, 25 Jun 2025 22:11:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1750914681; x=1751519481; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=W+P+DkU1kbp07bTrpD9EjJHTCuXAFDk6x1TSZ0cWCjM=; b=hbjB2rcOHWFOjTdVsy3enDjgSnyksRhlCpllVVdVycp5RSti0sltL2YQdkkvuk7HQW SZBCivm7GgzkVsik55G1zw5rinytiRW79fjy4hAB6K1GwSyKeqpZKeAYwYwXlSrOqvFY 1bfeBJoOzCIHUFvkIWV+uU1OWsl+uNkIucf4D55od7tJ0GDBBvDS0aev494tVxSOmteq 3wRGOZvZVL3nAJ3aw9LsuudUcKr6IruKii9sdCiGx/QLrp1MFDPUseVvmZiMiA9s6V+D 5TEBfh6pCczo2DPwT3XgGY+qndk0s3c9ioteNMmUxou9R2dFwA3ANGLGoR9GsG2gim5D tkqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1750914681; x=1751519481; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=W+P+DkU1kbp07bTrpD9EjJHTCuXAFDk6x1TSZ0cWCjM=; b=kXTz9UHC/O2gwXs9R6TH91MmtkcsslC8NzUpHVsbec8Rg/dk9wxv/mkRvm5yt3S+PN orrAaHv36PKZ+/1UMZWMaEogL628dE7Hw5HxKBlwSRl2ORtbIrN9uahsk9YU7zeTBkaf QbGeUgVuJ1S2jTjcbnNvzHJJYVLyRjTcZpgy800Jss6AQ972z/+tws8eijT4WN9Ps/aj cqgWn8yX/A6N/dYV3lO7xViE2mFEDg3m8yEBgb9LtnzP3mNAdeXwqgmhFvP5QFLPnCK2 fBh0FkkC3PdzN9aU1hHrwUUbf9YdZXLpj1JSd1XEj1nGODhFqcDryLMEIyC86xpgclCz zDkA== X-Gm-Message-State: AOJu0Ywqd4tI5IsIKso/EVy2njf/uvr2wnbkTnh4XiB6BZiAXEnKVNRc xsOc3FdcfX98sIxs7fN5obHQT8S0o7ySsmdXWAkjvXqitRsOEOXcojtfnGtah2dp5+rp0LNAxur w8RpT0c7DmQV5cURDkzyHXOfvzZjAUOeS5a7z X-Gm-Gg: ASbGnctyzY5PIvU84zfj2Tn+LXs0bimgmP3QURTXmNkfQjKbg852RWllkodPJhhr/I7 GU/0lpNwTsjngQAseN++g82v5b5/7eBaiJ5nlWAk4MoYNVzq+F36qp1cxNyCbdGmktPXFdP4Opl eGK4ziQcEwepXBAxnGHytNDwYMunR7RhQ6keqeKCFy+VqoiPxok1j6m9kMcPhhdihRQhNIqH9kZ NY= X-Google-Smtp-Source: AGHT+IEShaSTZKOycWPMbqAtoHHXyM9WSTOjn8Fn/VEAoCJpIPUrkTDIU9NCGTlRDpSR7vWTzC4hpis6Z61PRoKAHnA= X-Received: by 2002:a17:90b:5303:b0:311:c970:c9bc with SMTP id 98e67ed59e1d1-315f26a24a6mr8208019a91.30.1750914680612; Wed, 25 Jun 2025 22:11:20 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Motog Plus Date: Thu, 26 Jun 2025 10:41:09 +0530 X-Gm-Features: Ac12FXw3RELMDxFYwMX61zEJ87Bs2e24NgZpSI0zK5peMme_74M5rWk22OK6tl4 Message-ID: Subject: Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation To: Pgsql-admin Content-Type: multipart/alternative; boundary="00000000000091ec2d0638729716" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000091ec2d0638729716 Content-Type: text/plain; charset="UTF-8" Dear Community, We are still actively seeking your valuable insights and guidance on managing our anticipated data growth. Any thoughts or recommendations on the points raised would be greatly appreciated. Thanks & Regards, Ramzy On Wed, Jun 25, 2025, 14:05 Motog Plus wrote: > Dear PostgreSQL Community, > > We are implementing a new feature in our application that is expected to > generate a significant amount of data, and we are seeking your expert > guidance on how to best handle this growth within our existing PostgreSQL > setup. > > > > Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu > instance with the following specifications: > > - *RAM:* 32 GB > - *Disk:* 1.2 TB > - *vCPUs:* 16 > > > > Our database architecture utilizes a primary-standby streaming replication > setup. Application modules (running in Kubernetes pods) connect to the > database through Pgpool-II, using HikariCP for connection pooling. > > > > We have multiple databases on our primary server, with their approximate > current sizes as follows: > > - *C:* 620 GB > - *M:* 225 GB > - *P:* 59 GB > - *K:* 13 MB > > > > The total current size of our databases is around *1 TB*. With the new > feature, we anticipate a substantial increase in data, potentially reaching *10 > TB* over the next 5-7 years. > > > > Below is the table for current size and expected growth in size: > > > > *S.No.* > > *DB* > > *Current DB size* > > *Future DB size* > > *Schema Name* > > *Current Schema size* > > *Future Schema size * > > 1 > > C > > 1 TB > > 8 TB - 10 TB > > acc > > 297 GB > > 3 TB - 4 TB > > po > > 270 GB > > 2.6 TB - 3.5 TB > > pa > > 27 GB > > 270 GB > > pra > > 13 GB > > 130 GB > > fu > > 13 GB > > 130 GB > > te > > 167 MB > > 2 GB > > pro > > 30 MB > > 300 MB > > 2 > > M > > 225 GB > > 2.2 TB - 3 TB > > bi > > 82 GB > > 820 GB > > co > > 80 GB > > 800 GB > > ps > > 17 GB > > 170 GB > > qo > > 16 GB > > 160 GB > > to > > 7 GB > > 70 GB > > in > > 7 GB > > 70 GB > > di > > 6 GB > > 60 GB > > no > > 4 GB > > 40 GB > > do > > 4 GB > > 40 GB > > cl > > 3 GB > > 30 GB > > 3 > > P > > 60 GB > > 600 GB > > au > > 45 GB > > 450 GB > > fi > > 8 GB > > 80 GB > > con > > 4 GB > > 40 GB > > ba > > 1 GB > > 10 GB > > li > > 2 MB > > 20 GB > > > > > > We would greatly appreciate your insights on the following points: > > 1. *Scalability for Large Datasets:* Conceptually, PostgreSQL is known > to handle large datasets. However, we'd like to confirm if a single > PostgreSQL instance can realistically and efficiently manage 10-12 TB of > data in a production environment, considering typical transaction loads. > 2. *Database Split Strategy:* Our largest database, "C," currently > occupies 620 GB. It contains multiple schemas. We are considering splitting > database "C" into two new databases: "C1" to exclusively house the "acc" > schema, and "C2" for the remaining schemas. Is this a recommended approach > for managing growth, and what are the potential pros and cons? > 3. *Server Allocation for Split Databases:* If we proceed with > splitting "C" into "C1" and "C2," would it be advisable to assign a new, > separate database server for "C2," or could both "C1" and "C2" reside on > the same database server? What factors should we consider in making this > decision? > 4. *Performance Limits per Database and Database Server:* From a > performance perspective, is there a general "limit" or best practice for > the maximum amount of data a single database server should handle (e.g., 10 > TB) and similarly general limit per database? How does this influence the > decision to add more database servers? > 5. *Best Practices for Large-Scale Data Management:* Beyond standard > practices like indexing and partitioning, what other best practices should > we consider implementing to ensure optimal performance and manageability > with such a large dataset? This could include configurations, maintenance > strategies, etc. > 6. *Hardware Configuration Recommendations:* Based on our projected > data growth and desired performance, what hardware configurations (e.g., > RAM, CPU, storage I/O, storage type like NVMe) would you recommend for > future database servers to efficiently handle 10-12 TB? > 7. *Open-Source Horizontal Scaling Solutions:* Are there any > open-source horizontal scaling solutions for PostgreSQL (other than Citus > Data) that the community recommends or has experience with for managing > extremely large datasets? Any pointers or guidance on this would be highly > valuable. > > > > Thank you in advance for your time and expertise. We look forward to your > valuable insights. > > Thanks & Regards, > > Ramzy > --00000000000091ec2d0638729716 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Dear Community,

We are still actively seeking your valuable insights and guidance on ma= naging our anticipated data growth.

Any thoughts or recommendations on the points raised would=C2= =A0be greatly appreciated.

Thanks & Regards,
Ramzy

On Wed, Jun 25, 2025, 14:05 Motog Plus <mplus7535@gmail.com> wrote:
Dear PostgreSQL Community,

We are implementing a new feature in our applicatio= n that is expected to generate a significant amount of data, and we are see= king your expert guidance on how to best handle this growth within our existing PostgreSQL setup.

=C2=A0

Currently, our PostgreSQL instance runs on an EC2 c= 5.4xlarge Ubuntu instance with the following specifications:<= /span>

  • RAM: 32 GB
  • Disk: 1.2 TB<= /li>
  • vCPUs: 16<= /li>

=C2=A0

Our database architecture utilizes a primary-standb= y streaming replication setup. Application modules (running in Kubernetes p= ods) connect to the database through Pgpool-II, using HikariCP for connection pooling.

=C2=A0

We have multiple databases on our primary server, w= ith their approximate current sizes as follows:

  • C: 620 GB
  • =
  • M: 225 GB
  • =
  • P: 59 GB
  • <= li class=3D"MsoNormal">K: 13 MB

    =C2=A0

    The total current size of our databases is around 1 TB. With the new feature, we anticipate a substantial increase in = data, potentially reaching 10 TB over the next 5-7 years.

    =C2=A0

    Below is the table for current size and expected gr= owth in size:

    =C2=A0

    S.No.<= /p>

    DB

    Current DB size

    Future DB size

    Schema Name

    Current Schema size

    Future Schema size

    1

    C

    1 TB

    8 TB - 10 TB

    acc

    297 GB

    3 TB - 4 TB

    po

    270 GB

    2.6 TB - 3.5 TB

    pa

    27 GB

    270 GB

    pra

    13 GB

    130 GB

    fu

    13 GB

    130 GB

    te

    167 MB

    2 GB

    pro

    30 MB

    300 MB

    2

    M

    225 GB

    2.2 TB - 3 TB

    bi

    82 GB

    820 GB

    co

    80 GB

    800 GB

    ps

    17 GB

    170 GB

    qo

    16 GB

    160 GB

    to

    7 GB

    70 GB

    in

    7 GB

    70 GB

    di

    6 GB

    60 GB

    no

    4 GB

    40 GB

    do

    4 GB

    40 GB

    cl

    3 GB

    30 GB

    3

    P

    60 GB

    600 GB

    au

    45 GB

    450 GB

    fi

    8 GB

    80 GB

    con

    4 GB

    40 GB

    ba

    1 GB

    10 GB

    li

    2 MB

    20 GB

    =C2=A0

    =C2=A0

    We would greatly appreciate your insights on the fo= llowing points:

    1. Scalability for Large Datasets: Con= ceptually, PostgreSQL is known to handle large datasets. However, we'd like to confirm if a single Po= stgreSQL instance can realistically and efficiently manage 10-12 TB of data= in a production environment, considering typical transaction loads.=
    2. Database Split Strategy: Our largest database, "C," currently occupies 620 GB. It contains multiple schemas. We are considering splittin= g database "C" into two new databases: "C1" to exclusiv= ely house the "acc" schema, and "C2" for the remaining = schemas. Is this a recommended approach for managing growth, and what are the potential pros and cons?
    3. Server Allocation for Split Databases: If we proceed wi= th splitting "C" into "C1" and "C2," would it be advisabl= e to assign a new, separate database server for "C2," or could bo= th "C1" and "C2" reside on the same database server? Wh= at factors should we consider in making this decision?=
    4. Performance Limits per Database and Databa= se Server: From a performance perspective, is there a general "limit" or best practice for the maximum amou= nt of data a single database server should handle (e.g., 10 TB) and similar= ly general limit per database? How does this influence the decision to add = more database servers?
    5. = Best Practices for Large-Scale Data Management: Beyond standar= d practices like indexing and partitioning, what other best practices should= we consider implementing to ensure optimal performance and manageability w= ith such a large dataset? This could include configurations, maintenance st= rategies, etc.
    6. Hardwar= e Configuration Recommendations: Based on our projected data growth and desired performance, what hardware configurations (e.g., R= AM, CPU, storage I/O, storage type like NVMe) would you recommend for futur= e database servers to efficiently handle 10-12 TB?
    7. Open-Source Horizontal Scaling Solutions: Are there any open-source horizontal scaling solutions for PostgreSQL (other than Citus Data) that t= he community recommends or has experience with for managing extremely large= datasets? Any pointers or guidance on this would be highly valuable.

    =C2=A0

    Thank you in advance for your time and expertise. W= e look forward to your valuable insights.=C2=A0

    Thanks & Regards,

    Ramzy=C2=A0

--00000000000091ec2d0638729716--