Content-Type: multipart/alternative;
 boundary="------------oh8V2QZZ6D3Rz1oWmxC8AAyD"
Date: Mon, 24 Nov 2025 16:51:35 -0500
MIME-Version: 1.0
Subject: Re: Schema design: user account deletion vs. keeping family tree data
In-Reply-To: 
 <CAEzggP_UOGwe5BA9s3iLY3R8LEi4QztA=2ka+vkRp-Wy64vXdQ@mail.gmail.com>
Content-Transfer-Encoding: 7bit
From: pg254kl@georgiou.vip
To: Christoph Pieper <christoph@fecra.de>,"pgsql-generallists.postgresql.org"
 <pgsql-general@lists.postgresql.org>
Message-ID: <176402110004.8.5805411983290632546.1025420735@georgiou.vip>
References: 
 <CAEzggP_UOGwe5BA9s3iLY3R8LEi4QztA=2ka+vkRp-Wy64vXdQ@mail.gmail.com>
Archived-At: <https://www.postgresql.org/message-id/176402110004.8.5805411983290632546.1025420735%40georgiou.vip>
Precedence: bulk

This is a multi-part message in MIME format.
--------------oh8V2QZZ6D3Rz1oWmxC8AAyD
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Option B would be fine with me, unless there is a good reason to normalize 
it further.  A query using a recursive CTE can find ancestors and 
descendants neatly and efficiently.
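
As a rough sketch of the ancestor walk against Option B's person table 
(the starting id 42 and the 10-generation cap are placeholder values I 
made up, and the literal assumes integer ids; use a uuid literal if you 
keep uuids):

```sql
-- Ancestors of one person, at most 10 generations back.
-- The depth cap also guarantees termination if the data contains a cycle.
with recursive ancestors as (
    -- anchor: the starting person (placeholder id)
    select p.id, p.father_id, p.mother_id, 0 as generation
    from person p
    where p.id = 42
  union
    -- step: look up both parents of every row found so far
    select p.id, p.father_id, p.mother_id, a.generation + 1
    from ancestors a
    join person p on p.id in (a.father_id, a.mother_id)
    where a.generation < 10
)
-- an ancestor reachable via several paths is reported once,
-- at the nearest generation
select id, min(generation) as generation
from ancestors
group by id
order by generation, id;
```

Descendants work the same way with the join reversed 
(p.father_id = d.id or p.mother_id = d.id), which is where the indexes on 
father_id and mother_id come in.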

I deal with some tables in the billions of rows, and with that hat on, I 
would use int/bigint identity columns for the PKs instead of UUIDs (less 
storage, smaller indices, faster joins).  I would have a boolean 
'active' column to handle soft deletes, along with created_at and 
disabled_at timestamptz columns maintained by triggers.  I would use 
composite partitioning: first-level partitioning by list on 'active', and 
second-level partitioning by range on the id PK, with each range covering 
a few million ids.  If for some reason you have to use UUIDs, use 
time-based UUIDv7 (native in PostgreSQL 18) so you can still range 
partition.
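
Roughly, the layout I have in mind looks like the following.  Treat it as 
a sketch under assumptions: the partition names and the 5 million bounds 
are made up, a recent PostgreSQL release is assumed, and the triggers are 
left out.  One thing to be aware of: a primary key on a partitioned table 
has to include every partition key column, so the PK becomes 
(id, active), and a plain "references person(id)" on father_id/mother_id 
is not possible in this layout; the sketch leaves them unconstrained.

```sql
-- Sketch only: names and range bounds are illustrative.
create table person (
    id          bigint generated always as identity,
    active      boolean not null default true,
    first_name  text,
    last_name   text,
    birth_date  date,
    father_id   bigint,   -- no FK here, see note above
    mother_id   bigint,
    created_at  timestamptz not null default now(),
    disabled_at timestamptz,
    -- the PK must contain all partition key columns (both levels)
    primary key (id, active)
) partition by list (active);

-- first level: one partition per value of active,
-- each sub-partitioned by id range
create table person_active
    partition of person for values in (true)
    partition by range (id);

create table person_inactive
    partition of person for values in (false)
    partition by range (id);

-- second level: a few million ids per partition (upper bound is exclusive)
create table person_active_p0
    partition of person_active for values from (1) to (5000001);
create table person_active_p1
    partition of person_active for values from (5000001) to (10000001);

create table person_inactive_p0
    partition of person_inactive for values from (1) to (5000001);
create table person_inactive_p1
    partition of person_inactive for values from (5000001) to (10000001);
```

Setting active to false then moves the row from the active side to the 
matching inactive partition, so queries that filter on active = true only 
touch the active partitions.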

-- 
regards,
Kiriakos Georgiou


On 11/24/25 6:27 AM, Christoph Pieper - christoph at fecra.de wrote:
> Hi,
>
> I’m designing a schema for a family‑tree web app on PostgreSQL. Users 
> register accounts and can create one or more family trees. Each tree 
> consists of persons (the user themself, relatives, ancestors). Many 
> persons in a tree will never have an account (e.g. 
> great‑grandparents). Because of GDPR, when a user deletes their 
> account we must remove/anonymise their user profile, but we want to 
> keep the family tree data intact so that other users can still 
> reference those ancestors.
>
> We expect hundreds of thousands to millions of persons and deep 
> ancestry queries (N generations, inbreeding/relationship calculations).
> I’m hesitating between two schema designs:
>
> *Option A – Separate family_tree_node table*
>
> create table app_user (
>   id          uuid primary key,
>   email       text unique not null,
>   created_at  timestamptz not null default now()
> );
>
> create table person (
>   id                 uuid primary key,
>   created_by_user_id uuid references app_user(id) on delete set null,
>   first_name         text,
>   last_name          text,
>   birth_date         date
>   -- more non-account-specific attributes may be added here in the future
> );
>
> create table family_tree (
>   id            uuid primary key,
>   owner_user_id uuid not null references app_user(id) on delete cascade,
>   created_at    timestamptz not null default now()
> );
>
> create table family_tree_node (
>   id              uuid primary key,
>   family_tree_id  uuid not null references family_tree(id) on delete cascade,
>   person_id       uuid references person(id) on delete set null,
>   father_node_id  uuid references family_tree_node(id),
>   mother_node_id  uuid references family_tree_node(id)
> );
>
> create index on family_tree_node (family_tree_id);
> create index on family_tree_node (person_id);
> create index on family_tree_node (father_node_id);
> create index on family_tree_node (mother_node_id);
>
> Here family_tree_node is the structural graph for a specific tree. A 
> node may point to a person, but can also exist without one (minimal 
> data only). If a user/account is deleted, we only drop/anonymise data 
> in app_user (and optionally created_by_user_id), while person and 
> family_tree_node remain.
>
> *Option B – Use person directly as the graph node (soft delete)*
>
> create table app_user (
>   id          uuid primary key,
>   email       text unique not null,
>   created_at  timestamptz not null default now()
> );
>
> create table person (
>   id                 uuid primary key,
>   created_by_user_id uuid references app_user(id) on delete set null,
>   first_name         text,
>   last_name          text,
>   birth_date         date,
>   father_id          uuid references person(id),
>   mother_id          uuid references person(id),
>   deleted_at         timestamptz     -- soft delete flag
> );
>
> create index on person (father_id);
> create index on person (mother_id);
> create index on person (deleted_at);
>
> In this model, the pedigree graph is just person(father_id, 
> mother_id). When a user deletes their account we never hard‑delete 
> persons; instead we set deleted_at and/or anonymise some fields. All 
> queries must filter on deleted_at is null to hide soft‑deleted persons.
>
> Question:
> From a PostgreSQL point of view (database best practices, data 
> integrity, performance and long‑term maintainability at millions of 
> rows), which approach would you prefer, or is there a better pattern 
> for this kind of “account can be deleted, but genealogy should remain” 
> use case?
>
> Regards and many thanks!
> Christoph
>
>
>
> -- 
>
> *Christoph Pieper*
>
> christoph@fecra.de <mailto:christoph@fecra.de>
>
> fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland
>
> www.fecra.de <https://www.fecra.de/>  | HRB 268518 B
>
--------------oh8V2QZZ6D3Rz1oWmxC8AAyD--