public inbox for [email protected]  
help / color / mirror / Atom feed
From: [email protected]
To: Christoph Pieper <[email protected]>
To: pgsql-generallists.postgresql.org <[email protected]>
Subject: Re: Schema design: user account deletion vs. keeping family tree data
Date: Mon, 24 Nov 2025 16:51:35 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAEzggP_UOGwe5BA9s3iLY3R8LEi4QztA=2ka+vkRp-Wy64vXdQ@mail.gmail.com>
References: <CAEzggP_UOGwe5BA9s3iLY3R8LEi4QztA=2ka+vkRp-Wy64vXdQ@mail.gmail.com>

Option B would be fine with me, unless there is good reason to normalize 
it further.  A query using recursive CTE would be able to find ancestors 
and descendants neatly and efficiently.

I deal with some tables in the billions of rows, and with that hat on, I 
would use int/bigint identity for the PKs instead of UUIDs (less 
storage, smaller indices, faster joins).  I would have a boolean 
'active' column to handle soft deletes, along with created_at and 
disabled_at timestamptz columns maintained by triggers.  I would use 
composite partitioning, first level partition by list on 'active', and 
second level partition by range on the id PK with the range being a few 
million.  If for some reason you have to use UUIDs, use time-based 
UUIDv7 (native on PostgreSQL v18) so you can range partition.

-- 
regards,
Kiriakos Georgiou


On 11/24/25 6:27 AM, Christoph Pieper - christoph at fecra.de wrote:
> Hi,
>
> I’m designing a schema for a family‑tree web app on PostgreSQL. Users 
> register accounts and can create one or more family trees. Each tree 
> consists of persons (the user themself, relatives, ancestors). Many 
> persons in a tree will never have an account (e.g. 
> great‑grandparents). Because of GDPR, when a user deletes their 
> account we must remove/anonymise their user profile, but we want to 
> keep the family tree data intact so that other users can still 
> reference those ancestors.
>
> We expect hundreds of thousands to millions of persons and deep 
> ancestry queries (N generations, inbreeding/relationship calculations).
> I’m hesitating between two schema designs:
>
> *Option A – Separate family_tree_node table*
>
> create table app_user (
>   id          uuid primary key,
>   email       text unique not null,
>   created_at  timestamptz not null default now()
> );
>
> create table person (
>   id                 uuid primary key,
>   created_by_user_id uuid references app_user(id) on delete set null,
>   first_name         text,
>   last_name          text,
>   birth_date         date
>   -- more non-account-specific attributes may be added her in future!
> );
>
> create table family_tree (
>   id            uuid primary key,
>   owner_user_id uuid not null references app_user(id) on delete cascade,
>   created_at    timestamptz not null default now()
> );
>
> create table family_tree_node (
>   id              uuid primary key,
>   family_tree_id  uuid not null references family_tree(id) on delete 
> cascade,
>   person_id       uuid references person(id) on delete set null,
>   father_node_id  uuid references family_tree_node(id),
>   mother_node_id  uuid references family_tree_node(id)
> );
>
> create index on family_tree_node (family_tree_id);
> create index on family_tree_node (person_id);
> create index on family_tree_node (father_node_id);
> create index on family_tree_node (mother_node_id);
>
> Here family_tree_node is the structural graph for a specific tree. A 
> node may point to a person, but can also exist without one (minimal 
> data only). If a user/account is deleted, we only drop/anonymise data 
> in app_user (and optionally created_by_user_id), while person and 
> family_tree_node remain.
>
> *Option B – Use person directly as the graph node (soft delete)*
>
> create table app_user (
>   id          uuid primary key,
>   email       text unique not null,
>   created_at  timestamptz not null default now()
> );
>
> create table person (
>   id                 uuid primary key,
>   created_by_user_id uuid references app_user(id) on delete set null,
>   first_name         text,
>   last_name          text,
>   birth_date         date,
>   father_id          uuid references person(id),
>   mother_id          uuid references person(id),
>   deleted_at         timestamptz     -- soft delete flag
> );
>
> create index on person (father_id);
> create index on person (mother_id);
> create index on person (deleted_at);
>
> In this model, the pedigree graph is just a person(father_id, 
> mother_id). When a user deletes their account we never hard‑delete 
> persons; instead we set deleted_at and/or anonymise some fields. All 
> queries must filter on deleted_at is null to hide soft‑deleted persons.
>
> Question:
> From a PostgreSQL point of view (database best practices, data 
> integrity, performance and long‑term maintainability at millions of 
> rows), which approach would you prefer, or is there a better pattern 
> for this kind of “account can be deleted, but genealogy should remain” 
> use case?
>
> Regards and many thanks!
> Christoph
>
>
>
> -- 
> fecra company logo 	
>
> *Christoph Pieper*
>
> [email protected] <mailto:[email protected]>
>
> fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland
>
> www.fecra.de <https://www.fecra.de/>  | HRB 268518 B
>

view thread (7+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Schema design: user account deletion vs. keeping family tree data
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox