Schema design: user account deletion vs. keeping family tree data

public inbox for [email protected]  
help / color / mirror / Atom feed

Schema design: user account deletion vs. keeping family tree data
7+ messages / 6 participants
[nested] [flat]

* Schema design: user account deletion vs. keeping family tree data
@ 2025-11-24 11:27 Christoph Pieper <[email protected]>
  2025-11-24 13:17 ` Re: Schema design: user account deletion vs. keeping family tree data Bernice Southey <[email protected]>
  2025-11-24 21:51 ` Re: Schema design: user account deletion vs. keeping family tree data [email protected]
  2025-11-25 21:08 ` Re: Schema design: user account deletion vs. keeping family tree data Jan Claeys <[email protected]>
  0 siblings, 3 replies; 7+ messages in thread

From: Christoph Pieper @ 2025-11-24 11:27 UTC (permalink / raw)
  To: [email protected]

Hi,

I’m designing a schema for a family‑tree web app on PostgreSQL. Users
register accounts and can create one or more family trees. Each tree
consists of persons (the user themself, relatives, ancestors). Many persons
in a tree will never have an account (e.g. great‑grandparents). Because of
GDPR, when a user deletes their account we must remove/anonymise their user
profile, but we want to keep the family tree data intact so that other
users can still reference those ancestors.

We expect hundreds of thousands to millions of persons and deep ancestry
queries (N generations, inbreeding/relationship calculations).
I’m hesitating between two schema designs:

*Option A – Separate family_tree_node table*

create table app_user (
  id          uuid primary key,
  email       text unique not null,
  created_at  timestamptz not null default now()
);

create table person (
  id                 uuid primary key,
  created_by_user_id uuid references app_user(id) on delete set null,
  first_name         text,
  last_name          text,
  birth_date         date
  -- more non-account-specific attributes may be added her in future!
);

create table family_tree (
  id            uuid primary key,
  owner_user_id uuid not null references app_user(id) on delete cascade,
  created_at    timestamptz not null default now()
);

create table family_tree_node (
  id              uuid primary key,
  family_tree_id  uuid not null references family_tree(id) on delete
cascade,
  person_id       uuid references person(id) on delete set null,
  father_node_id  uuid references family_tree_node(id),
  mother_node_id  uuid references family_tree_node(id)
);

create index on family_tree_node (family_tree_id);
create index on family_tree_node (person_id);
create index on family_tree_node (father_node_id);
create index on family_tree_node (mother_node_id);

Here family_tree_node is the structural graph for a specific tree. A node
may point to a person, but can also exist without one (minimal data only).
If a user/account is deleted, we only drop/anonymise data in app_user (and
optionally created_by_user_id), while person and family_tree_node remain.

*Option B – Use person directly as the graph node (soft delete)*

create table app_user (
  id          uuid primary key,
  email       text unique not null,
  created_at  timestamptz not null default now()
);

create table person (
  id                 uuid primary key,
  created_by_user_id uuid references app_user(id) on delete set null,
  first_name         text,
  last_name          text,
  birth_date         date,
  father_id          uuid references person(id),
  mother_id          uuid references person(id),
  deleted_at         timestamptz     -- soft delete flag
);

create index on person (father_id);
create index on person (mother_id);
create index on person (deleted_at);

In this model, the pedigree graph is just a person(father_id, mother_id).
When a user deletes their account we never hard‑delete persons; instead we
set deleted_at and/or anonymise some fields. All queries must filter on
deleted_at is null to hide soft‑deleted persons.

Question:
From a PostgreSQL point of view (database best practices, data integrity,
performance and long‑term maintainability at millions of rows), which
approach would you prefer, or is there a better pattern for this kind of
“account can be deleted, but genealogy should remain” use case?

Regards and many thanks!
Christoph

-- 
[image: fecra company logo]

*Christoph Pieper*

[email protected]

fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland

www.fecra.de  | HRB 268518 B

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Schema design: user account deletion vs. keeping family tree data
  2025-11-24 11:27 Schema design: user account deletion vs. keeping family tree data Christoph Pieper <[email protected]>
@ 2025-11-24 13:17 ` Bernice Southey <[email protected]>
  2025-11-24 15:43   ` Re: Schema design: user account deletion vs. keeping family tree data Rob Sargent <[email protected]>
  2 siblings, 1 reply; 7+ messages in thread

From: Bernice Southey @ 2025-11-24 13:17 UTC (permalink / raw)
  To: Christoph Pieper <[email protected]>; +Cc: [email protected]

Christoph Pieper <[email protected]> wrote:
> Question:
> From a PostgreSQL point of view (database best practices, data integrity, performance and long‑term maintainability at millions of rows), which approach would you prefer, or is there a better pattern for this kind of “account can be deleted, but genealogy should remain” use case?

I can tell you what I'm doing. It solved many design problems, but I
don't claim it's "best practice". I split my table in two.
    1 - columns that I can keep indefinitely
    2 - personal data
That way I just delete the personal data row when I want to remove it.

Thanks, Bernice

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Schema design: user account deletion vs. keeping family tree data
  2025-11-24 11:27 Schema design: user account deletion vs. keeping family tree data Christoph Pieper <[email protected]>
  2025-11-24 13:17 ` Re: Schema design: user account deletion vs. keeping family tree data Bernice Southey <[email protected]>
@ 2025-11-24 15:43   ` Rob Sargent <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Rob Sargent @ 2025-11-24 15:43 UTC (permalink / raw)
  To: Bernice Southey <[email protected]>; +Cc: Christoph Pieper <[email protected]>; [email protected]



> On Nov 24, 2025, at 6:18 AM, Bernice Southey <[email protected]> wrote:
> 
> Christoph Pieper <[email protected]> wrote:
>> Question:
>> From a PostgreSQL point of view (database best practices, data integrity, performance and long‑term maintainability at millions of rows), which approach would you prefer, or is there a better pattern for this kind of “account can be deleted, but genealogy should remain” use case?
> 
> I can tell you what I'm doing. It solved many design problems, but I
> don't claim it's "best practice". I split my table in two.
>    1 - columns that I can keep indefinitely
>    2 - personal data
> That way I just delete the personal data row when I want to remove it.
> 
> Thanks, Bernice
> 
> 

+1
Names are tricky, messy things.[1] Keep egoMaPa as leans as possible.

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/



^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Schema design: user account deletion vs. keeping family tree data
  2025-11-24 11:27 Schema design: user account deletion vs. keeping family tree data Christoph Pieper <[email protected]>
@ 2025-11-24 21:51 ` [email protected]
  2 siblings, 0 replies; 7+ messages in thread

From: [email protected] @ 2025-11-24 21:51 UTC (permalink / raw)
  To: Christoph Pieper <[email protected]>; pgsql-generallists.postgresql.org <[email protected]>

Option B would be fine with me, unless there is good reason to normalize 
it further.  A query using recursive CTE would be able to find ancestors 
and descendants neatly and efficiently.

I deal with some tables in the billions of rows, and with that hat on, I 
would use int/bigint identity for the PKs instead of UUIDs (less 
storage, smaller indices, faster joins).  I would have a boolean 
'active' column to handle soft deletes, along with created_at and 
disabled_at timestamptz columns maintained by triggers.  I would use 
composite partitioning, first level partition by list on 'active', and 
second level partition by range on the id PK with the range being a few 
million.  If for some reason you have to use UUIDs, use time-based 
UUIDv7 (native on PostgreSQL v18) so you can range partition.

-- 
regards,
Kiriakos Georgiou


On 11/24/25 6:27 AM, Christoph Pieper - christoph at fecra.de wrote:
> Hi,
>
> I’m designing a schema for a family‑tree web app on PostgreSQL. Users 
> register accounts and can create one or more family trees. Each tree 
> consists of persons (the user themself, relatives, ancestors). Many 
> persons in a tree will never have an account (e.g. 
> great‑grandparents). Because of GDPR, when a user deletes their 
> account we must remove/anonymise their user profile, but we want to 
> keep the family tree data intact so that other users can still 
> reference those ancestors.
>
> We expect hundreds of thousands to millions of persons and deep 
> ancestry queries (N generations, inbreeding/relationship calculations).
> I’m hesitating between two schema designs:
>
> *Option A – Separate family_tree_node table*
>
> create table app_user (
>   id          uuid primary key,
>   email       text unique not null,
>   created_at  timestamptz not null default now()
> );
>
> create table person (
>   id                 uuid primary key,
>   created_by_user_id uuid references app_user(id) on delete set null,
>   first_name         text,
>   last_name          text,
>   birth_date         date
>   -- more non-account-specific attributes may be added her in future!
> );
>
> create table family_tree (
>   id            uuid primary key,
>   owner_user_id uuid not null references app_user(id) on delete cascade,
>   created_at    timestamptz not null default now()
> );
>
> create table family_tree_node (
>   id              uuid primary key,
>   family_tree_id  uuid not null references family_tree(id) on delete 
> cascade,
>   person_id       uuid references person(id) on delete set null,
>   father_node_id  uuid references family_tree_node(id),
>   mother_node_id  uuid references family_tree_node(id)
> );
>
> create index on family_tree_node (family_tree_id);
> create index on family_tree_node (person_id);
> create index on family_tree_node (father_node_id);
> create index on family_tree_node (mother_node_id);
>
> Here family_tree_node is the structural graph for a specific tree. A 
> node may point to a person, but can also exist without one (minimal 
> data only). If a user/account is deleted, we only drop/anonymise data 
> in app_user (and optionally created_by_user_id), while person and 
> family_tree_node remain.
>
> *Option B – Use person directly as the graph node (soft delete)*
>
> create table app_user (
>   id          uuid primary key,
>   email       text unique not null,
>   created_at  timestamptz not null default now()
> );
>
> create table person (
>   id                 uuid primary key,
>   created_by_user_id uuid references app_user(id) on delete set null,
>   first_name         text,
>   last_name          text,
>   birth_date         date,
>   father_id          uuid references person(id),
>   mother_id          uuid references person(id),
>   deleted_at         timestamptz     -- soft delete flag
> );
>
> create index on person (father_id);
> create index on person (mother_id);
> create index on person (deleted_at);
>
> In this model, the pedigree graph is just a person(father_id, 
> mother_id). When a user deletes their account we never hard‑delete 
> persons; instead we set deleted_at and/or anonymise some fields. All 
> queries must filter on deleted_at is null to hide soft‑deleted persons.
>
> Question:
> From a PostgreSQL point of view (database best practices, data 
> integrity, performance and long‑term maintainability at millions of 
> rows), which approach would you prefer, or is there a better pattern 
> for this kind of “account can be deleted, but genealogy should remain” 
> use case?
>
> Regards and many thanks!
> Christoph
>
>
>
> -- 
> fecra company logo 	
>
> *Christoph Pieper*
>
> [email protected] <mailto:[email protected]>
>
> fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland
>
> www.fecra.de <https://www.fecra.de/>  | HRB 268518 B
>

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Schema design: user account deletion vs. keeping family tree data
  2025-11-24 11:27 Schema design: user account deletion vs. keeping family tree data Christoph Pieper <[email protected]>
@ 2025-11-25 21:08 ` Jan Claeys <[email protected]>
  2025-11-25 22:02   ` Re: Schema design: user account deletion vs. keeping family tree data Ron Johnson <[email protected]>
  2 siblings, 1 reply; 7+ messages in thread

From: Jan Claeys @ 2025-11-25 21:08 UTC (permalink / raw)
  To: [email protected]

On Mon, 2025-11-24 at 12:27 +0100, Christoph Pieper wrote:
> I’m designing a schema for a family‑tree web app on PostgreSQL. Users
> register accounts and can create one or more family trees. Each tree
> consists of persons (the user themself, relatives, ancestors). Many
> persons in a tree will never have an account (e.g.
> great‑grandparents).
> Because of GDPR, when a user deletes their account we must
> remove/anonymise their user profile, but we want to keep the family
> tree data intact so that other users can still reference those
> ancestors.

Be careful. Storing and especially sharing/publishing any personal data
of, or closely related to, a living person (including the relations of
such person) would likely be a problem without permission from that
person. You probably want to contact a lawyer who’s familiar with the
GDPR & other privacy laws…

Personally, I would always keep tree data from different users apart,
give them detailed per-record control over what data can be published
and/or shared, and mark any records of living people as hidden/private
by default. And I would delete all records a user created when they
delete their account, or at the very least all those belonging to
living people.

----

About the schema design:
 * both your options assume a person has exactly 1 father and 1 mother
   (probably better just call them "parents" nowadays), and has only 1
   pair of parents (what with people who were adopted, etc.?)
 * "first name" & "last name" are assumptions that only make sense in
   some countries (even when your users are only German, their
   ancestors might not all be), and of course a person might have
   different legal names over their life
 * birth dates in genealogy are often not precise, especially if you go
   further in time, and the Postgres date type can’t  express things
   like "November 1810", "about 1534", "1913 or 1918" or "between 1610
   and 1615", so might need a custom date type (and you later probably
   also want to be able to store/link many other dates?)

Genealogy is messy, and you will have to be able to store all sorts of
data you didn’t expect at first thought (see also the website about
names Rob Sargent linked to).
You also seem to make assumptions about relations being 1:1 or 1:N when
in reality they are very often 1:N or N:N instead.

-- 
Jan Claeys (please don't CC me when replying to the list)

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Schema design: user account deletion vs. keeping family tree data
  2025-11-24 11:27 Schema design: user account deletion vs. keeping family tree data Christoph Pieper <[email protected]>
  2025-11-25 21:08 ` Re: Schema design: user account deletion vs. keeping family tree data Jan Claeys <[email protected]>
@ 2025-11-25 22:02   ` Ron Johnson <[email protected]>
  2025-11-26 22:24     ` Re: Schema design: user account deletion vs. keeping family tree data Christoph Pieper <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Ron Johnson @ 2025-11-25 22:02 UTC (permalink / raw)
  To: pgsql-generallists.postgresql.org <[email protected]>

On Tue, Nov 25, 2025 at 4:08 PM Jan Claeys <[email protected]> wrote:
[snip]

> Genealogy is messy, and you will have to be able to store all sorts of
> data you didn’t expect at first thought (see also the website about
> names Rob Sargent linked to).
> You also seem to make assumptions about relations being 1:1 or 1:N when
> in reality they are very often 1:N or N:N instead.
>

Gramps is a great Python-based genealogy program, which has a Web frontend:
https://github.com/gramps-project/gramps-web

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Schema design: user account deletion vs. keeping family tree data
  2025-11-24 11:27 Schema design: user account deletion vs. keeping family tree data Christoph Pieper <[email protected]>
  2025-11-25 21:08 ` Re: Schema design: user account deletion vs. keeping family tree data Jan Claeys <[email protected]>
  2025-11-25 22:02   ` Re: Schema design: user account deletion vs. keeping family tree data Ron Johnson <[email protected]>
@ 2025-11-26 22:24     ` Christoph Pieper <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Christoph Pieper @ 2025-11-26 22:24 UTC (permalink / raw)
  To: [email protected]

Thanks everybody!

And what do you think about Option A?
Especially the pedigree node approach beside the animal node?

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

end of thread, other threads:[~2025-11-26 22:24 UTC | newest]

Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-11-24 11:27 Schema design: user account deletion vs. keeping family tree data Christoph Pieper <[email protected]>
2025-11-24 13:17 ` Bernice Southey <[email protected]>
2025-11-24 15:43   ` Rob Sargent <[email protected]>
2025-11-24 21:51 ` [email protected]
2025-11-25 21:08 ` Jan Claeys <[email protected]>
2025-11-25 22:02   ` Ron Johnson <[email protected]>
2025-11-26 22:24     ` Christoph Pieper <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox