Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vNeTJ-009xlC-0l for pgsql-general@arkaria.postgresql.org; Mon, 24 Nov 2025 21:51:45 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vNeTH-004kc7-2g for pgsql-general@arkaria.postgresql.org; Mon, 24 Nov 2025 21:51:44 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vNeTH-004kby-1D for pgsql-general@lists.postgresql.org; Mon, 24 Nov 2025 21:51:43 +0000 Received: from mail-107162.simplelogin.co ([79.135.107.162]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vNeTF-001IuT-1A for pgsql-general@lists.postgresql.org; Mon, 24 Nov 2025 21:51:43 +0000 ARC-Seal: i=1; a=rsa-sha256; d=simplelogin.co; s=arc-20230626; t=1764021100; cv=none; b=pZRzgAXGP8cIPYcMquypzZe+N1DdNSKOdXC+RMcvUl43zTr2WTMzSgiPIgKx595Z8lVhXd8nNHLpayrP4Dik27gnLMuwgjphkdOZxAzrOlgTK+dnJrF/5REK9B57gFEIcYt19gvdCPz/TcrplQjeo/5HthhpF74cg5p1ZjK9c+vPq9It3euhUodKqLn9uZQ75GU5ml8A4Xj8nRD9Vc05pRL4tRl8ugqFz8y/sqo0Vqs3cHPjS0lH3U/Wg+D2AwiP78/I2peDpPEDPhwU32Q/I7UsOMh2mr8vMVk92bOWo7qM7INVJ5beX/rcQR9ngExC5DB6+QG/3LC2SuEFiUgUUA== ARC-Message-Signature: i=1; a=rsa-sha256; d=simplelogin.co; s=arc-20230626; t=1764021100; c=relaxed/simple; bh=fNC6e0liwkMACtlJprlD6iZl+4VJhZm0FEhg7UTTfKA=; h=Date:Subject:In-Reply-To:From:To:References; b=dRX7BGj2Tr/IQmIS7rZsYONAzDF4q7pzzYFOgi/ynuGYAKvUid7EO0d41zEWfVP5pbqcp+JxUYIQg1JGm2J9LFUs9MAa2auDTq6NOrdbg+oWVaJRtR3NzjJnOV0LeYn7RRmB+ZX7NjufWbmyzihyReuAhJZxPhNn0dzffXiu+tiXVecUiTt0juYcnj6YQANxpnj4AgJh184YfAYKxl79xkLFP4p4bmZkRvZ+sFm9B4lZ4031L0vjKeKEAgXPeVtF0wGfMAJtAUonr1QPW1+P1p9zRg0eU+PNkEugHzP+aSiIz9Wb8Po2jHkfZMNNV5TFgsu3jDDjvGn4Em88lxC3JQ== ARC-Authentication-Results: i=1; mail.protonmail.ch DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=georgiou.vip; s=dkim; t=1764021100; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sdKW51NQBAXQsiS7uo7F9QWHORASNMHG8dzBEgnlmJw=; b=xXwUMzY05k1rKokQP5XLKqRFf2RH5SdbRKO9tUifP91rCLtjYt1X8U9GY6pRST9PdXkB+p lSTIa1uQkdnDNMwUsN0SAJhawvJ/ueZvKaP8+rQvO9rPuXbCqUZPo3iQbRR1idX49t8P3w vYOSVg0+9Zc91w1TFPS48hpg0v3zbF4= Content-Type: multipart/alternative; boundary="------------oh8V2QZZ6D3Rz1oWmxC8AAyD" Date: Mon, 24 Nov 2025 16:51:35 -0500 MIME-Version: 1.0 Subject: Re: Schema design: user account deletion vs. keeping family tree data In-Reply-To: Content-Transfer-Encoding: 7bit From: pg254kl@georgiou.vip To: Christoph Pieper ,"pgsql-generallists.postgresql.org" Message-ID: <176402110004.8.5805411983290632546.1025420735@georgiou.vip> References: X-SimpleLogin-Type: Reply X-SimpleLogin-EmailLog-ID: 1025420736 X-SimpleLogin-Want-Signing: yes List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk This is a multi-part message in MIME format. --------------oh8V2QZZ6D3Rz1oWmxC8AAyD Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Option B would be fine with me, unless there is good reason to normalize it further.  A query using recursive CTE would be able to find ancestors and descendants neatly and efficiently. I deal with some tables in the billions of rows, and with that hat on, I would use int/bigint identity for the PKs instead of UUIDs (less storage, smaller indices, faster joins).  I would have a boolean 'active' column to handle soft deletes, along with created_at and disabled_at timestamptz columns maintained by triggers.  I would use composite partitioning, first level partition by list on 'active', and second level partition by range on the id PK with the range being a few million.  If for some reason you have to use UUIDs, use time-based UUIDv7 (native on PostgreSQL v18) so you can range partition. -- regards, Kiriakos Georgiou On 11/24/25 6:27 AM, Christoph Pieper - christoph at fecra.de wrote: > Hi, > > I’m designing a schema for a family‑tree web app on PostgreSQL. Users > register accounts and can create one or more family trees. Each tree > consists of persons (the user themself, relatives, ancestors). Many > persons in a tree will never have an account (e.g. > great‑grandparents). Because of GDPR, when a user deletes their > account we must remove/anonymise their user profile, but we want to > keep the family tree data intact so that other users can still > reference those ancestors. > > We expect hundreds of thousands to millions of persons and deep > ancestry queries (N generations, inbreeding/relationship calculations). > I’m hesitating between two schema designs: > > *Option A – Separate family_tree_node table* > > create table app_user ( >   id          uuid primary key, >   email       text unique not null, >   created_at  timestamptz not null default now() > ); > > create table person ( >   id                 uuid primary key, >   created_by_user_id uuid references app_user(id) on delete set null, >   first_name         text, >   last_name          text, >   birth_date         date >   -- more non-account-specific attributes may be added her in future! > ); > > create table family_tree ( >   id            uuid primary key, >   owner_user_id uuid not null references app_user(id) on delete cascade, >   created_at    timestamptz not null default now() > ); > > create table family_tree_node ( >   id              uuid primary key, >   family_tree_id  uuid not null references family_tree(id) on delete > cascade, >   person_id       uuid references person(id) on delete set null, >   father_node_id  uuid references family_tree_node(id), >   mother_node_id  uuid references family_tree_node(id) > ); > > create index on family_tree_node (family_tree_id); > create index on family_tree_node (person_id); > create index on family_tree_node (father_node_id); > create index on family_tree_node (mother_node_id); > > Here family_tree_node is the structural graph for a specific tree. A > node may point to a person, but can also exist without one (minimal > data only). If a user/account is deleted, we only drop/anonymise data > in app_user (and optionally created_by_user_id), while person and > family_tree_node remain. > > *Option B – Use person directly as the graph node (soft delete)* > > create table app_user ( >   id          uuid primary key, >   email       text unique not null, >   created_at  timestamptz not null default now() > ); > > create table person ( >   id                 uuid primary key, >   created_by_user_id uuid references app_user(id) on delete set null, >   first_name         text, >   last_name          text, >   birth_date         date, >   father_id          uuid references person(id), >   mother_id          uuid references person(id), >   deleted_at         timestamptz     -- soft delete flag > ); > > create index on person (father_id); > create index on person (mother_id); > create index on person (deleted_at); > > In this model, the pedigree graph is just a person(father_id, > mother_id). When a user deletes their account we never hard‑delete > persons; instead we set deleted_at and/or anonymise some fields. All > queries must filter on deleted_at is null to hide soft‑deleted persons. > > Question: > From a PostgreSQL point of view (database best practices, data > integrity, performance and long‑term maintainability at millions of > rows), which approach would you prefer, or is there a better pattern > for this kind of “account can be deleted, but genealogy should remain” > use case? > > Regards and many thanks! > Christoph > > > > -- > fecra company logo > > *Christoph Pieper* > > christoph@fecra.de > > fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland > > www.fecra.de   | HRB 268518 B > --------------oh8V2QZZ6D3Rz1oWmxC8AAyD Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit

Option B would be fine with me, unless there is good reason to normalize it further.  A query using recursive CTE would be able to find ancestors and descendants neatly and efficiently.

I deal with some tables in the billions of rows, and with that hat on, I would use int/bigint identity for the PKs instead of UUIDs (less storage, smaller indices, faster joins).  I would have a boolean 'active' column to handle soft deletes, along with created_at and disabled_at timestamptz columns maintained by triggers.  I would use composite partitioning, first level partition by list on 'active', and second level partition by range on the id PK with the range being a few million.  If for some reason you have to use UUIDs, use time-based UUIDv7 (native on PostgreSQL v18) so you can range partition.

-- 
regards,
Kiriakos Georgiou


On 11/24/25 6:27 AM, Christoph Pieper - christoph at fecra.de wrote:
Hi,

I’m designing a schema for a family‑tree web app on PostgreSQL. Users register accounts and can create one or more family trees. Each tree consists of persons (the user themself, relatives, ancestors). Many persons in a tree will never have an account (e.g. great‑grandparents). Because of GDPR, when a user deletes their account we must remove/anonymise their user profile, but we want to keep the family tree data intact so that other users can still reference those ancestors.

We expect hundreds of thousands to millions of persons and deep ancestry queries (N generations, inbreeding/relationship calculations).
I’m hesitating between two schema designs:

Option A – Separate family_tree_node table

create table app_user (
  id          uuid primary key,
  email       text unique not null,
  created_at  timestamptz not null default now()
);

create table person (
  id                 uuid primary key,
  created_by_user_id uuid references app_user(id) on delete set null,
  first_name         text,
  last_name          text,
  birth_date         date
  -- more non-account-specific attributes may be added her in future!
);

create table family_tree (
  id            uuid primary key,
  owner_user_id uuid not null references app_user(id) on delete cascade,
  created_at    timestamptz not null default now()
);

create table family_tree_node (
  id              uuid primary key,
  family_tree_id  uuid not null references family_tree(id) on delete cascade,
  person_id       uuid references person(id) on delete set null,
  father_node_id  uuid references family_tree_node(id),
  mother_node_id  uuid references family_tree_node(id)
);

create index on family_tree_node (family_tree_id);
create index on family_tree_node (person_id);
create index on family_tree_node (father_node_id);
create index on family_tree_node (mother_node_id);


Here family_tree_node is the structural graph for a specific tree. A node may point to a person, but can also exist without one (minimal data only). If a user/account is deleted, we only drop/anonymise data in app_user (and optionally created_by_user_id), while person and family_tree_node remain.

Option B – Use person directly as the graph node (soft delete)

create table app_user (
  id          uuid primary key,
  email       text unique not null,
  created_at  timestamptz not null default now()
);

create table person (
  id                 uuid primary key,
  created_by_user_id uuid references app_user(id) on delete set null,
  first_name         text,
  last_name          text,
  birth_date         date,
  father_id          uuid references person(id),
  mother_id          uuid references person(id),
  deleted_at         timestamptz     -- soft delete flag
);

create index on person (father_id);
create index on person (mother_id);
create index on person (deleted_at);


In this model, the pedigree graph is just a person(father_id, mother_id). When a user deletes their account we never hard‑delete persons; instead we set deleted_at and/or anonymise some fields. All queries must filter on deleted_at is null to hide soft‑deleted persons.

Question:
From a PostgreSQL point of view (database best practices, data integrity, performance and long‑term maintainability at millions of rows), which approach would you prefer, or is there a better pattern for this kind of “account can be deleted, but genealogy should remain” use case?

Regards and many thanks!
Christoph



--
fecra company logo

Christoph Pieper

christoph@fecra.de

fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland

www.fecra.de  | HRB 268518 B


  


--------------oh8V2QZZ6D3Rz1oWmxC8AAyD--