Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vNUkJ-004PRg-1C for pgsql-general@arkaria.postgresql.org; Mon, 24 Nov 2025 11:28:39 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vNUkH-000YsV-2Z for pgsql-general@arkaria.postgresql.org; Mon, 24 Nov 2025 11:28:38 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vNUkH-000YsM-11 for pgsql-general@lists.postgresql.org; Mon, 24 Nov 2025 11:28:37 +0000 Received: from mail-ed1-x534.google.com ([2a00:1450:4864:20::534]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vNUkE-001DoR-2h for pgsql-general@lists.postgresql.org; Mon, 24 Nov 2025 11:28:37 +0000 Received: by mail-ed1-x534.google.com with SMTP id 4fb4d7f45d1cf-640ca678745so6949378a12.2 for ; Mon, 24 Nov 2025 03:28:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fecra-de.20230601.gappssmtp.com; s=20230601; t=1763983713; x=1764588513; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=jMo1/mFevIKYz8fU54HeUqwjDZI0MDXGcYrTjILZDfU=; b=LfkssnLHvQ6kvVV/v1XfQAMwgorPuu34qAACbze3fTNI6YT7KtGGGI8Bm6lLvmjBu3 NLqyfO1Ijx7x2iRN7ipIjyqUA6mS/FoXVs24i47FrfgXcFzYNo/hdUXO0HFspUNOI5ND rGtr+zQqsGf+natrdhmKTQR188YmVFKBe69HTK1yYUTC8eR8Spd0Mu3BCqGW1RflsApL GTQvNCxSgosBbJ1Ywi9Kq0Pp0wJhPOx+CuisamNaDNB1UHgHpQX1b0/nWovL+apMmuJd VzrM2jnRdg8x9Un/E1+UPTfO9H/0b2HhZXmfWT6Ei4ZndzkQcZp2ZfrqKVJk92hf54PX pg+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763983713; x=1764588513; h=to:subject:message-id:date:from:mime-version:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=jMo1/mFevIKYz8fU54HeUqwjDZI0MDXGcYrTjILZDfU=; b=OoLTiNycYbOyL7NTkngk3Wo6e6foKV7aqyZ4HRQ9Z0cuO6GVInyIRt86uNZ7MLGvTU 0Rz+sAgw7t4TwFNtvlixmoY9NtIvSzn/kzqOUZAFUS17UVxWA67G1FHsuMuLUA6otYXI f2saqy2EpcN3YZ3X+j3ncHDoHUeL60ecuFnBFbkuQKwb1J9IA/UQR5Kba6pMFICqFdRD BQ65aVFwAPqgYXAie8G7VwoI/bi2VtFBDbESlSj1sb0N4klP/BT1bMrejhLDIKHieo4y //ynMU9QiQoT8vwyc6sSPeO6A+Lz4m7GNh/DPvxBUd3i/CvzUwqAA6OqO8bCS/JnLTPQ 6cjw== X-Gm-Message-State: AOJu0YxUc+YLzw+U1gxgh1OkIs6QYqykbR3vdVRi3g81qaNVWKesWuIf UZHLjnBWz2Y1Iv3hyvW+s/veWzzK0wsMiFXC2ThzMHstgSTSNARJrxtbIRHrut4+KnT3aqgxQKj IdUNEzr9U1TQNYatliJ79hiLVc2hwu5IKByzBDEQLyAO7zWP3sPuk5xWNZQ== X-Gm-Gg: ASbGnct1ENdc5pd2970by1OOgMdeZAeVvIGbaAA+cDMqAzrlWK78dngSPaDjDyOruCJ mkHW9ZFjiLS3o5NvxhPxhpGzBe9a5LQDcGQQOe4HxRuEs6YU+ShE45XCMe14QvwnN2YLIP123E+ UWOVqL6a4CKiTmCJVQH78nD9gMYatO2RwqDVqrzidHgqXPYWFc/7KVrz+aNLIiZJ+wpAQyY6MZJ E4T81b4i1cFg6opDGHIH5+pE28P+ggt0R3QDjTSEGP9fRx+MWobve7Rh+7BEd/3wzygMVs+egDu TfC0zm9O6Lqs1J+VwIiMHXjM90c= X-Google-Smtp-Source: AGHT+IFmyuwX+XDYH+r7fgeV4otLXQQMg2GH/WLrxWSnzIf0UhF59NDNpcvdkRPTqXwPOvJrSRiRCpB4vch8nMFSJt0= X-Received: by 2002:a17:907:3f1c:b0:b70:ae6a:5fdb with SMTP id a640c23a62f3a-b7671731e9emr1195402366b.45.1763983712876; Mon, 24 Nov 2025 03:28:32 -0800 (PST) MIME-Version: 1.0 From: Christoph Pieper Date: Mon, 24 Nov 2025 12:27:56 +0100 X-Gm-Features: AWmQ_bk444NahFaTRMR-WX6opyPXKWJzoRwMtktLmz7M9wRT3mESMGciWdCWWHU Message-ID: Subject: Schema design: user account deletion vs. keeping family tree data To: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000009887a2064455761f" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000009887a2064455761f Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, I=E2=80=99m designing a schema for a family=E2=80=91tree web app on Postgre= SQL. Users register accounts and can create one or more family trees. Each tree consists of persons (the user themself, relatives, ancestors). Many persons in a tree will never have an account (e.g. great=E2=80=91grandparents). Bec= ause of GDPR, when a user deletes their account we must remove/anonymise their user profile, but we want to keep the family tree data intact so that other users can still reference those ancestors. We expect hundreds of thousands to millions of persons and deep ancestry queries (N generations, inbreeding/relationship calculations). I=E2=80=99m hesitating between two schema designs: *Option A =E2=80=93 Separate family_tree_node table* create table app_user ( id uuid primary key, email text unique not null, created_at timestamptz not null default now() ); create table person ( id uuid primary key, created_by_user_id uuid references app_user(id) on delete set null, first_name text, last_name text, birth_date date -- more non-account-specific attributes may be added her in future! ); create table family_tree ( id uuid primary key, owner_user_id uuid not null references app_user(id) on delete cascade, created_at timestamptz not null default now() ); create table family_tree_node ( id uuid primary key, family_tree_id uuid not null references family_tree(id) on delete cascade, person_id uuid references person(id) on delete set null, father_node_id uuid references family_tree_node(id), mother_node_id uuid references family_tree_node(id) ); create index on family_tree_node (family_tree_id); create index on family_tree_node (person_id); create index on family_tree_node (father_node_id); create index on family_tree_node (mother_node_id); Here family_tree_node is the structural graph for a specific tree. A node may point to a person, but can also exist without one (minimal data only). If a user/account is deleted, we only drop/anonymise data in app_user (and optionally created_by_user_id), while person and family_tree_node remain. *Option B =E2=80=93 Use person directly as the graph node (soft delete)* create table app_user ( id uuid primary key, email text unique not null, created_at timestamptz not null default now() ); create table person ( id uuid primary key, created_by_user_id uuid references app_user(id) on delete set null, first_name text, last_name text, birth_date date, father_id uuid references person(id), mother_id uuid references person(id), deleted_at timestamptz -- soft delete flag ); create index on person (father_id); create index on person (mother_id); create index on person (deleted_at); In this model, the pedigree graph is just a person(father_id, mother_id). When a user deletes their account we never hard=E2=80=91delete persons; ins= tead we set deleted_at and/or anonymise some fields. All queries must filter on deleted_at is null to hide soft=E2=80=91deleted persons. Question: From a PostgreSQL point of view (database best practices, data integrity, performance and long=E2=80=91term maintainability at millions of rows), whi= ch approach would you prefer, or is there a better pattern for this kind of =E2=80=9Caccount can be deleted, but genealogy should remain=E2=80=9D use c= ase? Regards and many thanks! Christoph --=20 [image: fecra company logo] *Christoph Pieper* christoph@fecra.de fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland www.fecra.de | HRB 268518 B --0000000000009887a2064455761f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

I=E2=80=99m designing a schema for a famil= y=E2=80=91tree web app on PostgreSQL. Users register accounts and can creat= e one or more family trees. Each tree consists of persons (the user themsel= f, relatives, ancestors). Many persons in a tree will never have an account= (e.g. great=E2=80=91grandparents). Because of GDPR, when a user deletes th= eir account we must remove/anonymise their user profile, but we want to kee= p the family tree data intact so that other users can still reference those= ancestors.

We expect hundreds of thousands to millions of persons a= nd deep ancestry queries (N generations, inbreeding/relationship calculatio= ns).
I=E2=80=99m hesitating between two schema designs:

Option A =E2=80=93 Separate family_tree_node table

create table app_user (
=C2=A0 id = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0uuid primary key,
=C2=A0 email =C2=A0 = =C2=A0 =C2=A0 text unique not null,
=C2=A0 created_at =C2=A0timestamptz = not null default now()
);

create table person (
=C2=A0 id =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 uuid primary key,
= =C2=A0 created_by_user_id uuid references app_user(id) on delete set null,<= br>=C2=A0 first_name =C2=A0 =C2=A0 =C2=A0 =C2=A0 text,
=C2=A0 last_name = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0text,
=C2=A0 birth_date =C2=A0 =C2=A0 = =C2=A0 =C2=A0 date
=C2=A0 -- more non-account-specific attributes may be= added her in future!
);

create table family_tree (
=C2=A0 id = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0uuid primary key,
=C2=A0 owner_= user_id uuid not null references app_user(id) on delete cascade,
=C2=A0 = created_at =C2=A0 =C2=A0timestamptz not null default now()
);

cre= ate table family_tree_node (
=C2=A0 id =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0uuid primary key,
=C2=A0 family_tree_id =C2=A0uuid not = null references family_tree(id) on delete cascade,
=C2=A0 person_id =C2= =A0 =C2=A0 =C2=A0 uuid references person(id) on delete set null,
=C2=A0 = father_node_id =C2=A0uuid references family_tree_node(id),
=C2=A0 mother= _node_id =C2=A0uuid references family_tree_node(id)
);

create ind= ex on family_tree_node (family_tree_id);
create index on family_tree_nod= e (person_id);
create index on family_tree_node (father_node_id);
cre= ate index on family_tree_node (mother_node_id);


Here family_t= ree_node is the structural graph for a specific tree. A node may point to a= person, but can also exist without one (minimal data only). If a user/acco= unt is deleted, we only drop/anonymise data in app_user (and optionally cre= ated_by_user_id), while person and family_tree_node remain.

<= /div>
Option B =E2=80=93 Use person directly as the graph node (soft= delete)

create table app_user (=
=C2=A0 id =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0uuid primary key,
=C2=A0= email =C2=A0 =C2=A0 =C2=A0 text unique not null,
=C2=A0 created_at =C2= =A0timestamptz not null default now()
);

create table person (=C2=A0 id =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 uuid pri= mary key,
=C2=A0 created_by_user_id uuid references app_user(id) on dele= te set null,
=C2=A0 first_name =C2=A0 =C2=A0 =C2=A0 =C2=A0 text,
=C2= =A0 last_name =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0text,
=C2=A0 birth_date = =C2=A0 =C2=A0 =C2=A0 =C2=A0 date,
=C2=A0 father_id =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0uuid references person(id),
=C2=A0 mother_id =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0uuid references person(id),
=C2=A0 deleted_at =C2=A0= =C2=A0 =C2=A0 =C2=A0 timestamptz =C2=A0 =C2=A0 -- soft delete flag
);
create index on person (father_id);
create index on person (mother= _id);
create index on person (deleted_at);


In this model, = the pedigree graph is just a person(father_id, mother_id). When a user dele= tes their account we never hard=E2=80=91delete persons; instead we set dele= ted_at and/or anonymise some fields. All queries must filter on deleted_at = is null to hide soft=E2=80=91deleted persons.

Question:
Fr= om a PostgreSQL point of view (database best practices, data integrity, per= formance and long=E2=80=91term maintainability at millions of rows), which = approach would you prefer, or is there a better pattern for this kind of = =E2=80=9Caccount can be deleted, but genealogy should remain=E2=80=9D use c= ase?

Regards and many thanks!
Christoph<= /div>



--

Christ= oph Pieper

c= hristoph@fecra.de

fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland

www.fecra.de=C2=A0 |=C2=A0HRB 268518 B

--0000000000009887a2064455761f--