Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1skawG-001NOK-Nk for pgsql-general@arkaria.postgresql.org; Sun, 01 Sep 2024 03:07:41 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1skavF-007mJr-Sn for pgsql-general@arkaria.postgresql.org; Sun, 01 Sep 2024 03:06:38 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1skavF-007mJi-B2 for pgsql-general@lists.postgresql.org; Sun, 01 Sep 2024 03:06:37 +0000 Received: from mail-ej1-x62d.google.com ([2a00:1450:4864:20::62d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1skavC-0004RV-Qq for pgsql-general@lists.postgresql.org; Sun, 01 Sep 2024 03:06:36 +0000 Received: by mail-ej1-x62d.google.com with SMTP id a640c23a62f3a-a868831216cso354938966b.3 for ; Sat, 31 Aug 2024 20:06:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725159992; x=1725764792; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=yjfhFdDChqzT7+oA40//+Ux70DZs6w5t69VhaVL4u28=; b=DPw54ksGr8TkV7NaLYJSEYcNrz8HtfoTF+l8fzteN9JIKt0lUiHNuutwovPW6Hr+kX jXUefoXEkBxawXeWFsal3O6Zy5wJQ1psnB9p2CGIW+wUYqsFs3qST3Hpcg0Liz0H82yy IL6HCIkEpi2d+bxELU22e/0n2SseGv0oPQ3kukMErz8RvKS9o13vd/vy8B5ZxEkW92xB KA9Nqm/78M20txxwzkrdoCoiqWGyaxB7O4stKR8dMnddcRWtW4iBpcL5yhumREN66v9d 49MkCpFLiz0HomVb/hnDZBDLiPvDn5AH/M2Gv/dgZXGykZYjen8QxJnMR9dL1vJcDFen Veqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725159992; x=1725764792; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=yjfhFdDChqzT7+oA40//+Ux70DZs6w5t69VhaVL4u28=; b=oXlskZ9bpdHRMBo4hukuhDk/oVyyX+0J199XGWuMKh/uZhOd1bTIerkp3fRIV6NNrn dj38F3ZGd0U95Q28DVH1MfLdzEgYPGrFwR+aPPm6FufIrm7S4bOyPzQm27jJEXH6mazL i718m6LmCwiJZUGB+uY+yjakGrw+8k9Iz3QvoGikvMl3p8Zm1Su90nrEZSiY3wrV1cRe P1I3dlgqqA1sJKqS7skSk8uMPV69Mm5kvWoRLDfkSyvUxx4qDJ23xd61VRqVXJprYgsJ S6XXa5XNAU9X4fVCfg7gbNMmPTyicscsdy+Zoi2aKii6XB+QubxODuBk+mRp+6AOciiI +Dwg== X-Gm-Message-State: AOJu0Yx/MVq+wTiU39uKKYwtd7FmycCgxBSqjKRFIcMm9woYpZXkkFeo TscHifsi997YG/pJCi72aXAd5+A31gvCHJNPZteRN4rBzKR2Qh/49Yz1X3zQtjF+bjPDyGKlo2R l1gbEnJfOZxN78hvpOwi/cJczeUnDyHJh X-Google-Smtp-Source: AGHT+IGafRJNcfqp9ODzSwapisA7Ub9WDXpX6f7J5JVXvtq1vGlZwxbPYN7V41uZCStvMirkX/13yyPtPHWZoqxI5TI= X-Received: by 2002:a17:907:96a5:b0:a7a:8e98:8911 with SMTP id a640c23a62f3a-a89b956c3d4mr246463266b.38.1725159991447; Sat, 31 Aug 2024 20:06:31 -0700 (PDT) MIME-Version: 1.0 References: <7fa81130-0b33-4e53-bb32-39d84f06c680@aklaver.com> In-Reply-To: <7fa81130-0b33-4e53-bb32-39d84f06c680@aklaver.com> From: veem v Date: Sun, 1 Sep 2024 08:36:19 +0530 Message-ID: Subject: Re: Partitioning and unique key To: Adrian Klaver Cc: pgsql-general Content-Type: multipart/alternative; boundary="0000000000007894c20621061c5c" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000007894c20621061c5c Content-Type: text/plain; charset="UTF-8" On Sun, 1 Sept 2024 at 03:58, Adrian Klaver wrote: > > The model is at odds with itself and untenable. If the tables hold > multiple rows for a given transaction_id then you cannot have a > PK/Unique constraint on that column. Seems there is a decided lack of > any planning. The only way I can see this happening is consolidating all > the duplicate transaction_id rows into a single row for each > transaction_id. That then leads to the question of how to do that and > retain the 'correct' information from the selection of rows for each > transaction_id. > > Yes we had messed up the data for now and have multiple records for each transaction_id persisted and thus we need to fix the data. But more than that , as I stated , I wanted to understand first 1) if it's technically possible to have a unique key on only the transaction_id column having the partition key on the transaction_timestamp, because the table is going to be queried/purged based on the transaction_timestamp? 2) Additionally we were thinking if above is technically not possible, then the maximum granularity which we can have for each transaction_id will be a day, so the partition key transaction_timestmp can be truncated to have only date component but no time component. So the primary key will be (transaction_id, transaction_date). But we also don't want to lose the time component and persist the existing data of transaction_timestmp (which will have a time component in it, in a separate column). And in above case , for fixing the existing data in least disruptive way, as we have currently duplicate transaction_id inserted into the table already because of the composite primary key(transaction_id, transaction_timestmp).Can we simply i)rename the existing column transaction_timestmp to transaction_date and then add new column transaction_timestmp using the values of existing column partition by partition. ii)And then delete the duplicate data using query something as below , each partition by partition. iii)And then alter the datatype of the partition key transaction_date to DATE in one shot at the table level(which should be fast as its having more granularity as compare to existing timestamptype, so should be catalog or dictionary change only), and that will remain the part of composite PK (transaction_id,transaction_date). iv) Repeat this step for all child partition tables and then for the parent partition tables. Will this technique be the most efficient way of fixing this mess? WITH ranked_records AS ( SELECT column1_id, column2_timestamptz, ROW_NUMBER() OVER (PARTITION BY column1_id, date_trunc('day', column2_timestamptz) ORDER BY column2_timestamptz DESC) AS rn FROM partition_name ) DELETE FROM partition_name T1 WHERE EXISTS ( SELECT 1 FROM ranked_records T2 WHERE T1.column1_id = T2.column1_id AND T1.column2_timestamptz = T2.column2_timestamptz AND T2.rn > 1 ) --0000000000007894c20621061c5c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Sun, 1 Sept 2024 at 03:58, Adrian Klav= er <adrian.klaver@aklaver.c= om> wrote: