Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sjoPS-004Gfl-Dd for pgsql-general@arkaria.postgresql.org; Thu, 29 Aug 2024 23:18:34 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sjoPQ-009GIC-Jb for pgsql-general@arkaria.postgresql.org; Thu, 29 Aug 2024 23:18:33 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sjoPQ-009GHx-6H for pgsql-general@lists.postgresql.org; Thu, 29 Aug 2024 23:18:32 +0000 Received: from mail-lj1-x233.google.com ([2a00:1450:4864:20::233]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sjoPN-0029p2-4r for pgsql-general@lists.postgresql.org; Thu, 29 Aug 2024 23:18:32 +0000 Received: by mail-lj1-x233.google.com with SMTP id 38308e7fff4ca-2f3fea6a0a9so11432451fa.0 for ; Thu, 29 Aug 2024 16:18:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1724973508; x=1725578308; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=/fQCWXNDBP8Y0X+PCmZuwP5IzvSqORlfdj9eEfZy0mE=; b=kXarIr0OTUjK7oOcAmB/dQ+bKhiqLjZtG2wtBDabO6wJbV9HH7zLeTQi03XL4wsAFw 0E9qzwxIZRhNAksyo6m2BBON7D3w7ga4JErU79w7IM2vWl/5sqjdL3I7prPc9nVOkLFz B0CL/qiBRLfb0GOFvVptm5x387kWicdNz/k2dikHb09FaBLuThaMmqwcpeh/MzkjH+D5 h5zqi7AYICyLoS6eDodv2AwB/zBVogJLOzHNQ6r7uMDFQNuySokgL8RRtCRph+4EvFn9 vJGMPKULgDDTySwSSVxnGGcDk6cTbA566GtyHvT4r6vU2aDlP1dihRjQ/CVFpKa8zkM3 9bYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724973508; x=1725578308; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=/fQCWXNDBP8Y0X+PCmZuwP5IzvSqORlfdj9eEfZy0mE=; b=qW/WJjexiiBtl5dhbNnFwRoqmUUIrjFXlnhsdIcM2lcdMweQPZfCs9SO8waN1sMSFW UFQaN8KyJRyPkvFosBT7RJRGVkxwGHYeEpSR7OGWqblY53M9OKFb0hx6cnwJaveUwTJt 25VoGK23GypDU2rA5LcRHZKvGTMWgpWCkzLJkLFkBDiN5GzeURFXJd+34Sn7K0fT5gnM J/OmNBFYZhEDmDQ2wVMt5WYYUWJHoXXE5Q/9AGQtFRzITfi0nLsS84vEYyI40ht+pETh 9AM+DvCy75SLArnH6fIh5LATTcNO5lXn5gtlY4sBbY7Rp4Y3gbW98+5YrIh09rHXlAoq 8G+A== X-Gm-Message-State: AOJu0YxAmFJFrn/pDPW3MTRRxpJpONJFd/lOni9S9Pb65g3i9qMjXcv0 tQNzby5zBqWneLKv6XbOvhrzzJPhk26xIAFqzcB0LFyyj7jvjwliAJNTCYBRCCMmi4h2alenE8J GG9ARprizxtGZvCqNcgRY7uc9NKNrxZd/ X-Google-Smtp-Source: AGHT+IGVf/7rSvaST/WLGaES3rvrE7R2d8EhzjOpcVUHN9J+kQqAIvjCW4m01yYKES/E4G5nhixcGANeb2M3E01/07o= X-Received: by 2002:a2e:bea2:0:b0:2f5:66a:627 with SMTP id 38308e7fff4ca-2f6127e0b23mr14563951fa.0.1724973507566; Thu, 29 Aug 2024 16:18:27 -0700 (PDT) MIME-Version: 1.0 From: Rumpi Gravenstein Date: Thu, 29 Aug 2024 19:18:16 -0400 Message-ID: Subject: Analytic Function Bug To: PostgreSQL Content-Type: multipart/alternative; boundary="0000000000002a64f00620dab1e2" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000002a64f00620dab1e2 Content-Type: text/plain; charset="UTF-8" Experts, I am running on PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit I have the following query which returns what I expect: with d(logical_partition_key, model_usage) as ( values( 'TEST_DATA' , 'F(T61)(EXPORT)' ) ) , usg_txt as ( SELECT DISTINCT logical_partition_key, MODEL_USAGE as usage_text, REPLACE ( REPLACE ( REPLACE ( REPLACE ( PIU.MODEL_USAGE::text, '', ''), '', ''), '', ''), '', '') AS txt FROM d /*CAO_CALLOUT_GHOST_COMB_W*/ piu ) , parse( logical_partition_key, usage_text,txt, rpo_txt, indx ) as ( select d.logical_partition_key, d.usage_text,d.txt, coalesce(a.rpo[1],a.rpo[ 2]) as rpo_txt, a.pos from usg_txt d left join lateral regexp_matches( txt, '([ ,\-()/&])|([^ ,\-()/&]+)','g') with ordinality as a(rpo,pos) on true ) , prv_nxt_token( logical_partition_key, usage_text,txt, rpo_txt, indx, mx_indx, prev,nxt,nxt2, prv2,prv3) as ( /* Get prior and next token to support later logic */ select p.logical_partition_key, p.usage_text, p.txt, p.rpo_txt, indx, max( indx) over ( partition by p.txt ) mx_indx, lag( p.rpo_txt,1 ) over ( partition by p.logical_partition_key,p.txt order by indx ) prev, lag( p.rpo_txt,-1 ) over ( partition by p.logical_partition_key,p.txt order by indx ) nxt, lag( p.rpo_txt,-2) over ( partition by p.logical_partition_key,p.txt order by indx ) nxt2, lag( p.rpo_txt,2 ) over ( partition by p.logical_partition_key,p.txt order by indx ) prv2, lag( p.rpo_txt,3 ) over ( partition by p.logical_partition_key,p.txt order by indx ) prv3 from parse p ) select * from prv_nxt_token; logical_partition_keyusage_texttxtrpo_txtindxmx_indxprevnxtnxt2prv2prv3 "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "F" "1" "7" "[NULL]" "(" "T61" "[NULL]" "[NULL]" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "(" "2" "7" "F" "T61" ")" "[NULL]" "[NULL]" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "T61" "3" "7" "(" ")" "(" "F" "[NULL]" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" ")" "4" "7" "T61" "(" "EXPORT" "(" "F" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "(" "5" "7" ")" "EXPORT" ")" "T61" "(" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "EXPORT" "6" "7" "(" ")" "[NULL]" ")" "T61" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" ")" "7" "7" "EXPORT" "[NULL]" "[NULL]" "(" ")" When I run the same statement, except this time using a large table instead of a values statement, I get the wrong answer. The difference in the SQL statement that produced the following data is that the large table (10M records) and there is a closing where condition used to limit the result to what is shown: select * from prv_nxt_token; is replaced by: select * from prv_nxt_token where logical_partition_key='TEST_DATA' and usage_text='F(T61)(EXPORT)'; Which returns: logical_partition_keyusage_texttxtrpo_txtindxmx_indxprvnxtnxt2prv2prv3 "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "F" "1" "7" *"F"* "(" "(" "[NULL]" "[NULL]" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "(" "2" "7" "F" "(" "T61" "F" "[NULL]" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "T61" "3" "7" *"T61"* ")" ")" "(" "(" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" ")" "4" "7" "T61" ")" "(" "T61" "(" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "(" "5" "7" "*("* "EXPORT" "EXPORT" ")" ")" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "EXPORT" "6" "7" "(" "EXPORT" ")" "(" ")" "TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" ")" "7" "7" ")" "[NULL]" "[NULL]" "EXPORT" "EXPORT" Notice that the prv column (lag - 1 ) is just wrong. I've highlighted obvious bad values. Other columns are wrong as well. Is this a PostgreSQL bug? -- Rumpi Gravenstein --0000000000002a64f00620dab1e2 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Experts,

I am running on=C2=A0

PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (= GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit

I h= ave the following query which returns what I expect:


with

d(logical_partition_key, model_usage) as ( values( 'TEST_DATA' ,'F(T61)(EXPORT)' )

)

,

usg_txt as (

SELECT DISTINCT logical_partition_key,

MODEL_USAGE as usage_text,

REPLACE ( REPLACE ( REPLACE ( REPLACE ( PIU.MODEL_USAGE::text, '<t context=3D"USAGE_TEXT">', ''= ), '<t context= =3D"FCN_NAME_MODFR">', ''), '<t context=3D"FCN_USAGE_MODFR">&#= 39;, ''<= /span>), '</t>= ;', '= 9;) AS txt

FROM d /*CAO_CALLOUT_GHOST_C= OMB_W*/ piu

)

,

parse( logical_partition_key, = usage_text,txt, rpo_txt, indx ) as

(

select d.logical_partition_key, d.usage_text,d.txt, coales= ce(a.= rpo[1],a.rpo[2]) as= rpo_txt, a.pos

from usg_txt = d

left join lateral regexp_matches( txt, '([ ,\-()/&])|([^ ,\-()/&]+)','g') with ordinality as a(rpo,pos) on true

)

,

prv_nxt_token( logical_partition_key, usage_text,txt, rpo_txt, indx, = mx_indx, prev,nxt,nxt2, prv2,prv3) as

(

/* Get prior and next token to support later logic */

select p.logical_partition_key, p.usage_text,

p.txt, p.rpo_txt, indx,

max( indx) over ( partition by p.txt ) mx_indx,

= lag( p.rpo_t= xt,1 ) over ( partition by p.logical_partition_key,p.txt order by indx ) prev,

lag( p.rpo_txt,-1 ) over ( partition by p.logical_partition_key,p.txt order by indx ) nxt,

lag( p.rpo_txt,-2)= over ( partition by p.logical_partition_key,p.t= xt order by indx ) nxt2, =

lag( p.rpo_txt,2 ) over ( partition by<= /span> p.logical_partition_key,p.txt order by indx ) prv2,

lag( p.rpo_txt,3 ) over ( partition by= p.logical_partition_key,p.txt order by indx ) prv3=

from parse p

)

select * from prv_nxt_token;


= <= td>")" <= td>"(" <= td>"EXPORT"
logical_partition_keyusage_texttxtrpo_txtindxmx_indxprevnxtnxt2pr= v2prv3
"TEST_DATA" "F(T61)(EXPORT)" "F(T61)(EXPORT)" "F" "1&q= uot; "7" "[NULL]" "(&qu= ot; "T61" "[NULL]" "[NU= LL]"
"TEST_DATA" "F(T61)(EXPORT)" = "F(T61)(EXPORT)" "(" "2" "7" "F" "T61" "[NULL]" "[NULL]"
"TEST_DATA" "F(T61)(EXPORT)" = "F(T61)(EXPORT)" "T61" "3"<= /td> "7" "(" ")" "F" "[NULL]"
"TEST_DATA" "F(T61)(EXPORT)" = "F(T61)(EXPORT)" ")" "4" "7" "T61" "(" "(" "F"
"TEST_DATA" "F(T61)(EXPORT)" = "F(T61)(EXPORT)" "(" "5" "7" ")" "EXPORT" ")" "T61" "("
"TEST_DATA" "F(T61)(EXPORT)" = "F(T61)(EXPORT)" "EXPORT" "6&quo= t; "7" "(" ")" "[NULL]" ")" "T61"
"TEST_DATA" "F(T61)(EXPORT)" = "F(T61)(EXPORT)" ")" "7" "7" "EXPORT" "[NULL]"= ; "[NULL]" "(" ")"=


=
When I run the same statement, except this time using a large ta= ble instead of a values statement, I get the wrong answer.=C2=A0 The differ= ence in the SQL statement that produced the following data is that the larg= e table=C2=A0(10M records) and there is a closing where condition used to l= imit the result to what is shown:

select * from prv_nxt_token;
is replaced by:

select * from prv_nxt_token where logical_partition_key=3D'TEST_DATA' and usage_text=3D'F(T61)(EXPORT)';

=
Which returns:
indx <= td>"[NULL]"
= logical_partition_keyusage_texttxtrpo_txtmx_indxprvnxtnxt2prv2prv3
"TEST_DATA"
"F(T61)(EXPORT)= " "F(T61)(EXPORT)" "F" = "1" "7" &q= uot;F" "(" "(" "[NULL]"
"TEST_DATA"
"F(T61)(EXPORT)"<= /td> "F(T61)(EXPORT)" "(" "2= " "7" "F" "("= "T61" "F" "[NULL]"= ;
"TEST_DATA"
"F(T61)(EXPORT)"<= /td> "F(T61)(EXPORT)" "T61" "= ;3" "7" "T= 61" ")" ")" = "(" "("
"TEST_DATA"
"F(T61)(EXPORT)"<= /td> "F(T61)(EXPORT)" ")" "4= " "7" "T61" ")&quo= t; "(" "T61" "("
"TEST_DATA"
"F(T61)(EXPORT)"<= /td> "F(T61)(EXPORT)" "(" "5= " "7" "(&q= uot; "EXPORT" "EXPORT" ")" ")"
"TEST_DATA"
"F(T61)(EXPORT)"<= /td> "F(T61)(EXPORT)" "EXPORT" &q= uot;6" "7" "(" "EX= PORT" ")" "(" ")&q= uot;
"TEST_DATA"
"F(T61)(EXPORT)"<= /td> "F(T61)(EXPORT)" ")" "7= " "7" ")" "[NULL]&= quot; "[NULL]" "EXPORT" &quo= t;EXPORT"

Notice that the prv column (lag - 1 ) is just wrong.= =C2=A0 I've highlighted obvious bad values.=C2=A0 Other columns are wro= ng as well.

Is this a PostgreSQL bug?



--
Rumpi Gravenstein<= input type=3D"hidden">
--0000000000002a64f00620dab1e2--