Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tCsTG-00ECvr-PU for pgsql-hackers@arkaria.postgresql.org; Mon, 18 Nov 2024 03:30:38 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tCsTE-002gVp-8A for pgsql-hackers@arkaria.postgresql.org; Mon, 18 Nov 2024 03:30:36 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tCsTD-002gVh-RF for pgsql-hackers@lists.postgresql.org; Mon, 18 Nov 2024 03:30:36 +0000 Received: from mail-vk1-xa2a.google.com ([2607:f8b0:4864:20::a2a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tCsTA-002VfM-Sj for pgsql-hackers@postgresql.org; Mon, 18 Nov 2024 03:30:35 +0000 Received: by mail-vk1-xa2a.google.com with SMTP id 71dfb90a1353d-50d525be53eso950616e0c.0 for ; Sun, 17 Nov 2024 19:30:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731900632; x=1732505432; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=8tVRytOSLRYG/zf12D9C8u9F8aKdZk+y5Ipp/2fb0sM=; b=KmFhf3ukIz1ifUz8Nv4vv/D7l/7yeCJERZukarriyXwi4ExKGAzUzhSP2JszrmOtXL vXQWjgECWs+X6Pmoyvy+tjxQwqKgnrZjOsKjq+HFElttYWOOmF9x4FYCMmtjbkMuNhvq LQo/t9VVYKmN1rLmb8MuGeglAswdjWj6qS01uZbnXqLpI+ZEYkYW8SaT/FZCo6HRtUAg ONN/XKwdjnyF9BTCdwXmQK/ImXYa4oyg+ftuIfG21inDqOrP5VpTaupTYWMx+VFpibvm bVYqVGW6bA9Xs9OYqpVe8jApvCkY7o0DywjwfHuOINhubS10GDboZe9TIXtk5BIoCfjb QABA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731900632; x=1732505432; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=8tVRytOSLRYG/zf12D9C8u9F8aKdZk+y5Ipp/2fb0sM=; b=dCJguv9OdqG6W7+/C/BULpb5oNbryEaWWNhy9Y2L/GTs1Izt/4a8ShbCCEj8uceIcq Voad6kQ+VkWO5w296s6jP/vbJqXcBVSQGYoUKxa0OoXRcxuuUF7Oyh+7gFkKSfSu6AbF gpFbOSlZyJXN1nh1+3P8r5bFC0UbLynK/W+OsByvOoVMXrBxrOixlewBai1rJXSHwXas v6hxD71bovYFeHSwFFkVUWJa08zvxeA8NXwr65WXac0dTwBt5hoTe9EV98+dn96bclt2 aPgg011+0eR0Vd4Qx5LgALWL/xbCrWJjYTzww31bF1MXA7HzFLLUP7tjPm8pXxV85T7P T8kA== X-Forwarded-Encrypted: i=1; AJvYcCUtEHWtDJSRTvH4KakZV7u/9n8xN9J0tL02QIgXFxwqfGABIhljzwF7UCfZwDyXV1eYVKosUQWFWpQSV3lO@postgresql.org X-Gm-Message-State: AOJu0YzNv5Q5slfZXNcb75Vyx3ZVUGxzg5Jxal2Zjm3tb2XnMW4UzKIa yjIh+mKI5JGyk6w0NMUx3VXGvcAU3qOXVFmS+7USJ1LsK9cVEwFKNswFumcxQKGgHgkVpghCaNG yaKUQDr0h1aXpoUBFWFn1z0X7caU= X-Google-Smtp-Source: AGHT+IGdk1PHSy4+1oSjNHu5c/DGDi/MntlsjCFuB73h7Wjr8fCxTuRwzVchunISNFargzPgz/Gu+wX/O/bXQ03l4fM= X-Received: by 2002:a05:6122:2008:b0:50d:a31c:678c with SMTP id 71dfb90a1353d-51477ee90c0mr8310085e0c.2.1731900631902; Sun, 17 Nov 2024 19:30:31 -0800 (PST) MIME-Version: 1.0 References: <3104729e-1fbf-4368-ac21-1f670062de28@eisentraut.org> <0ca761b5-7b62-42a1-bffd-8bedefad48dd@eisentraut.org> <62156cde-b2c5-4375-b47a-9730aa8f7b5e@eisentraut.org> <07fdbb85-c530-48d0-adc0-7d43d7951e1b@eisentraut.org> In-Reply-To: <07fdbb85-c530-48d0-adc0-7d43d7951e1b@eisentraut.org> From: jian he Date: Mon, 18 Nov 2024 11:30:20 +0800 Message-ID: Subject: Re: Support LIKE with nondeterministic collations To: Peter Eisentraut Cc: Heikki Linnakangas , Jacob Champion , pgsql-hackers , Daniel Verite , Paul A Jungwirth Content-Type: multipart/mixed; boundary="000000000000f3befb0627278962" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f3befb0627278962 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Nov 15, 2024 at 11:42=E2=80=AFPM Peter Eisentraut wrote: > > On 15.11.24 05:26, jian he wrote: > > /* > > * Now build a substring of the text and try to match it against > > * the subpattern. t is the start of the text, t1 is one past the > > * last byte. We start with a zero-length string. > > */ > > t1 =3D t > > t1len =3D tlen; > > for (;;) > > { > > int cmp; > > CHECK_FOR_INTERRUPTS(); > > cmp =3D pg_strncoll(subpat, subpatlen, t, (t1 - t), locale); > > > > select '.foo.' LIKE '_oo' COLLATE ign_punct; > > pg_strncoll's iteration of the first 4 argument values. > > oo 2 foo. 0 > > oo 2 foo. 1 > > oo 2 foo. 2 > > oo 2 foo. 3 > > oo 2 foo. 4 > > > > seems there is a shortcut/optimization. > > if subpat don't have wildcard(percent sign, underscore) > > then we can have less pg_strncoll calls? > > How would you do that? You need to try all combinations to find one > that matches. > we can optimize when trailing (last character) is not wildcards. SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents; within the for loop for(;;) { int cmp; CHECK_FOR_INTERRUPTS(); .... } pg_strncoll comparison will become Ha12foo foo a12foo foo 12foo foo 2foo foo foo foo it's safe because in MatchText we have: else if (*p =3D=3D '%') { while (tlen > 0) { if (GETCHAR(*t, locale) =3D=3D firstpat || (locale && !locale->determin= istic)) { int matched =3D MatchText(t, tlen, p, plen, locale); if (matched !=3D LIKE_FALSE) return matched; /* TRUE or ABORT */ } NextChar(t, tlen); } } please check attached. > > minimum case to trigger error within GenericMatchText > > since no related tests. > > create table t1(a text collate case_insensitive, b text collate "C"); > > insert into t1 values ('a','a'); > > select a like b from t1; > > This results in > > ERROR: 42P22: could not determine which collation to use for LIKE > HINT: Use the COLLATE clause to set the collation explicitly. > > which is the expected behavior. > sorry, didn't mention it clearly, i mean we can add it to the regress test. --000000000000f3befb0627278962 Content-Type: application/octet-stream; name="v8-0001-LIKE-with-nondeterministic-collations-no-trail.no-cfbot" Content-Disposition: attachment; filename="v8-0001-LIKE-with-nondeterministic-collations-no-trail.no-cfbot" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_m3mgvtb30 RnJvbSAzZjgyYWQ0MzZkZWFmODEyMThkYzRiNDg5OGY2ZDZlZGIzMDA1Yzc4IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBqaWFuIGhlIDxqaWFuLnVuaXZlcnNhbGl0eUBnbWFpbC5jb20+ CkRhdGU6IE1vbiwgMTggTm92IDIwMjQgMTE6MDY6MTcgKzA4MDAKU3ViamVjdDogW1BBVENIIHY4 IDEvMV0gTElLRSB3aXRoIG5vbmRldGVybWluaXN0aWMgY29sbGF0aW9ucyBubyB0cmFpbGluZyB3 aWxkCiBjYXJkcwoKb3B0aW1pemUgd2hlcmUgdGhlcmUgaXMgbm8gdHJhaWxpbmcgd2lsZGNhcmRz LgotLS0KIHNyYy9iYWNrZW5kL3V0aWxzL2FkdC9saWtlX21hdGNoLmMgfCAzOCArKysrKysrKysr KysrKysrKysrKysrKysrKysrLS0KIDEgZmlsZSBjaGFuZ2VkLCAzNiBpbnNlcnRpb25zKCspLCAy IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9iYWNrZW5kL3V0aWxzL2FkdC9saWtlX21h dGNoLmMgYi9zcmMvYmFja2VuZC91dGlscy9hZHQvbGlrZV9tYXRjaC5jCmluZGV4IGYzY2E0ZDMw ODIuLmRkMWIxMzYyNTUgMTAwNjQ0Ci0tLSBhL3NyYy9iYWNrZW5kL3V0aWxzL2FkdC9saWtlX21h dGNoLmMKKysrIGIvc3JjL2JhY2tlbmQvdXRpbHMvYWR0L2xpa2VfbWF0Y2guYwpAQCAtMjEyLDYg KzIxMiw3IEBAIE1hdGNoVGV4dChjb25zdCBjaGFyICp0LCBpbnQgdGxlbiwgY29uc3QgY2hhciAq cCwgaW50IHBsZW4sIHBnX2xvY2FsZV90IGxvY2FsZSkKIAkJCWNvbnN0IGNoYXIgKnQxOwogCQkJ c2l6ZV90CQl0MWxlbjsKIAkJCWJvb2wJCWZvdW5kX2VzY2FwZTsKKwkJCWJvb2wJCXRyYWlsaW5n X3dpbGRjYXJkcyA9IGZhbHNlOwogCQkJY29uc3QgY2hhciAqc3VicGF0OwogCQkJc2l6ZV90CQlz dWJwYXRsZW47CiAJCQljaGFyCSAgICpidWYgPSBOVUxMOwpAQCAtMjc5LDcgKzI4MCwzMyBAQCBN YXRjaFRleHQoY29uc3QgY2hhciAqdCwgaW50IHRsZW4sIGNvbnN0IGNoYXIgKnAsIGludCBwbGVu LCBwZ19sb2NhbGVfdCBsb2NhbGUpCiAKIAkJCQlDSEVDS19GT1JfSU5URVJSVVBUUygpOwogCi0J CQkJY21wID0gcGdfc3RybmNvbGwoc3VicGF0LCBzdWJwYXRsZW4sIHQsICh0MSAtIHQpLCBsb2Nh bGUpOworCQkJCWlmICghdHJhaWxpbmdfd2lsZGNhcmRzKQorCQkJCXsKKwkJCQkJLyogd2UgY2hl Y2sgc3VicGF0bGVuIGFuZCBuZXh0IGNoYXJhY3RlciBpcyB3aWxkLWNhcmQgb3IKKwkJCQkJICog bm90LiBpZiBzdWJwYXRsZW4gPT0gMSwgdGhlbiB3ZSBkb24ndCBoYXZlIHRvIGNoZWNrIHRoZQor CQkJCQkgKiBuZXh0IGNoYXIuIGlmIHdlIGRpZG4ndCBmb3VuZCB0cmFpbGluZ193aWxkY2FyZHMg dGhlbiB3ZSBjYW4KKwkJCQkJICogc2FmZWx5IGNvbXBhcmUgd2hvbGUgc3VicGF0IHdpdGggdC4g aWYgY21wICE9IDAgd2UgcmV0dXJuCisJCQkJCSAqIExJS0VfRkFMU0UuCisJCQkJCSovCisJCQkJ CWlmIChzdWJwYXRsZW4gPT0gMSAmJiAoc3VicGF0WzBdID09ICclJyB8fCBzdWJwYXRbMF0gPT0g J18nKSkKKwkJCQkJeworCQkJCQkJdHJhaWxpbmdfd2lsZGNhcmRzID0gdHJ1ZTsKKwkJCQkJCWJy ZWFrOworCQkJCQl9CisJCQkJCWZvciAoaW50IGkgPSAwOyBpIDw9c3VicGF0bGVuOyBpKyspCisJ CQkJCXsKKwkJCQkJCWlmIChzdWJwYXRbaV0gPT0gJyUnIHx8IHN1YnBhdFtpXSA9PSAnXycpCisJ CQkJCQl7CisJCQkJCQkJdHJhaWxpbmdfd2lsZGNhcmRzID0gdHJ1ZTsKKwkJCQkJCQlicmVhazsK KwkJCQkJCX0KKwkJCQkJfQorCQkJCX0KKworCQkJCWlmICghdHJhaWxpbmdfd2lsZGNhcmRzKQor CQkJCQljbXAgPSBwZ19zdHJuY29sbChzdWJwYXQsIHN1YnBhdGxlbiwgdCwgdGxlbiwgbG9jYWxl KTsKKwkJCQllbHNlCisJCQkJCWNtcCA9IHBnX3N0cm5jb2xsKHN1YnBhdCwgc3VicGF0bGVuLCB0 LCAodDEgLSB0KSwgbG9jYWxlKTsKIAogCQkJCS8qCiAJCQkJICogSWYgd2UgZm91bmQgYSBtYXRj aCwgd2UgaGF2ZSB0byB0ZXN0IGlmIHRoZSByZXN0IG9mIHBhdHRlcm4KQEAgLTI5OSw3ICszMjYs MTIgQEAgTWF0Y2hUZXh0KGNvbnN0IGNoYXIgKnQsIGludCB0bGVuLCBjb25zdCBjaGFyICpwLCBp bnQgcGxlbiwgcGdfbG9jYWxlX3QgbG9jYWxlKQogCQkJCSAqLwogCQkJCWlmIChjbXAgPT0gMCkK IAkJCQl7Ci0JCQkJCWludAkJCW1hdGNoZWQgPSBNYXRjaFRleHQodDEsIHQxbGVuLCBwMSwgcDFs ZW4sIGxvY2FsZSk7CisJCQkJCWludAkJCW1hdGNoZWQ7CisKKwkJCQkJaWYgKCF0cmFpbGluZ193 aWxkY2FyZHMpCisJCQkJCQlyZXR1cm4gTElLRV9UUlVFOworCisJCQkJCW1hdGNoZWQgPSBNYXRj aFRleHQodDEsIHQxbGVuLCBwMSwgcDFsZW4sIGxvY2FsZSk7CiAKIAkJCQkJaWYgKG1hdGNoZWQg PT0gTElLRV9UUlVFKQogCQkJCQl7CkBAIC0zMDgsNiArMzQwLDggQEAgTWF0Y2hUZXh0KGNvbnN0 IGNoYXIgKnQsIGludCB0bGVuLCBjb25zdCBjaGFyICpwLCBpbnQgcGxlbiwgcGdfbG9jYWxlX3Qg bG9jYWxlKQogCQkJCQkJcmV0dXJuIG1hdGNoZWQ7CiAJCQkJCX0KIAkJCQl9CisJCQkJZWxzZSBp ZiAoIXRyYWlsaW5nX3dpbGRjYXJkcykKKwkJCQkJcmV0dXJuIExJS0VfRkFMU0U7CiAKIAkJCQkv KgogCQkJCSAqIERpZG4ndCBtYXRjaC4gIElmIHdlIHVzZWQgdXAgdGhlIHdob2xlIHRleHQsIHRo ZW4gdGhlIG1hdGNoCi0tIAoyLjM0LjEKCg== --000000000000f3befb0627278962--