Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tBnur-006EQR-MJ for pgsql-hackers@arkaria.postgresql.org; Fri, 15 Nov 2024 04:26:41 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tBnuo-007op9-W2 for pgsql-hackers@arkaria.postgresql.org; Fri, 15 Nov 2024 04:26:39 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tBnuo-007op1-Jc for pgsql-hackers@lists.postgresql.org; Fri, 15 Nov 2024 04:26:39 +0000 Received: from mail-ua1-x92a.google.com ([2607:f8b0:4864:20::92a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tBnul-0020hf-KD for pgsql-hackers@postgresql.org; Fri, 15 Nov 2024 04:26:38 +0000 Received: by mail-ua1-x92a.google.com with SMTP id a1e0cc1a2514c-856cb012055so157037241.3 for ; Thu, 14 Nov 2024 20:26:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731644795; x=1732249595; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=W/uW0zSwhQ2Vph7scffBi1+dSQ3TcTDp3NT+KsDxddQ=; b=hvh2nm4PvG3mrkg9AOzQyKHvZQo/zogB4WAL4SjcKz3o/AObbYqIYuwQ8BGrIMH8RV 2nsImYy46da+/3GqX78Z618vs8U3ZQJtiPvfR+Obs+HBKctlbzmlXl03bgiAvcYI5Fv2 4stphKgOufpIJ0Nck4X48jN4BIK6sHrGkksGHf4Ggp4kw2rOj0f1eYNEOYUa0XrxZW9a MvH8XinIRPMFigrCS0XH8fA8C5MOrPVT9h9X1l7i0Mi/lzSaCswqggDikTDbLF8JWc6i ZDRCUh3K+Y8Ei3QyCDZrxDNzHI5xHQAVD41/naNnBs9/ckLX+ruJ5BmjouPlT7JIU/14 59lQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731644795; x=1732249595; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=W/uW0zSwhQ2Vph7scffBi1+dSQ3TcTDp3NT+KsDxddQ=; b=QhvrBybmLn2zI/CS1m2H+FypQDT38eH3eptheiOKf+tBn7HZk0dX+/R8+8HH9hD6i8 3maabHAfDqji+0NTEc5pIwMTFfAPEM7oiWVNIdBMEfgKhhfNLJNPqmEiG8RFpch4rEer XeDtOaxGnIgdU+MAe9MP2qN+Y/ZwbRj4ygAYBhdjy9DFWOY//DQMLPoDFiRslmBtxEZu oEZoJyvA8gMCCXSXppOOovRSPGOzwcdKtsBeNxNJIHN9X8UDMG0w2a30jiXaPrTc3Qgh TRTVO4T2ncMs4CEdYSOeSeAIQzWh7vG8arAZ/ZPqA9XCcTzgzMA+ybJJuUQdcj1C8nEM sZcg== X-Forwarded-Encrypted: i=1; AJvYcCXRG8SzpWQhis+42Hr1CR06enwmXAlZbYGF1vwUS0Q4gII5LObcp5Hs0sFLTGMDjNER8D33BvLpEDwgMNfV@postgresql.org X-Gm-Message-State: AOJu0Yy5RXpuIJmcKdMFyqBp5MOzadNHtrpsw/XG5iYDTlZkLgiNUZua Ek7WVcQmc82OS5vbvUVsLPUzUt78mFAvu96zRQ3H6mYHr6GnmpdljK9Ro2oL8pSQGIJQ/+6wgXK H1r9h0IWqqVg5aq0BoWVpQQD+mS54aHzKJCM= X-Google-Smtp-Source: AGHT+IF1/R8tYocsIwKtjCk1DA2Yk60Je6AxLk5cEwwmgGIT+xnKfxOl+jpjEe017wt76nu38agJK+o/VZb/s6SIxT0= X-Received: by 2002:a05:6102:26c7:b0:4a7:4900:4b7c with SMTP id ada2fe7eead31-4ad62948531mr1932985137.0.1731644794889; Thu, 14 Nov 2024 20:26:34 -0800 (PST) MIME-Version: 1.0 References: <3104729e-1fbf-4368-ac21-1f670062de28@eisentraut.org> <0ca761b5-7b62-42a1-bffd-8bedefad48dd@eisentraut.org> <62156cde-b2c5-4375-b47a-9730aa8f7b5e@eisentraut.org> In-Reply-To: <62156cde-b2c5-4375-b47a-9730aa8f7b5e@eisentraut.org> From: jian he Date: Fri, 15 Nov 2024 12:26:24 +0800 Message-ID: Subject: Re: Support LIKE with nondeterministic collations To: Peter Eisentraut Cc: Heikki Linnakangas , Jacob Champion , pgsql-hackers , Daniel Verite , Paul A Jungwirth Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Nov 12, 2024 at 3:45=E2=80=AFPM Peter Eisentraut wrote: > > On 11.11.24 14:25, Heikki Linnakangas wrote: > > Sadly the algorithm is O(n^2) with non-deterministic collations.Is ther= e > > any way this could be optimized? We make no claims on how expensive any > > functions or operators are, so I suppose a slow implementation is > > nevertheless better than throwing an error. > > Yeah, maybe someone comes up with new ideas in the future. > /* * Now build a substring of the text and try to match it against * the subpattern. t is the start of the text, t1 is one past the * last byte. We start with a zero-length string. */ t1 =3D t t1len =3D tlen; for (;;) { int cmp; CHECK_FOR_INTERRUPTS(); cmp =3D pg_strncoll(subpat, subpatlen, t, (t1 - t), locale); select '.foo.' LIKE '_oo' COLLATE ign_punct; pg_strncoll's iteration of the first 4 argument values. oo 2 foo. 0 oo 2 foo. 1 oo 2 foo. 2 oo 2 foo. 3 oo 2 foo. 4 seems there is a shortcut/optimization. if subpat don't have wildcard(percent sign, underscore) then we can have less pg_strncoll calls? minimum case to trigger error within GenericMatchText since no related tests. create table t1(a text collate case_insensitive, b text collate "C"); insert into t1 values ('a','a'); select a like b from t1; at 9.7.1. LIKE section, we still don't know what "wildcard" is. we mentioned it at 9.7.2. maybe we can add a sentence at the end of: If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any sequence of zero or more characters. saying underscore and percent sign are wildcards in LIKE. other than that, I can understand the doc.