Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tDf9e-002NWI-SH for pgsql-hackers@arkaria.postgresql.org; Wed, 20 Nov 2024 07:29:38 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tDf9d-002qlN-F1 for pgsql-hackers@arkaria.postgresql.org; Wed, 20 Nov 2024 07:29:37 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tDf9d-002qlE-4u for pgsql-hackers@lists.postgresql.org; Wed, 20 Nov 2024 07:29:37 +0000 Received: from mail-vk1-xa36.google.com ([2607:f8b0:4864:20::a36]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tDf9a-002q5P-OH for pgsql-hackers@postgresql.org; Wed, 20 Nov 2024 07:29:35 +0000 Received: by mail-vk1-xa36.google.com with SMTP id 71dfb90a1353d-514668ec665so475727e0c.0 for ; Tue, 19 Nov 2024 23:29:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1732087774; x=1732692574; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=29jkPiivDOo+ztUDTGtTG+oCxBHmTpiyflR0LY6dbiU=; b=TYNylZNFbwtZp3S2QqLp0fwb/+6mXKzqeLUxP8L4B0C0OCEK9hVM+2Tm2c0uXXa/rY KRLkLI1ek/BgghSq2W/bAV+OAGEdl8S8RBw+Z5c21wPQVuR8gwIV4YBJ/2X0kB3vzgCB DbLjpSnoVt6/tA4F2mEJR4X+EeYrHHh9oi4amkbCOagZjiA/biLuzpFKMkGArSLPDLuY CbOg++S2VI6hUC7UIK40TDzeWeqZN1XNb2jTSTyLp1/OiSxviNblMoAYovTVDFedjv1n hnLfjsts28wDLgM7EN9Z7H86TY93jL0yk+FxXZyrME36UjZcUOi0iq8ONhisHMGXGwXr a7mQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732087774; x=1732692574; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=29jkPiivDOo+ztUDTGtTG+oCxBHmTpiyflR0LY6dbiU=; b=ZzKsIz7Iiz2zlUzYBF2bI1V2Df9yqWRTr8Xo6f6WKZ4mpgyjs5LAdNDk1TbKVVb+R5 TFDS0clcjfmuP/llOXQH8CI4cqype3lq1+B5qTKj4RsCtMx4NqrLauVr6nIW5q4WbaoT b+nwtB4J5BucKCGUHdlP5PCzK+8PbLKi1MApwry4MNW+CEMDsQEngyX4nB5srNpWBDra vPloKE0KRsgJVtDKLs3ONazsUzLFQdEj4sMZpyVhIU8wlGZK7w57ODwx8kJdGgvfA7EN Y5qoa3xk2kdZy7NGBxdVG2E4bh/7AhmMq0Cfffmc5SmOoAozEd0Z/pT5Urse1p+Fo5WH /D2w== X-Forwarded-Encrypted: i=1; AJvYcCWt59ZhIP3eKOUmiO0/x4l/6sVHMshrDy3An0bRqoPgcutamWKO2PaG+uSsjjhI/9bIDJ82rsHLB/KTPgPV@postgresql.org X-Gm-Message-State: AOJu0YybPDeH8an0nIUy2kr7tjlcKVmiEnVIWvDieFy7IRxTHQvlm4a2 m39/wdkaMMYg5kHZg843I8WQ2Zftuyzh+sI4o0lm6ElldxV/SMcpZYuS1fyzEJGesrxDkBQZLi6 CyyWXCMXqFFCdzQOeJYkJ6iFDDso= X-Google-Smtp-Source: AGHT+IEBxvzUPb/087BorGnSR9XzM/QD29ZTIb/T1NdyK4K60FykjBlf4J/Cq+/qAAHWcTvt9lov7pRedPSdvulXlME= X-Received: by 2002:a05:6102:2456:b0:4a4:87a7:88ed with SMTP id ada2fe7eead31-4ad9efec883mr5251546137.2.1732087773886; Tue, 19 Nov 2024 23:29:33 -0800 (PST) MIME-Version: 1.0 References: <3104729e-1fbf-4368-ac21-1f670062de28@eisentraut.org> <0ca761b5-7b62-42a1-bffd-8bedefad48dd@eisentraut.org> <62156cde-b2c5-4375-b47a-9730aa8f7b5e@eisentraut.org> <07fdbb85-c530-48d0-adc0-7d43d7951e1b@eisentraut.org> In-Reply-To: From: jian he Date: Wed, 20 Nov 2024 15:29:22 +0800 Message-ID: Subject: Re: Support LIKE with nondeterministic collations To: Peter Eisentraut Cc: Heikki Linnakangas , Jacob Champion , pgsql-hackers , Daniel Verite , Paul A Jungwirth Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Nov 19, 2024 at 9:51=E2=80=AFPM Peter Eisentraut wrote: > > On 18.11.24 04:30, jian he wrote: > > we can optimize when trailing (last character) is not wildcards. > > > > SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents; > > within the for loop > > for(;;) > > { > > int cmp; > > CHECK_FOR_INTERRUPTS(); > > .... > > } > > > > pg_strncoll comparison will become > > Ha12foo foo > > a12foo foo > > 12foo foo > > 2foo foo > > foo foo > > > > it's safe because in MatchText we have: > > else if (*p =3D=3D '%') > > { > > while (tlen > 0) > > { > > if (GETCHAR(*t, locale) =3D=3D firstpat || (locale && !locale->det= erministic)) > > { > > int matched =3D MatchText(t, tlen, p, plen, locale)= ; > > if (matched !=3D LIKE_FALSE) > > return matched; /* TRUE or ABORT */ > > } > > NextChar(t, tlen); > > } > > } > > > > please check attached. > > I see, good idea. I implemented it a bit differently. See "Shortcut: > If this is the end of the pattern ..." in this patch. Please check if > this is what you had in mind. your implementation is far more simpler than mine. I think I understand it. i am trying to optimize case where pattern is begin_with like `pattern%` but failed on case like: SELECT U&'\0061\0308bc' LIKE U&'\00E4bc%' COLLATE ignore_accents; basically the_string like the_pattern%. the length of the_string and length of the_pattern can vary, we can not just do one pg_strncoll. in match_pattern_prefix maybe change if (expr_coll && !get_collation_isdeterministic(expr_coll)) return NIL; to if (OidIsValid(expr_coll) && !get_collation_isdeterministic(expr_coll)) return NIL; other than that, I didn't find any issue.