Date: Mon, 11 Nov 2024 15:25:29 +0200
Subject: Re: Support LIKE with nondeterministic collations
From: Heikki Linnakangas
To: Peter Eisentraut, Jacob Champion
Cc: pgsql-hackers, Daniel Verite, Paul A Jungwirth

On 04/11/2024 10:26, Peter Eisentraut wrote:
> On 29.10.24 18:15, Jacob Champion wrote:
>> libfuzzer is unhappy about the following code in MatchText:
>>
>>> +            while (p1len > 0)
>>> +            {
>>> +                if (*p1 == '\\')
>>> +                {
>>> +                    found_escape = true;
>>> +                    NextByte(p1, p1len);
>>> +                }
>>> +                else if (*p1 == '_' || *p1 == '%')
>>> +                    break;
>>> +                NextByte(p1, p1len);
>>> +            }
>>
>> If the pattern ends with a backslash, we'll call NextByte() twice,
>> p1len will wrap around to INT_MAX, and we'll walk off the end of the
>> buffer. (I fixed it locally by duplicating the ERROR case that's
>> directly above this.)
>
> Thanks. Here is an updated patch with that fixed.

Sadly the algorithm is O(n^2) with non-deterministic collations. Is
there any way this could be optimized? We make no claims on how
expensive any functions or operators are, so I suppose a slow
implementation is nevertheless better than throwing an error.

Let's at least add some CHECK_FOR_INTERRUPTS(). For example, this takes
a very long time and is uninterruptible:

SELECT repeat('x', 100000) LIKE '%xxxy%' COLLATE ignore_accents;

-- 
Heikki Linnakangas
Neon (https://neon.tech)
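
For reference, the trailing-backslash problem Jacob describes in the quoted text can be sketched outside of PostgreSQL roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the function name scan_literal_prefix, the -1 error return, and the single-byte indexing (instead of NextByte() and an ereport() call) are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-alone version of the scan loop quoted above.
 * Returns the number of bytes in the leading literal part of the pattern
 * (up to the first unescaped '_' or '%'), or -1 if the pattern ends with
 * a dangling backslash. Single-byte characters are assumed for simplicity.
 */
static long
scan_literal_prefix(const char *p, size_t plen, bool *found_escape)
{
    size_t i = 0;

    *found_escape = false;
    while (i < plen)
    {
        if (p[i] == '\\')
        {
            *found_escape = true;
            i++;                /* skip the backslash itself */
            if (i >= plen)
                return -1;      /* nothing left to escape: report an error */
        }
        else if (p[i] == '_' || p[i] == '%')
            break;              /* first wildcard ends the literal prefix */
        i++;                    /* consume the literal (or escaped) byte */
    }
    return (long) i;
}
```

The point of the extra bounds check is that the second advance never runs once the input is exhausted, so the remaining length cannot wrap around.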