Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1b7Rnw-00056r-UF for pgsql-performance@arkaria.postgresql.org; Mon, 30 May 2016 18:20:45 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1b7Rnv-0001E8-Gu for pgsql-performance@arkaria.postgresql.org; Mon, 30 May 2016 18:20:43 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1b7Rnu-0001Dx-TB for pgsql-performance@postgresql.org; Mon, 30 May 2016 18:20:43 +0000 Received: from mail-qk0-x231.google.com ([2607:f8b0:400d:c09::231]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1b7Rnn-0004Yo-9v for pgsql-performance@postgresql.org; Mon, 30 May 2016 18:20:42 +0000 Received: by mail-qk0-x231.google.com with SMTP id h185so88068796qke.2 for ; Mon, 30 May 2016 11:20:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=NqNs6QgP5dCH60yhGHWvGFxPdubvfzWijlmI1ypNYJA=; b=hm6MksWMHsL4UOaIpNVHgK9cuC9TmtR5Qc5BtXaA3RsYbahOa/P1Ax3xsWgBZLjLHb VROFhnKBQKMazwshedc6AIFR3s6p5T0QWzAhahzDnfwCisDgVuFeLExM2dXZ9HMHOu1e 78Tr2OqgfIqIs0E7kqoz8mAKG8G9oCgJuj0ACTHg5EqsxllCiz4rXkpMg/q5hFxF3JjD 8TnrGhHTf1qLtIRUfZDMNw0RB/kSlfmJ3vI+lGLUJC5rEbQZdC8ZxahSNdX2ovojl5lX X9u9bdlZNGg+VSYPN8YolB/SA3C+QSMPNS6w9roYze3iQG7A20r6FXi5Wg6Ul9+QzYkA eqEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=NqNs6QgP5dCH60yhGHWvGFxPdubvfzWijlmI1ypNYJA=; b=iQyOlldQjhlWmDMBfd4nGYiTkzJ1lxApgmv2FmggIy4XrUXp/yHWY2MLCou6WEXU9c u3LO0BJzIhlXm9YiII8wk+5WAL1lQyVYC4lcxA1qDkQVyWHKwiZHuTmfApZ4cCyroShw 764pHdvmSA8UsRD8n5OiP5pN2OzIg/GFfjaOEJTeN8Trs7og0zFJhrX24VH1x7KKeBLY kymDfTo7BhN9A75mCbExMup+bhovEQgZK0fsLkq3tUQeni8mLwwh8j7I3m076lEgTcHF nB1PaHe6OV4VTkv7qRK01XSBso3TX8G30pnXg20bj+sQ+H67aK32wzWqPbysbDrOgwiW imXw== X-Gm-Message-State: ALyK8tKs69jtgNJm4R5KwrlaWeah1bW+Xq9wHcZGOCNSls/i8yj8kNRew/dLLGb4SZS8KzYOzzOQyIwqNdsAuA== MIME-Version: 1.0 X-Received: by 10.200.48.54 with SMTP id f51mr28546443qte.84.1464632433087; Mon, 30 May 2016 11:20:33 -0700 (PDT) Received: by 10.55.149.67 with HTTP; Mon, 30 May 2016 11:20:33 -0700 (PDT) In-Reply-To: <574C7E37.7080102@vboehm.de> References: <574C7E37.7080102@vboehm.de> Date: Mon, 30 May 2016 14:20:33 -0400 Message-ID: Subject: Re: similarity and operator '%' From: "David G. Johnston" To: Volker Boehm Cc: "pgsql-performance@postgresql.org" Content-Type: multipart/alternative; boundary=001a11402cf2e6b2650534134fee X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a11402cf2e6b2650534134fee Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Mon, May 30, 2016 at 1:53 PM, Volker Boehm wrote: > > The reason for using the similarity function in place of the '%'-operator > is that I want to use different similarity values in one query: > > select name, street, zip, city > from addresses > where name % $1 > and street % $2 > and (zip % $3 or city % $4) > or similarity(name, $1) > 0.8 > > which means: take all addresses where name, street, zip and city have > little similarity _plus_ all addresses where the name matches very good. > > > The only way I found, was to create a temporary table from the first > query, change the similarity value with set_limit() and then select the > second query UNION the temporary table. > > Is there a more elegant and straight forward way to achieve this result? > =E2=80=8BNot that I can envision. You are forced into using an operator due to our index implementation. You are thus forced into using a GUC to control the parameter that the index scanning function uses to compute true/false. A GUC can only take on a single value within a given query - well, not quite true[1] but the exception doesn't seem like it will help here. Th us you are consigned to=E2=80=8B =E2=80=8Busing two queries. *=E2=80=8BA functional index=E2=80=8B doesn't work since the second argumen= t is query specific [1]=E2=80=8B When defining a function you can attach a "SET" clause to it; = commonly used for search_path but should work with any GUC. If you could wrap the operator comparison into a custom function you could use this capability. It also would require a function that would take the threshold as a value - the extension only provides variations that use the GUC. I don't think this will use the index even if it compiles (not tested): CREATE FUNCTION similarity_80(col, val) RETURNS boolean SET similarity_threshold =3D 0.80 LANGUAGE sql AS $$ =E2=80=8BSELECT =E2=80=8Bcol % val; $$; =E2=80=8BDavid J.=E2=80=8B --001a11402cf2e6b2650534134fee Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
On Mon, Ma= y 30, 2016 at 1:53 PM, Volker Boehm <volker@vboehm.de> wrote:

The reason for using the similarity function in place of the '%'-op= erator is that I want to use different similarity values in one query:

=C2=A0 =C2=A0 select name, street, zip, city
=C2=A0 =C2=A0 from addresses
=C2=A0 =C2=A0 where name % $1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 and street % $2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 and (zip % $3 or city % $4)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 or similarity(name, $1) > 0.8

which means: take all addresses where name, street, zip and city have littl= e similarity _plus_ all addresses where the name matches very good.


The only way I found, was to create a temporary table from the first query,= change the similarity value with set_limit() and then select the second qu= ery UNION the temporary table.

Is there a more elegant and straight forward way to achieve this result?

=E2=80=8BNot that I can= envision.

You are forced into using an operator due to our index implementat= ion.

You are thus forced into using a GUC to control the parameter that the i= ndex scanning function uses to compute true/false.

A GUC can only take on a s= ingle value within a given query - well, not quite true[1] but the exceptio= n doesn't seem like it will help here.

Th
us you a= re consigned to=E2=80=8B
=C2=A0
=E2=80=8Busing two qu= eries.

*=E2=80=8BA functional index=E2=80= =8B doesn't work since the second argument is query specific

[1]=E2=80=8B When defining a function = you can attach a "SET" clause to it; commonly used for search_pat= h but should work with any GUC.=C2=A0 If you could wrap the operator compar= ison into a custom function you could use this capability.=C2=A0 It also wo= uld require a function that would take the threshold as a value - the exten= sion only provides variations that use the GUC.

I don't think this will = use the index even if it compiles (not tested):

CREATE FUNCTION similarity_8= 0(col, val)
RETURNS boolean
SET=C2=A0similarity_threshold =3D 0.80
LANGUAGE sql
AS $$
=E2=80=8BSELECT =E2=80=8Bcol % val;
$$;

=E2=80=8BDavid J.=E2=80=8B=

--001a11402cf2e6b2650534134fee--