MIME-Version: 1.0
In-Reply-To: <574C7E37.7080102@vboehm.de>
References: <574C7E37.7080102@vboehm.de>
Date: Mon, 30 May 2016 14:20:33 -0400
Message-ID: 
 <CAKFQuwaEUYS75qJVGQZJ7FGDZtM+kMrzTQzdPNLFhxVk+vQkDg@mail.gmail.com>
Subject: Re: similarity and operator '%'
From: "David G. Johnston" <david.g.johnston@gmail.com>
To: Volker Boehm <volker@vboehm.de>
Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Content-Type: multipart/alternative; boundary=001a11402cf2e6b2650534134fee
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

--001a11402cf2e6b2650534134fee
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Mon, May 30, 2016 at 1:53 PM, Volker Boehm <volker@vboehm.de> wrote:

>
> The reason for using the similarity function in place of the '%'-operator
> is that I want to use different similarity values in one query:
>
>     select name, street, zip, city
>     from addresses
>     where name % $1
>         and street % $2
>         and (zip % $3 or city % $4)
>         or similarity(name, $1) > 0.8
>
> which means: take all addresses where name, street, zip and city have
> little similarity _plus_ all addresses where the name matches very good.
>
>
> The only way I found, was to create a temporary table from the first
> query, change the similarity value with set_limit() and then select the
> second query UNION the temporary table.
>
> Is there a more elegant and straight forward way to achieve this result?
>

Not that I can envision.

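You are forced into using an operator due to our index implementation: an
index can only be matched against an operator clause such as "name % $1",
not against a comparison on the result of similarity().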

You are thus forced into using a GUC to control the parameter that the
index scanning function uses to compute true/false.

A GUC can only take on a single value within a given query - well, not
quite true[1] but the exception doesn't seem like it will help here.

Thus you are consigned to using two queries.
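
For what it's worth, the two queries can at least be kept inside one
transaction. A rough, untested sketch, assuming pg_trgm's set_limit(); the
literals stand in for $1..$4, and the temporary table name loose_matches
and the 0.3 threshold are made-up placeholders:

```sql
BEGIN;

-- Loose threshold for the combined name/street/zip/city match.
SELECT set_limit(0.3);

CREATE TEMP TABLE loose_matches ON COMMIT DROP AS
SELECT name, street, zip, city
FROM addresses
WHERE name % 'Volker Boehm'
    AND street % 'Hauptstrasse 1'
    AND (zip % '12345' OR city % 'Berlin');

-- Strict threshold for the name-only match.
SELECT set_limit(0.8);

SELECT name, street, zip, city
FROM addresses
WHERE name % 'Volker Boehm'
UNION
SELECT name, street, zip, city
FROM loose_matches;

COMMIT;
```

UNION (rather than UNION ALL) collapses rows that satisfy both conditions,
which UNION ALL would return twice.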

* A functional index doesn't work since the second argument is
query-specific.

[1] When defining a function you can attach a "SET" clause to it; this is
commonly used for search_path but should work with any GUC.  If you could
wrap the operator comparison into a custom function you could use this
capability.  It would otherwise require a function that takes the threshold
as a value - the extension only provides variations that use the GUC.

I don't think this will use the index even if it compiles (not tested):

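-- Argument types are assumed to be text.  The GUC's full name is
-- pg_trgm.similarity_threshold (PostgreSQL 9.6 and later).
CREATE FUNCTION similarity_80(col text, val text)
RETURNS boolean
SET pg_trgm.similarity_threshold = 0.80
LANGUAGE sql
AS $$
SELECT col % val;
$$;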
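
One way to check, sketched under assumptions: a GIN trigram index on
addresses.name (the index name below is made up) and the wrapper function
above having been created successfully. Comparing the two plans shows
whether either form reaches the index:

```sql
-- Hypothetical index name; gin_trgm_ops is the pg_trgm operator class.
CREATE INDEX addresses_name_trgm_idx
    ON addresses USING gin (name gin_trgm_ops);

-- Bare operator: eligible for the trigram index.
EXPLAIN SELECT name FROM addresses WHERE name % 'Volker Boehm';

-- Wrapped operator: expected to show up as a plain function-call filter,
-- since a function with a SET clause is not inlined.
EXPLAIN SELECT name FROM addresses WHERE similarity_80(name, 'Volker Boehm');
```

On a small table the planner may pick a sequential scan for both, so this
is only meaningful with a realistic amount of data.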

David J.


--001a11402cf2e6b2650534134fee--