public inbox for [email protected]
From: Volker Boehm <[email protected]>
To: [email protected]
Subject: similarity and operator '%'
Date: Mon, 30 May 2016 19:53:59 +0200
Message-ID: <[email protected]>
List-Unsubscribe: <mailto:[email protected]?body=unsub%20pgsql-performance>
Hello,
I'm trying to find persons in an address database in which I have built
trigram (pg_trgm) indexes on name, street, zip and city.
When I search on all four parts of the address (name, street, zip and city) with
select name, street, zip, city
from addresses
where name % $1
and street % $2
and (zip % $3 or city % $4)
everything works fine: it takes less than a second to get a handful (5 to 500)
of candidate addresses out of 500,000, and the query plan shows
Bitmap Heap Scan on addresses (cost=168.31..1993.38 rows=524 ...
Recheck Cond: ...
-> Bitmap Index Scan on ...
Index Cond: ...
The same happens when I search only by name with
select name, street, zip, city
from addresses
where name % $1
But when I rewrite this query as
select name, street, zip, city
from addresses
where similarity(name, $1) > 0.3
which means exactly the same as the second example (0.3 is the default
similarity threshold), the query plan changes to
Seq Scan on addresses (cost=0.00..149714.42 rows=174675 width=60)
Filter: ...
and the query takes about a minute.
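For reference, here is a rough Python approximation of the measure behind both `similarity()` and `%`. It is an illustrative sketch, not pg_trgm's C implementation, and the helpers `trigrams` and `similarity` are hypothetical stand-ins that only mirror the SQL function names. Following the pg_trgm documentation, non-alphanumeric characters act as word separators and each word is padded with two leading spaces and one trailing space before being cut into trigrams. The sketch also assumes case-insensitive matching and scores two strings as shared trigrams divided by all distinct trigrams in either string. `name % $1` compares this score with the current threshold (0.3 unless changed with set_limit()), which is why it selects the same rows as `similarity(name, $1) > 0.3`. Only the operator form can be matched to the trigram index, though. A comparison on the result of a function call is not an indexable condition, so the planner falls back to a sequential scan.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm trigram extraction (hypothetical helper).

    Lowercases the input, treats every non-alphanumeric character as a
    word separator, pads each word with two leading spaces and one
    trailing space, and collects every 3-character slice.
    """
    result: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = len(ta | tb)
    if union == 0:
        return 0.0  # two strings without any words are treated as dissimilar
    return len(ta & tb) / union
```

With these definitions, `trigrams("cat")` is `{"  c", " ca", "cat", "at "}`, and `similarity("word", "two words")` comes out as 4/11 (about 0.36): the two strings share four of eleven distinct trigrams.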
The reason for using the similarity function instead of the '%' operator
is that I want to use different similarity thresholds in one query:
select name, street, zip, city
from addresses
where name % $1
and street % $2
and (zip % $3 or city % $4)
or similarity(name, $1) > 0.8
which means: take all addresses where name, street, zip and city are at
least loosely similar, _plus_ all addresses where the name matches very well.
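The result set described above can be written as a single row-level predicate. This Python sketch is illustrative only. To run on its own, it repeats an approximate trigram similarity (an assumption, not pg_trgm's own code). The names `matches`, `LOW` and `HIGH` are hypothetical. It also filters rows in memory, so it says nothing about whether PostgreSQL can use the trigram indexes for the same condition. Because AND binds more tightly than OR in SQL, the query's condition groups as `(name AND street AND (zip OR city)) OR strong name match`, and `matches` spells out exactly that grouping.

```python
import re

# Hypothetical thresholds taken from the queries in this message.
LOW = 0.3   # loose match required on every address part
HIGH = 0.8  # strong match that is sufficient on the name alone


def trigrams(text: str) -> set[str]:
    """Approximate trigram set: lowercase, split on non-alphanumerics, pad words."""
    result: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = len(ta | tb)
    return len(ta & tb) / union if union else 0.0


def matches(row: dict[str, str], name: str, street: str,
            zip_code: str, city: str) -> bool:
    """True if all parts match loosely, or the name alone matches closely."""
    name_score = similarity(row["name"], name)
    loose = (
        name_score > LOW
        and similarity(row["street"], street) > LOW
        and (similarity(row["zip"], zip_code) > LOW
             or similarity(row["city"], city) > LOW)
    )
    strong_name = name_score > HIGH
    return loose or strong_name
```

Filtering a list of rows with `[r for r in rows if matches(r, ...)]` keeps the same rows as the union of the loose four-field query and the strict name-only query, because a UNION of two filters is a single filter joined by OR.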
The only way I have found is to store the result of the first query in a
temporary table, change the similarity threshold with set_limit(), and then
run the second query with a UNION against the temporary table.
Is there a more elegant and straightforward way to achieve this result?
regards Volker
--
Volker Böhm Tel.: +49 4141 981155
Voßkuhl 5 mailto:[email protected]
21682 Stade http://www.vboehm.de
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance