Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1b7ROB-0003FB-U0 for pgsql-performance@arkaria.postgresql.org; Mon, 30 May 2016 17:54:08 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1b7ROB-0001z7-DI for pgsql-performance@arkaria.postgresql.org; Mon, 30 May 2016 17:54:07 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1b7ROA-0001yf-MV for pgsql-performance@postgresql.org; Mon, 30 May 2016 17:54:06 +0000 Received: from d101a1.x-mailer.de ([212.162.53.5]) by magus.postgresql.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1b7RO7-0003Yu-65 for pgsql-performance@postgresql.org; Mon, 30 May 2016 17:54:06 +0000 Received: from [213.61.101.188] (helo=[10.128.2.197]) by d101.x-mailer.de with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63) (envelope-from ) id 1b7RO4-0007W1-0R for pgsql-performance@postgresql.org; Mon, 30 May 2016 19:54:00 +0200 To: pgsql-performance@postgresql.org From: Volker Boehm Subject: similarity and operator '%' Message-ID: <574C7E37.7080102@vboehm.de> Date: Mon, 30 May 2016 19:53:59 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.6.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Info: valid message X-Info: original Date X-Pg-Spam-Score: -1.9 (-) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org Hello, I'm trying to find persons in an address database where I have built trgm-indexes on name, street, zip and city. When I search for all four parts of the address (name, street, zip and city) select name, street, zip, city from addresses where name % $1 and street % $2 and (zip % $3 or city % $4) everything works fine: It takes less than a second to get some (5 - 500) proposed addresses out of 500,000 addresses and the query plan shows Bitmap Heap Scan on addresses (cost=168.31..1993.38 rows=524 ... Recheck Cond: ... -> Bitmap Index Scan on ... Index Cond: ... The same happens when I search only by name with select name, street, zip, city from addresses where name % $1 But when I rewrite this query to select name, street, zip, city from addresses where similarity(name, $1) > 0.3 which means exactly then same as the second example, the query plan changes to Seq Scan on addresses (cost=0.00..149714.42 rows=174675 width=60) Filter: ... and the query lasts about a minute. The reason for using the similarity function in place of the '%'-operator is that I want to use different similarity values in one query: select name, street, zip, city from addresses where name % $1 and street % $2 and (zip % $3 or city % $4) or similarity(name, $1) > 0.8 which means: take all addresses where name, street, zip and city have little similarity _plus_ all addresses where the name matches very good. The only way I found, was to create a temporary table from the first query, change the similarity value with set_limit() and then select the second query UNION the temporary table. Is there a more elegant and straight forward way to achieve this result? regards Volker -- Volker Böhm Tel.: +49 4141 981155 Voßkuhl 5 mailto:volker@vboehm.de 21682 Stade http://www.vboehm.de -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance