public inbox for [email protected]  
help / color / mirror / Atom feed
From: Achilleas Mantzios - cloud <[email protected]>
To: [email protected]
Subject: Regex matching where text is input and regex stored in column
Date: Fri, 15 Sep 2023 13:52:52 +0300
Message-ID: <[email protected]> (raw)

Hello

I have a table like :

amantzio@sma/dynacom=# \d mail_vessel_addressbook
                                    Table "public.mail_vessel_addressbook"
Column  |  Type   | Collation | Nullable | 
                              Default
---------+---------+-----------+----------+-------------------------------------------------------------------- 

id      | integer |           | not null | 
nextval(('public.mail_vessel_addressbook_id_seq'::text)::regclass)
name    | text    |           | not null |
address | text    |           | not null |
vslid   | integer |           | not null |
Indexes:

.....

where address stores basically email patterns, human / edited text with 
possible wild cards, meant to be input mainly by our admins. There is an 
query that checks a particular new address (this runs via an exim4 mail 
server) which has become quite slow :

SELECT text('[email protected]') from mail_global_addressbook where 
text('[email protected]') ~* 
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')  UNION SELE
CT text('[email protected]') from mail_vessel_addressbook where 
text('[email protected]') ~* 
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')  LIMIT 1;

It basically seq scans both tables, and the 2nd table in particular is 
fairly big : 123966 rows. So the above query tends to take 3.6 secs on 
average to execute. A pg_trgm index would not help since there is no way 
(that I know of) to index a regex in pgsql. Most common use is about 
indexing values to be checked against a given regex, quite the reverse 
from what we need here.

I got great improvement by materializing 
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?'), we are 
running pgsql 10.23 :( , no generated columns possible, so I used a 
trigger, and then adding a normal b-tree index on the column. This 
causes Index only scan which is much better than the seq scan. Also I 
had to re-write the query as :

with

qry1 as

     (SELECT text('[email protected]') from mail_vessel_addressbook 
where

text('[email protected]') ~* 
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')

     LIMIT 1),
qry2 as

     (SELECT text('[email protected]') from mail_global_addressbook 
where

text('[email protected]') ~* 
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')

     LIMIT 1)
select text('[email protected]') from qry1 UNION select 
text('[email protected]') from qry2 LIMIT 1;

I was wondering if there is some more elegant and better solution, 
without the need for extra column and trigger.



view thread (2+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected]
  Subject: Re: Regex matching where text is input and regex stored in column
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox