public inbox for [email protected]
help / color / mirror / Atom feedFrom: Achilleas Mantzios - cloud <[email protected]>
To: [email protected]
Subject: Regex matching where text is input and regex stored in column
Date: Fri, 15 Sep 2023 13:52:52 +0300
Message-ID: <[email protected]> (raw)
Hello
I have a table like :
amantzio@sma/dynacom=# \d mail_vessel_addressbook
Table "public.mail_vessel_addressbook"
Column | Type | Collation | Nullable |
Default
---------+---------+-----------+----------+--------------------------------------------------------------------
id | integer | | not null |
nextval(('public.mail_vessel_addressbook_id_seq'::text)::regclass)
name | text | | not null |
address | text | | not null |
vslid | integer | | not null |
Indexes:
.....
where address stores basically email patterns, human / edited text with
possible wild cards, meant to be input mainly by our admins. There is an
query that checks a particular new address (this runs via an exim4 mail
server) which has become quite slow :
SELECT text('[email protected]') from mail_global_addressbook where
text('[email protected]') ~*
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?') UNION SELE
CT text('[email protected]') from mail_vessel_addressbook where
text('[email protected]') ~*
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?') LIMIT 1;
It basically seq scans both tables, and the 2nd table in particular is
fairly big : 123966 rows. So the above query tends to take 3.6 secs on
average to execute. A pg_trgm index would not help since there is no way
(that I know of) to index a regex in pgsql. Most common use is about
indexing values to be checked against a given regex, quite the reverse
from what we need here.
I got great improvement by materializing
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?'), we are
running pgsql 10.23 :( , no generated columns possible, so I used a
trigger, and then adding a normal b-tree index on the column. This
causes Index only scan which is much better than the seq scan. Also I
had to re-write the query as :
with
qry1 as
(SELECT text('[email protected]') from mail_vessel_addressbook
where
text('[email protected]') ~*
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')
LIMIT 1),
qry2 as
(SELECT text('[email protected]') from mail_global_addressbook
where
text('[email protected]') ~*
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')
LIMIT 1)
select text('[email protected]') from qry1 UNION select
text('[email protected]') from qry2 LIMIT 1;
I was wondering if there is some more elegant and better solution,
without the need for extra column and trigger.
view thread (2+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected]
Subject: Re: Regex matching where text is input and regex stored in column
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox