Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dQRrk-0007kg-Rk for pgsql-performance@arkaria.postgresql.org; Thu, 29 Jun 2017 05:19:45 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1dQRrk-00059j-4p for pgsql-performance@arkaria.postgresql.org; Thu, 29 Jun 2017 05:19:44 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dQRpy-000222-3Y for pgsql-performance@postgresql.org; Thu, 29 Jun 2017 05:17:54 +0000 Received: from mail-qk0-x234.google.com ([2607:f8b0:400d:c09::234]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1dQRpq-0007Lx-2A for pgsql-performance@postgresql.org; Thu, 29 Jun 2017 05:17:52 +0000 Received: by mail-qk0-x234.google.com with SMTP id d78so67567943qkb.1 for ; Wed, 28 Jun 2017 22:17:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=z8uQ3qt52CY+KSVpvfp5BxUEln7/aCj1hmme1MCDtD0=; b=DfknqjzqFmhY3W7QnDmV/IHEAfi14vIEkuLBAoEasPWd00h5dmCRG69bOS4e33+1DO JOhiBQ9NBXpW2O47Y9DgjL2SLHtmyfzjOMg94oo5PSM7RUc3KrtgS9CUHN4LJe4bjeHA l4ruek5rYICFoppbi5erx7JcjbmTEPhJ8SoFf65nOOrxkoe3ffYn8eaozBFzM0i5OXb7 JqtgjeF3vRkBGGNJO8Ko2hZNYgxw3KWNEXEcdQyUBsJp8ZOPvOnuHXP7BUyThxsLZw3/ KdKeMyzcFzKB9aA/tKy2b2FMbDFSb44Gu2RUsDr19S2Hby/lIzYxYZM9TX4TmCU2MDQZ TRUg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=z8uQ3qt52CY+KSVpvfp5BxUEln7/aCj1hmme1MCDtD0=; b=hHwON7afFC46UUWTIkjWKycgtTkyoOaRLt+HaLX03fdvgEoOU+Uknn3OdtVA4lXo0T w9SZDB/qkdxcPSdkfhMEmIpbsuU+LXgBT8IWTVc23VYdPztGOETZfe3pxuz7fLlMAq8e VjBabOsUXs99pRHIDTsq+3xj/DUgrM834whu80ePD3i2pC6Cy15n6LP6n+w3wPhWjdKv JTjkrtkp1iBHGIjE7JGKCdk+IOPx4z9BdbXm3t12KIilZOBhtqc49OM5BmzIgwt0nw9v yBtYXO7UI1tZTD8zQ6lzOrHTSvIVwEbaPSeehq8wKblayhBatDbHQmPvDJOP791XIGhH F6IA== X-Gm-Message-State: AKS2vOxtAERfgKaa2ye8DZu83PhLubdP7ktXIhyYk5b1CmqCZdH+9hKG U9AK4uCBxOTiyUBnHN6wRGfnldRA8Q== X-Received: by 10.55.27.96 with SMTP id b93mr15662037qkb.235.1498713464929; Wed, 28 Jun 2017 22:17:44 -0700 (PDT) MIME-Version: 1.0 Received: by 10.12.179.16 with HTTP; Wed, 28 Jun 2017 22:17:44 -0700 (PDT) In-Reply-To: References: From: Yevhenii Kurtov Date: Thu, 29 Jun 2017 12:17:44 +0700 Message-ID: Subject: Re: To: Brad DeJong Cc: Pavel Stehule , "pgsql-performance@postgresql.org" Content-Type: multipart/alternative; boundary="001a1147e7a0b282340553126b9d" List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a1147e7a0b282340553126b9d Content-Type: text/plain; charset="UTF-8" Hello folks, Thank you very much for analysis and suggested - there is a lot to learn here. I just tried UNION queries and got following error: ERROR: FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT I made a table dump for anyone who wants to give it a spin https://app.box.com/s/464b12glmlk5o4gvzz7krc4c8s2fxlwr and here is the gist for the original commands https://gist.github.com/lessless/33215d0c147645db721e74e07498ac53 On Wed, Jun 28, 2017 at 8:10 PM, Brad DeJong wrote: > > > On 2017-06-28, Pavel Stehule wrote ... > > On 2017-06-28, Yevhenii Kurtov wrote ... > >> On 2017-06-28, Pavel Stehule wrote ... > >>> On 2017-06-28, Yevhenii Kurtov wrote ... > >>>> We have a query that is run almost each second and it's very > important to squeeze every other ms out of it. The query is: > >>>> ... > >>>> I added following index: CREATE INDEX ON campaign_jobs(id, status, > failed_at, started_at, priority DESC, times_failed); > >>>> ... > >>> There are few issues > >>> a) parametrized LIMIT > >>> b) complex predicate with lot of OR > >>> c) slow external sort > >>> > >>> b) signalize maybe some strange in design .. try to replace "OR" by > "UNION" query > >>> c) if you can and you have good enough memory .. try to increase > work_mem .. maybe 20MB > >>> > >>> if you change query to union queries, then you can use conditional > indexes > >>> > >>> create index(id) where status = 0; > >>> create index(failed_at) where status = 2; > >>> create index(started_at) where status = 1; > >> > >> Can you please give a tip how to rewrite the query with UNION clause? > > > > SELECT c0."id" FROM "campaign_jobs" AS c0 > > WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2)))) > > UNION SELECT c0."id" FROM "campaign_jobs" AS c0 > > WHERE ((c0."status" = $3) AND (c0."failed_at" > $4)) > > UNION SELECT c0."id" FROM "campaign_jobs" AS c0 > > WHERE ((c0."status" = $5) AND (c0."started_at" < $6)) > > ORDER BY c0."priority" DESC, c0."times_failed" > > LIMIT $7 > > FOR UPDATE SKIP LOCKED > > > Normally (at least for developers I've worked with), that kind of query > structure is used when the "status" values don't overlap and don't change > from query to query. Judging from Pavel's suggested conditional indexes > (i.e. "where status = "), he also thinks that is likely. > > Give the optimizer that information so that it can use it. Assuming $1 = 0 > and $3 = 2 and $5 = 1, substitute literals. Substitute literal for $7 in > limit. Push order by and limit to each branch of the union all (or does > Postgres figure that out automatically?) Replace union with union all (not > sure about Postgres, but allows other dbms to avoid sorting and merging > result sets to eliminate duplicates). (Use of UNION ALL assumes that "id" > is unique across rows as implied by only "id" being selected with FOR > UPDATE. If multiple rows can have the same "id", then use UNION to > eliminate the duplicates.) > > SELECT "id" FROM "campaign_jobs" WHERE "status" = 0 AND NOT "id" = ANY($1) > UNION ALL > SELECT "id" FROM "campaign_jobs" WHERE "status" = 2 AND "failed_at" > $2 > UNION ALL > SELECT "id" FROM "campaign_jobs" WHERE "status" = 1 AND "started_at" < $3 > ORDER BY "priority" DESC, "times_failed" > LIMIT 100 > FOR UPDATE SKIP LOCKED > > > Another thing that you could try is to push the ORDER BY and LIMIT to the > branches of the UNION (or does Postgres figure that out automatically?) and > use slightly different indexes. This may not make sense for all the > branches but one nice thing about UNION is that each branch can be tweaked > independently. Also, there are probably unmentioned functional dependencies > that you can use to reduce the index size and/or improve your match rate. > Example - if status = 1 means that the campaign_job has started but not > failed or completed, then you may know that started_at is set, but > failed_at and ended_at are null. The < comparison in and of itself implies > that only rows where "started_at" is not null will match the condition. > > SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE (((c0."status" = 0) AND > NOT (c0."id" = ANY($1)))) ORDER BY c0."priority" DESC, c0."times_failed" > LIMIT 100 > UNION ALL > SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 2) AND > (c0."failed_at" > $2)) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT > 100 > UNION ALL > SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 1) AND > (c0."started_at" < $3)) ORDER BY c0."priority" DESC, c0."times_failed" > LIMIT 100 > ORDER BY c0."priority" DESC, c0."times_failed" > LIMIT 100 > FOR UPDATE SKIP LOCKED > > Including the "priority", "times_failed" and "id" columns in the indexes > along with "failed_at"/"started_at" allows the optimizer to do index only > scans. (May still have to do random I/O to the data page to determine tuple > version visibility but I don't think that can be eliminated.) > > create index ... ("priority" desc, "times_failed", "id") > where "status" = 0; > create index ... ("priority" desc, "times_failed", "id", "failed_at") > where "status" = 2 and "failed_at" is not null; > create index ... ("priority" desc, "times_failed", "id", "started_at") > where "status" = 1 and "started_at" is not null; -- and ended_at is null > and ... > > > I'm assuming that the optimizer knows that "where status = 1 and > started_at < $3" implies "and started_at is not null" and will consider the > conditional index. If not, then the "and started_at is not null" needs to > be explicit. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > --001a1147e7a0b282340553126b9d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hello folks,

Thank you very much for an= alysis and suggested - there is a lot to learn here. I just =C2=A0tried UNI= ON queries and got following error:

ERROR: =C2=A0FOR UPDATE is not a= llowed with UNION/INTERSECT/EXCEPT

I made a ta= ble dump for anyone who wants to give it a spin=C2=A0https://app.box.com/s/464b12gl= mlk5o4gvzz7krc4c8s2fxlwr
and here is the gist for the origina= l commands=C2=A0https://gist.github.com/lessless/33215d0c147645db721e74e= 07498ac53

On Wed, Jun 28, 2017 at 8:10 PM, Brad DeJong <= Brad.Dejong@info= r.com> wrote:


On 2017-06-28, Pavel Stehule wrote ...
> On 2017-06-28, Yevhenii Kurtov wrote ...
>> On 2017-06-28, Pavel Stehule wrote ...
>>> On 2017-06-28, Yevhenii Kurtov wrote ...
>>>> We have a query that is run almost each s= econd and it's very important to squeeze every other ms out of it. The = query is:
>>>> ...
>>>> I added following index: CREATE INDEX ON = campaign_jobs(id, status, failed_at, started_at, priority DESC, times_faile= d);
>>>> ...
>>> There are few issues
>>> a) parametrized LIMIT
>>> b) complex predicate with lot of OR
>>> c) slow external sort
>>>
>>> b) signalize maybe some strange in design .. try to replace &q= uot;OR" by "UNION" query
>>> c) if you can and you have good enough memory .. try to increa= se work_mem .. maybe 20MB
>>>
>>> if you change query to union queries, then you can use conditi= onal indexes
>>>
>>> create index(id) where status =3D 0;
>>> create index(failed_at) where status =3D 2;
>>> create index(started_at) where status =3D 1;
>>
>> Can you please give a tip how to rewrite t= he query with UNION clause?
>
> SELECT c0."id" FROM "campaign_j= obs" AS c0
> WHERE (((c0."status" =3D $1) AND NOT (c0."id" =3D = ANY($2))))
> UNION SELECT c0."id" FROM "campaign_jobs" AS c0 > WHERE ((c0."status" =3D $3) AND (c0."failed_at" &g= t; $4))
> UNION SELECT c0."id" FROM "campaign_jobs" AS c0 > WHERE ((c0."status" =3D $5) AND (c0."started_at" &= lt; $6))
> ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT $7
> FOR UPDATE SKIP LOCKED


Normally (at least for developers I've worked with), that kind o= f query structure is used when the "status" values don't over= lap and don't change from query to query. Judging from Pavel's sugg= ested conditional indexes (i.e. "where status =3D <constant>&quo= t;), he also thinks that is likely.

Give the optimizer that information so that it can use it. Assuming $1 =3D = 0 and $3 =3D 2 and $5 =3D 1, substitute literals. Substitute literal for $7= in limit. Push order by and limit to each branch of the union all (or does= Postgres figure that out automatically?) Replace union with union all (not= sure about Postgres, but allows other dbms to avoid sorting and merging re= sult sets to eliminate duplicates). (Use of UNION ALL assumes that "id= " is unique across rows as implied by only "id" being select= ed with FOR UPDATE. If multiple rows can have the same "id", then= use UNION to eliminate the duplicates.)

SELECT "id" FROM "campaign_jobs" WHERE "status&quo= t; =3D 0 AND NOT "id" =3D ANY($1)
=C2=A0 UNION ALL
SELECT "id" FROM "campaign_jobs" WHERE "status&quo= t; =3D 2 AND "failed_at" > $2
=C2=A0 UNION ALL
SELECT "id" FROM "campaign_jobs" WHERE "status&quo= t; =3D 1 AND "started_at" < $3
ORDER BY "priority" DESC, "times_failed"
LIMIT 100
FOR UPDATE SKIP LOCKED


Another thing that you could try is to push the ORDER BY and LIMIT to the b= ranches of the UNION (or does Postgres figure that out automatically?) and = use slightly different indexes. This may not make sense for all the branche= s but one nice thing about UNION is that each branch can be tweaked indepen= dently. Also, there are probably unmentioned functional dependencies that y= ou can use to reduce the index size and/or improve your match rate. Example= - if status =3D 1 means that the campaign_job has started but not failed o= r completed, then you may know that started_at is set, but failed_at and en= ded_at are null. The < comparison in and of itself implies that only row= s where "started_at" is not null will match the condition.

SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE (((c0.&= quot;status" =3D 0) AND NOT (c0."id" =3D ANY($1)))) ORDER BY= c0."priority" DESC, c0."times_failed" LIMIT 100
UNION ALL
SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0.&q= uot;status" =3D 2) AND (c0."failed_at" > $2)) ORDER BY c0= ."priority" DESC, c0."times_failed" LIMIT 100
UNION ALL
SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0.&q= uot;status" =3D 1) AND (c0."started_at" < $3)) ORDER BY c= 0."priority" DESC, c0."times_failed" LIMIT 100
ORDER BY c0."priority" DESC, c0."times_fail= ed"
LIMIT 100
FOR UPDATE SKIP LOCKED

Including the "priority", "times_failed" and "id&q= uot; columns in the indexes along with "failed_at"/"started_= at" allows the optimizer to do index only scans. (May still have to do= random I/O to the data page to determine tuple version visibility but I do= n't think that can be eliminated.)

create index ... ("priority" desc, "times_failed", &quo= t;id")=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0where &qu= ot;status" =3D 0;
create index ... ("priority" desc, "times_failed", &quo= t;id", "failed_at")=C2=A0 where "status" =3D 2 and= "failed_at" is not null;
create index ... ("priority" desc, "times_failed", &quo= t;id", "started_at") where "status" =3D 1 and &quo= t;started_at" is not null; -- and ended_at is null and ...


I'm assuming that the optimizer knows that "where status =3D 1 and= started_at < $3" implies "and started_at is not null" an= d will consider the conditional index. If not, then the "and started_a= t is not null" needs to be explicit.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-pe= rformance

--001a1147e7a0b282340553126b9d--