Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1asQj6-0002mE-56 for pgsql-performance@arkaria.postgresql.org; Tue, 19 Apr 2016 08:09:40 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1asQj5-0004dy-ON for pgsql-performance@arkaria.postgresql.org; Tue, 19 Apr 2016 08:09:39 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1asQhP-0002lJ-VJ for pgsql-performance@postgresql.org; Tue, 19 Apr 2016 08:07:56 +0000 Received: from mail-vk0-x231.google.com ([2607:f8b0:400c:c05::231]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1asQhK-0003oJ-SN for pgsql-performance@postgresql.org; Tue, 19 Apr 2016 08:07:55 +0000 Received: by mail-vk0-x231.google.com with SMTP id e185so10071640vkb.1 for ; Tue, 19 Apr 2016 01:07:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bigpurpledot.com; s=google; h=mime-version:date:message-id:subject:from:to; bh=aC9M5WtQasvhCdIq5WKGQcUBaNNAJMKc1QDWB0+kzfA=; b=C0NkWiPzXzqO0Psm6lS4IWuY+Jk2PcXesdU/EgFpNSbcY0lDMZxFtFdFP56xXLGctz 6turrbsBPUEQMhfhEB3Pq4O9g69fxUOrTaei7eTRqOTi6qfaOzDnXoUcRxg3L3jRS39E 21wAtllvbBk4AXUct6+ZKCUFDhEbjbXEmXS8k= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to; bh=aC9M5WtQasvhCdIq5WKGQcUBaNNAJMKc1QDWB0+kzfA=; b=jyBv9dYrxGVe2R8OiVyYc7wbW49YSqcUu59eDZEtGBsdqgvY3xTkigCQh+t/PH6FBz QOHijHD/Ue/DPJqnn13m7Ku9IH0v6DAUESm/olAg2c+lWc5G7EN8iMVtO7xaDm/zMPEr PgiTEXF9I4l5GrkK3pogyXnWMkp1UDOMNqsGdk5TmIPRuR9/I/O08Qo9TSaMR4w9nB4X ja1JISTjOuAlju5w5Quo0/7UPH8/zmEQf0x4WYljsT5t89N7AdecwdVll+5DmTBv6OFo wvH5BZRKXrUpV2yd8rroxayMao9fsfJPtxaHaIn+koQKQzgtPv6l/qHDGTbil04PPPBV mJNw== X-Gm-Message-State: AOPr4FV2LtLsglPDufqv0HDGI4+Oq6Pxw6Up1sN9CeOd/BAVNS2kmDK8XxMO43b55x0v9Ak7uKjsMBOgkaulBQ== MIME-Version: 1.0 X-Received: by 10.31.161.130 with SMTP id k124mr835234vke.13.1461053268804; Tue, 19 Apr 2016 01:07:48 -0700 (PDT) Received: by 10.176.6.226 with HTTP; Tue, 19 Apr 2016 01:07:48 -0700 (PDT) X-Originating-IP: [2600:8802:3200:1b55:a40b:c4ee:c54d:852a] Date: Tue, 19 Apr 2016 01:07:48 -0700 Message-ID: Subject: Hash join seq scan slow From: Aldo Sarmiento To: pgsql-performance@postgresql.org Content-Type: multipart/alternative; boundary=001a1143f8a4160a6a0530d1f97a X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a1143f8a4160a6a0530d1f97a Content-Type: text/plain; charset=UTF-8 Hello, I'm assuming this topic has probably been bludgeoned to a pulp, but my google-fu can't seem to find a solution. I have two relatively largish tables that I'm trying to join that result in a slow query. Hardware: 2014 iMac w/ SSD & i5 processor Tables: contacts: 1.14 million rows permissions: 2.49 million rows contacts have many permissions Goal: get first page of contacts (limit 40) a user has access to, sorted by creation date. For simplicity's sake, I've taken out some of the complexity of how I retrieve the permissions & just got all permissions with id less than 2,100,000 to simulate access of ~151k contacts. stage4=# EXPLAIN ANALYZE WITH perms AS ( stage4(# SELECT DISTINCT(contact_id) from permissions where id < 2100000 stage4(# ) SELECT stage4-# contacts.id, stage4-# contacts.first_name stage4-# FROM contacts stage4-# INNER JOIN perms ON perms.contact_id = contacts.id stage4-# ORDER BY contacts.updated_at desc NULLS LAST LIMIT 40 OFFSET 0; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=226777.45..226777.55 rows=40 width=20) (actual time=1556.133..1556.143 rows=40 loops=1) CTE perms -> HashAggregate (cost=34891.61..35404.25 rows=51264 width=4) (actual time=107.107..151.455 rows=151920 loops=1) Group Key: permissions.contact_id -> Bitmap Heap Scan on permissions (cost=3561.50..34416.43 rows=190074 width=4) (actual time=14.839..50.445 rows=192372 loops=1) Recheck Cond: (id < 2100000) Heap Blocks: exact=2435 -> Bitmap Index Scan on permissions_pkey (cost=0.00..3513.98 rows=190074 width=0) (actual time=14.496..14.496 rows=192372 loops=1) Index Cond: (id < 2100000) -> Sort (cost=191373.19..191501.35 rows=51264 width=20) (actual time=1556.132..1556.137 rows=40 loops=1) Sort Key: contacts.updated_at DESC NULLS LAST Sort Method: top-N heapsort Memory: 27kB -> Hash Join (cost=180911.60..189752.76 rows=51264 width=20) (actual time=1124.969..1532.269 rows=124152 loops=1) Hash Cond: (perms.contact_id = contacts.id) -> CTE Scan on perms (cost=0.00..1025.28 rows=51264 width=4) (actual time=107.110..203.330 rows=151920 loops=1) -> Hash (cost=159891.71..159891.71 rows=1144871 width=20) (actual time=1017.354..1017.354 rows=1145174 loops=1) Buckets: 65536 Batches: 32 Memory Usage: 2521kB -> Seq Scan on contacts (cost=0.00..159891.71 rows=1144871 width=20) (actual time=0.035..684.361 rows=1145174 loops=1) Planning time: 0.222 ms Execution time: 1561.693 ms (20 rows) It is to my understanding that the query requires the entire 150k matched contacts to be joined in order for the Sort to run its operation. I don't see a way around this, however, I can't also see how this can be a unique case where it's acceptable to have a 1.5 second query time? There has to be lots of other companies out there that manage much more data that needs to be sorted for the presentation layer. Thanks in advance, -Aldo --001a1143f8a4160a6a0530d1f97a Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hello,

I'm assuming this topic has = probably been bludgeoned to a pulp, but my google-fu can't seem to find= a solution.

<= div>I have two relatively largish tables that I'm trying to join that r= esult in a slow query.

Hardware:=C2=A0
<= br>
2014 iMac w/ SSD & i5 processor

= Tables:
contacts: 1.14 million rows
permissions: 2.49 m= illion rows

contacts have many permissions

Goal: get first page of contacts (limit 40) a user has ac= cess to, sorted by creation date.

For simplicity&#= 39;s sake, I've taken out some of the complexity of how I retrieve the = permissions & just got all permissions with id less than 2,100,000 to s= imulate access of ~151k contacts.

stage4=3D# = EXPLAIN ANALYZE WITH perms AS (
stage4(# =C2=A0 SELECT DISTINCT(c= ontact_id) from permissions where id < 2100000
stage4(# ) SELE= CT
stage4-# =C2=A0 =C2=A0 contacts= .id,
stage4-# =C2=A0 =C2=A0 contacts.first_name
sta= ge4-# =C2=A0 FROM contacts
stage4-# =C2=A0 INNER JOIN perms ON = =C2=A0perms.contact_id =3D contacts.id
stage4-# =C2=A0 ORDER BY contacts.updated_at desc NULLS LAST LIMIT= 40 OFFSET 0;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 QUERY PLAN
-----= ---------------------------------------------------------------------------= ----------------------------------------------------------------------
=C2=A0Limit =C2=A0(cost=3D226777.45..226777.55 rows=3D40 width=3D20) = (actual time=3D1556.133..1556.143 rows=3D40 loops=3D1)
=C2=A0 =C2= =A0CTE perms
=C2=A0 =C2=A0 =C2=A0-> =C2=A0HashAggregate =C2=A0= (cost=3D34891.61..35404.25 rows=3D51264 width=3D4) (actual time=3D107.107..= 151.455 rows=3D151920 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0Group Key: permissions.contact_id
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0-> =C2=A0Bitmap Heap Scan on permissions =C2=A0(cost= =3D3561.50..34416.43 rows=3D190074 width=3D4) (actual time=3D14.839..50.445= rows=3D192372 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0Recheck Cond: (id < 2100000)
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Heap Blocks: exact= =3D2435
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0-> =C2=A0Bitmap Index Scan on permissions_pkey =C2=A0(cost=3D0.00.= .3513.98 rows=3D190074 width=3D0) (actual time=3D14.496..14.496 rows=3D1923= 72 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Index Cond: (id < 2100000)
= =C2=A0 =C2=A0-> =C2=A0Sort =C2=A0(cost=3D191373.19..191501.35 rows=3D512= 64 width=3D20) (actual time=3D1556.132..1556.137 rows=3D40 loops=3D1)
=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Key: contacts.updated_at DESC N= ULLS LAST
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Method: top-N he= apsort =C2=A0Memory: 27kB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0->= =C2=A0Hash Join =C2=A0(cost=3D180911.60..189752.76 rows=3D51264 width=3D20= ) (actual time=3D1124.969..1532.269 rows=3D124152 loops=3D1)
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Hash Cond: (perms.conta= ct_id =3D contacts.id)
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0CTE Scan on per= ms =C2=A0(cost=3D0.00..1025.28 rows=3D51264 width=3D4) (actual time=3D107.1= 10..203.330 rows=3D151920 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Hash =C2=A0(cost=3D159891.71..159891= .71 rows=3D1144871 width=3D20) (actual time=3D1017.354..1017.354 rows=3D114= 5174 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0Buckets: 65536 =C2=A0Batches: 32 =C2=A0Memory Us= age: 2521kB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Seq Scan on contacts =C2=A0(cost=3D0.00= ..159891.71 rows=3D1144871 width=3D20) (actual time=3D0.035..684.361 rows= =3D1145174 loops=3D1)
=C2=A0Planning time: 0.222 ms
=C2= =A0Execution time: 1561.693 ms
(20 rows)

It is to my understanding that the query requires the entire 150k ma= tched contacts to be joined in order for the Sort to run its operation. I d= on't see a way around this, however, I can't also see how this can = be a unique case where it's acceptable to have a 1.5 second query time?= There has to be lots of other companies out there that manage much more da= ta that needs to be sorted for the presentation layer.

=
Thanks in advance,

-Aldo
--001a1143f8a4160a6a0530d1f97a--