Instead of using the bloom index, a parallel sequencial scan is used with this example

public inbox for [email protected]  
help / color / mirror / Atom feed

Instead of using the bloom index, a parallel sequencial scan is used with this example
4+ messages / 3 participants
[nested] [flat]

* Instead of using the bloom index, a parallel sequencial scan is used with this example
@ 2019-10-24 15:17  PG Doc comments form <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: PG Doc comments form @ 2019-10-24 15:17 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/bloom.html
Description:

postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5,
i6);
ERROR:  access method "bloom" does not exist
postgres=# create extension bloom;
CREATE EXTENSION
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5,
i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 =
123451;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.00..127220.00 rows=250 width=24) (actual
time=974.467..974.513 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Parallel Seq Scan on tbloom  (cost=0.00..126195.00 rows=104 width=24)
(actual time=895.448..895.448 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning Time: 16.006 ms
Execution Time: 974.635 ms
(8 rows)

postgres=# analyze tbloom;
ANALYZE
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 =
123451;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.00..127195.82 rows=1 width=24) (actual
time=803.314..803.436 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Parallel Seq Scan on tbloom  (cost=0.00..126195.72 rows=1 width=24)
(actual time=775.911..775.911 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning Time: 0.416 ms
Execution Time: 803.471 ms
(8 rows)


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Instead of using the bloom index, a parallel sequencial scan is used with this example
@ 2019-11-05 20:02  Bruce Momjian <[email protected]>
  parent: PG Doc comments form <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Bruce Momjian @ 2019-11-05 20:02 UTC (permalink / raw)
  To: [email protected]; [email protected]

On Thu, Oct 24, 2019 at 03:17:06PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/12/bloom.html
> Description:

I cleaned up your script and created an SQL file that can be piped into
psql, attached.  I see the bloomidx index being used without and with
the ANALYZE, output attached.  I tested this on git master, and back
through PG 10.  Would you please run these queries and post the output:

	SELECT version();
	
	SELECT name, current_setting(name), source
	FROM pg_settings
	WHERE source NOT IN ('default', 'override');

My guess is that you have some non-default setting that is causing
bloomidx not to be used.

-- 
  Bruce Momjian  <[email protected]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

SELECT 10000000
CREATE INDEX
CREATE INDEX
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=178436.06..179392.83 rows=250 width=24) (actual time=98.114..98.114 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2375
   Heap Blocks: exact=2317
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178436.00 rows=250 width=0) (actual time=86.796..86.797 rows=2375 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning time: 0.326 ms
 Execution time: 98.156 ms
(8 rows)

ANALYZE
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=178434.71..178438.73 rows=1 width=24) (actual time=104.982..104.982 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2375
   Heap Blocks: exact=2317
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178434.71 rows=1 width=0) (actual time=93.657..93.657 rows=2375 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning time: 0.215 ms
 Execution time: 105.018 ms
(8 rows)



Attachments:

  [application/x-sql] bloom.sql (617B, 2-bloom.sql)
  download

  [text/plain] out.txt (1.5K, 3-out.txt)
  download | inline:
SELECT 10000000
CREATE INDEX
CREATE INDEX
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=178436.06..179392.83 rows=250 width=24) (actual time=98.114..98.114 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2375
   Heap Blocks: exact=2317
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178436.00 rows=250 width=0) (actual time=86.796..86.797 rows=2375 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning time: 0.326 ms
 Execution time: 98.156 ms
(8 rows)

ANALYZE
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=178434.71..178438.73 rows=1 width=24) (actual time=104.982..104.982 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2375
   Heap Blocks: exact=2317
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178434.71 rows=1 width=0) (actual time=93.657..93.657 rows=2375 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning time: 0.215 ms
 Execution time: 105.018 ms
(8 rows)


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Instead of using the bloom index, a parallel sequencial scan is used with this example
@ 2019-11-05 20:28  Daniel Westermann (DWE) <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Daniel Westermann (DWE) @ 2019-11-05 20:28 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; [email protected] <[email protected]>


>>On Thu, Oct 24, 2019 at 03:17:06PM +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>>Page: https://www.postgresql.org/docs/12/bloom.html
>> Description:

>I cleaned up your script and created an SQL file that can be piped into
>psql, attached.  I see the bloomidx index being used without and with
>the ANALYZE, output attached.  I tested this on git master, and back
>through PG 10.  Would you please run these queries and post the output:

Thanks for having a look, Bruce. You test case is not exactly the same as in the documentation. For you "temporary table" test case I indeed see the bloom index getting used. Doing the same with a normal table results in a parallel seq scan.

postgres=# SELECT version();
                                                          version
----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 12.0  on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)

postgres=# SELECT name, current_setting(name), source
postgres-#         FROM pg_settings
postgres-#         WHERE source NOT IN ('default', 'override');
            name            |  current_setting   |
----------------------------+--------------------+----------------------
 application_name           | psql               | client
 client_encoding            | UTF8               | client
 DateStyle                  | ISO, MDY           | configuration file
 default_text_search_config | pg_catalog.english | configuration file
 dynamic_shared_memory_type | posix              | configuration file
 lc_messages                | en_US.utf8         | configuration file
 lc_monetary                | de_CH.UTF-8        | configuration file
 lc_numeric                 | de_CH.UTF-8        | configuration file
 lc_time                    | en_US.UTF-8        | configuration file
 listen_addresses           | *                  | configuration file
 log_timezone               | Europe/Zurich      | configuration file
 max_connections            | 100                | configuration file
 max_stack_depth            | 2MB                | environment variable
 max_wal_size               | 1GB                | configuration file
 min_wal_size               | 80MB               | configuration file
 port                       | 5432               | environment variable
 shared_buffers             | 128MB              | configuration file
 TimeZone                   | Europe/Zurich      | configuration file

Regards
Daniel


Attachments:

  [application/sql] bloom2.sql (625B, 3-bloom2.sql)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Instead of using the bloom index, a parallel sequencial scan is used with this example
@ 2019-11-05 23:18  Bruce Momjian <[email protected]>
  parent: Daniel Westermann (DWE) <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Bruce Momjian @ 2019-11-05 23:18 UTC (permalink / raw)
  To: Daniel Westermann (DWE) <[email protected]>; +Cc: [email protected] <[email protected]>

On Tue, Nov  5, 2019 at 08:28:57PM +0000, Daniel Westermann (DWE) wrote:
> 
> >>On Thu, Oct 24, 2019 at 03:17:06PM +0000, PG Doc comments form wrote:
> >> The following documentation comment has been logged on the website:
> >>
> >>Page: https://www.postgresql.org/docs/12/bloom.html
> >> Description:
> 
> >I cleaned up your script and created an SQL file that can be piped into
> >psql, attached.  I see the bloomidx index being used without and with
> >the ANALYZE, output attached.  I tested this on git master, and back
> >through PG 10.  Would you please run these queries and post the output:
> 
> Thanks for having a look, Bruce. You test case is not exactly the same as in
> the documentation. For you "temporary table" test case I indeed see the bloom
> index getting used. Doing the same with a normal table results in a parallel
> seq scan.

Ah, I see your point about the temporary tables.  I have retested with
permanent tables, and see the same output you do.  By reducing the
generate_series by 100x, I am able to get the desired EXPLAIN plans by
turning on/off various optimizer settings.  I am attaching my test
script and its output.

I suggestion is what I reduce generate_series by 100x and update the
EXPLAIN plans in the docs.  The docs are unclear on how these different
plans are generated, though I am not excited about adding the exact
optimizer settings to generate each plan.

-- 
  Bruce Momjian  <[email protected]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

DROP TABLE
SELECT 100000
CREATE EXTENSION
CREATE INDEX
CREATE INDEX
ANALYZE
SET
SET
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Seq Scan on tbloom  (cost=0.00..2137.00 rows=1 width=24) (actual time=12.228..12.228 rows=0 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 100000
 Planning Time: 0.264 ms
 Execution Time: 12.249 ms
(5 rows)

SET
RESET
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using btreeidx on tbloom  (cost=0.42..2992.43 rows=1 width=24) (actual time=6.462..6.463 rows=0 loops=1)
   Index Cond: ((i2 = 898732) AND (i5 = 123451))
   Heap Fetches: 0
 Planning Time: 0.102 ms
 Execution Time: 6.484 ms
(5 rows)

RESET
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=1792.00..1796.02 rows=1 width=24) (actual time=0.438..0.438 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 31
   Heap Blocks: exact=29
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..1792.00 rows=1 width=0) (actual time=0.394..0.394 rows=31 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning Time: 0.070 ms
 Execution Time: 0.461 ms
(8 rows)



Attachments:

  [application/x-sql] bloom.sql (886B, 2-bloom.sql)
  download

  [text/plain] out.txt (1.7K, 3-out.txt)
  download | inline:
DROP TABLE
SELECT 100000
CREATE EXTENSION
CREATE INDEX
CREATE INDEX
ANALYZE
SET
SET
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Seq Scan on tbloom  (cost=0.00..2137.00 rows=1 width=24) (actual time=12.228..12.228 rows=0 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 100000
 Planning Time: 0.264 ms
 Execution Time: 12.249 ms
(5 rows)

SET
RESET
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using btreeidx on tbloom  (cost=0.42..2992.43 rows=1 width=24) (actual time=6.462..6.463 rows=0 loops=1)
   Index Cond: ((i2 = 898732) AND (i5 = 123451))
   Heap Fetches: 0
 Planning Time: 0.102 ms
 Execution Time: 6.484 ms
(5 rows)

RESET
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=1792.00..1796.02 rows=1 width=24) (actual time=0.438..0.438 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 31
   Heap Blocks: exact=29
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..1792.00 rows=1 width=0) (actual time=0.394..0.394 rows=31 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning Time: 0.070 ms
 Execution Time: 0.461 ms
(8 rows)


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2019-11-05 23:18 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2019-10-24 15:17 Instead of using the bloom index, a parallel sequencial scan is used with this example PG Doc comments form <[email protected]>
2019-11-05 20:02 ` Bruce Momjian <[email protected]>
2019-11-05 20:28   ` Daniel Westermann (DWE) <[email protected]>
2019-11-05 23:18     ` Bruce Momjian <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox