Received: from makus.postgresql.org (makus.postgresql.org [98.129.198.125]) by mail.postgresql.org (Postfix) with ESMTP id 7228231F0C5 for ; Sun, 13 May 2012 21:09:34 -0300 (ADT) Received: from mail-bk0-f46.google.com ([209.85.214.46]) by makus.postgresql.org with esmtp (Exim 4.72) (envelope-from ) id 1STiqx-0005cM-Q4 for pgsql-docs@postgresql.org; Mon, 14 May 2012 00:09:34 +0000 Received: by bkcji2 with SMTP id ji2so3351260bkc.19 for ; Sun, 13 May 2012 17:09:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=Iy8zrUN3EW9e7mrZs/O9x023ksfyxCBQVWa4SeMiO0w=; b=NRpCzQqzG/Lg+vmhaxrCsw2FqPfM1SKy9tubvtSV9O03hJvfBU99t1swSVYD3bNui3 KBid8qji5dxH8qvulTclNb9GEAMB68NC9INfpuQMEDNkKAnf1ENIgmhJ98T3O9IU9BjJ En4KZxyoxfMVbUrtcBfs+QNJg4eBgBME6ZQ/SQX/erZj0IibQ8n1ABi53S5Legd56FwN 5xTO4Z31MgeLQVw2gWGH0fuMe/gqMFmpHa4qbhycLfx+vTQmgsi5f3Op/Dr2PDhGV5Pt 4KK2HMg4bVPJNZ9VnmhUOji5+OXPB9lJvF7GQe3q6AsWGAH2tx8PEgQC9sLEmpS6YgOM zUqQ== Received: by 10.205.128.8 with SMTP id hc8mr2316229bkc.17.1336954158108; Sun, 13 May 2012 17:09:18 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.38.12 with HTTP; Sun, 13 May 2012 17:08:57 -0700 (PDT) In-Reply-To: References: From: Marcelo Sena Date: Sun, 13 May 2012 21:08:57 -0300 Message-ID: Subject: Re: Observation on integer types documentation To: Dan McGee Cc: pgsql-docs@postgresql.org Content-Type: multipart/alternative; boundary=000e0ce00a74accee104bff3e5da X-Pg-Spam-Score: -2.6 (--) X-Archive-Number: 201205/13 X-Sequence-Number: 7338 --000e0ce00a74accee104bff3e5da Content-Type: text/plain; charset=UTF-8 I'm new here but your proposal makes sense to me. Are the query plans equal on both architectures? -- Marcelo Lacerda On Mon, Apr 30, 2012 at 5:59 PM, Dan McGee wrote: > Hey everyone, > > Reading the docs today, I came across this paragraph > ( > http://www.postgresql.org/docs/devel/static/datatype-numeric.html#DATATYPE-INT > ), > which goes back several major versions: > > > The type integer is the common choice, as it offers the best balance > between range, storage size, and performance. The smallint type is > generally only used if disk space is at a premium. The bigint type should > only be used if the range of the integer type is insufficient, because the > latter is definitely faster. > > A few thoughts on this. > 1) the use of the word "latter" isn't totally clear, or at least I had > to re-read it to realize former was 'bigint' and latter was 'integer'. > It might just be the style of writing. > 2) I'm less than convinced this note belongs in modern documentation, > and set out to test that theory. My full results are below, but the > summary is this: on a 64-bit system, there seems to be only a minimal > measurable performance difference (< 5%) and very little size > difference. In the case of the indexes, the size difference is zero. > This is not true for a 32-bit system (where it is 39% slower), but the > blanket statement doesn't hold true, which is why I'm writing all this > up. > > On a final note, the following paragraph also seems like it has > outlived its useful life: > > > On very minimal operating systems the bigint type might not function > correctly, because it relies on compiler support for eight-byte integers. > On such machines, bigint acts the same as integer, but still takes up eight > bytes of storage. (We are not aware of any modern platform where this is > the case.) > > Thanks! > -Dan > > > Table setup (only difference is type of 'id' column): > > Table "public.package_files" > Column | Type | Modifiers > --------------+------------------------+------------------------ > id | integer | not null > pkg_id | integer | not null > is_directory | boolean | not null default false > directory | character varying(255) | not null > filename | character varying(255) | > Indexes: > "package_files_pkey" PRIMARY KEY, btree (id) > "package_files_pkg_id" btree (pkg_id) CLUSTER > > > Table "public.package_files_int8" > Column | Type | Modifiers > --------------+------------------------+------------------------ > id | bigint | not null > pkg_id | integer | not null > is_directory | boolean | not null default false > directory | character varying(255) | not null > filename | character varying(255) | > Indexes: > "package_files_int8_pkey" PRIMARY KEY, btree (id) > "package_files_int8_pkg_id" btree (pkg_id) CLUSTER > > > # select count(*) from package_files; > 2621418 > # select count(*) from package_files_int8 ; > 2621418 > > > All runs below were done after issuing a few warm up queries, and both > tables went through a VACUUM/CLUSTER/ANALYZE sequence. > > 32-bit P4 2.4 GHz (single core). no enabled CPU frequency scaling, 1GB > total ram, shared_buffers 128MB, work_mem 4MB: > > relation | size > ----------------------------------+------------ > public.package_files_int8 | 239 MB > public.package_files | 229 MB > public.package_files_int8_pkey | 56 MB > public.package_files_int8_pkg_id | 45 MB > public.package_files_pkey | 45 MB > public.package_files_pkg_id | 45 MB > > archweb=> \timing on > Timing is on. > archweb=> \t > Showing only tuples. > archweb=> select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 516.558 ms > archweb=> select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 519.720 ms > archweb=> select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 533.330 ms > archweb=> select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 519.095 ms > archweb=> select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 520.253 ms > > archweb=> select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 731.194 ms > archweb=> select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 730.329 ms > archweb=> select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 724.646 ms > archweb=> select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 710.815 ms > archweb=> select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 724.519 ms > > > > > 64-bit Core2 Quad 2.66 GHz (four core), CPU freq scaling disabled > (performance governor used), 8GB total ram, shared_buffers 128MB, > work_mem 4MB: > > relation | size > ----------------------------------+------------ > public.package_files_int8 | 245 MB > public.package_files | 234 MB > public.package_files_int8_pkey | 56 MB > public.package_files_pkg_id | 56 MB > public.package_files_int8_pkg_id | 56 MB > public.package_files_pkey | 56 MB > > dmcgee=# select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 177.078 ms > dmcgee=# select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 176.109 ms > dmcgee=# select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 177.478 ms > dmcgee=# select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 176.639 ms > dmcgee=# select count(*) from (select * from package_files where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 176.453 ms > > dmcgee=# select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 185.768 ms > dmcgee=# select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 185.159 ms > dmcgee=# select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 184.407 ms > dmcgee=# select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 184.555 ms > dmcgee=# select count(*) from (select * from package_files_int8 where > pkg_id in (48024, 48025, 40343) order by id) a; > 119325 > Time: 184.290 ms > > > Overall results: > > i686 x86_64 > int4 int8 int4 int8 > 516.558 731.194 177.078 185.768 > 519.72 730.329 176.109 185.159 > 533.33 724.646 177.478 184.407 > 519.095 710.815 176.639 184.555 > 520.253 724.519 176.453 184.29 > > > > Average 521.7912 724.3006 176.7514 184.8358 > Stddev 6.6040841681 8.1530512264 0.5359499044 0.619288059 > Ratio 1.3881042839 1.0457388173 > > -- > Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-docs > --000e0ce00a74accee104bff3e5da Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable I'm new here but your proposal makes sense to me. Are the query plans e= qual on both architectures?

--
Marcelo Lacerda


On Mon, Apr 30, 2012 at 5:59 PM, Dan McG= ee <dpmcgee@gmail.com> wrote:
Hey everyone,

Reading the docs today, I came across this paragraph
(http://www.postgresql.org/docs/devel/sta= tic/datatype-numeric.html#DATATYPE-INT),
which goes back several major versions:

> The type integer is the common choice, as it offers the best balance b= etween range, storage size, and performance. The smallint type is generally= only used if disk space is at a premium. The bigint type should only be us= ed if the range of the integer type is insufficient, because the latter is = definitely faster.

A few thoughts on this.
1) the use of the word "latter" isn't totally clear, or at le= ast I had
to re-read it to realize former was 'bigint' and latter was 'in= teger'.
It might just be the style of writing.
2) I'm less than convinced this note belongs in modern documentation, and set out to test that theory. My full results are below, but the
summary is this: on a 64-bit system, there seems to be only a minimal
measurable performance difference (< 5%) and very little size
difference. In the case of the indexes, the size difference is zero.
This is not true for a 32-bit system (where it is 39% slower), but the
blanket statement doesn't hold true, which is why I'm writing all t= his
up.

On a final note, the following paragraph also seems like it has
outlived its useful life:

> On very minimal operating systems the bigint type might not function c= orrectly, because it relies on compiler support for eight-byte integers. On= such machines, bigint acts the same as integer, but still takes up eight b= ytes of storage. (We are not aware of any modern platform where this is the= case.)

Thanks!
-Dan


Table setup (only difference is type of 'id' column):

Table "public.package_files"
=C2=A0 =C2=A0Column =C2=A0 =C2=A0| =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Type = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| =C2=A0 =C2=A0 =C2=A0 Modifiers
--------------+------------------------+------------------------
=C2=A0id =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | integer =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| not null
=C2=A0pkg_id =C2=A0 =C2=A0 =C2=A0 | integer =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0| not null
=C2=A0is_directory | boolean =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0| not null default false
=C2=A0directory =C2=A0 =C2=A0| character varying(255) | not null
=C2=A0filename =C2=A0 =C2=A0 | character varying(255) |
Indexes:
=C2=A0 =C2=A0"package_files_pkey" PRIMARY KEY, btree (id)
=C2=A0 =C2=A0"package_files_pkg_id" btree (pkg_id) CLUSTER


Table "public.package_files_int8"
=C2=A0 =C2=A0Column =C2=A0 =C2=A0| =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Type = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| =C2=A0 =C2=A0 =C2=A0 Modifiers
--------------+------------------------+------------------------
=C2=A0id =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | bigint =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | not null
=C2=A0pkg_id =C2=A0 =C2=A0 =C2=A0 | integer =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0| not null
=C2=A0is_directory | boolean =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0| not null default false
=C2=A0directory =C2=A0 =C2=A0| character varying(255) | not null
=C2=A0filename =C2=A0 =C2=A0 | character varying(255) |
Indexes:
=C2=A0 =C2=A0"package_files_int8_pkey" PRIMARY KEY, btree (id) =C2=A0 =C2=A0"package_files_int8_pkg_id" btree (pkg_id) CLUSTER<= br>

# select count(*) from package_files;
=C2=A02621418
# select count(*) from package_files_int8 ;
=C2=A02621418


All runs below were done after issuing a few warm up queries, and both
tables went through a VACUUM/CLUSTER/ANALYZE sequence.

32-bit P4 2.4 GHz (single core). no enabled CPU frequency scaling, 1GB
total ram, shared_buffers 128MB, work_mem 4MB:

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 relation =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0size
----------------------------------+------------
=C2=A0public.package_files_int8 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 239 MB
=C2=A0public.package_files =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 229 = MB
=C2=A0public.package_files_int8_pkey =C2=A0 | 56 MB
=C2=A0public.package_files_int8_pkg_id | 45 MB
=C2=A0public.package_files_pkey =C2=A0 =C2=A0 =C2=A0 =C2=A0| 45 MB
=C2=A0public.package_files_pkg_id =C2=A0 =C2=A0 =C2=A0| 45 MB

archweb=3D> \timing on
Timing is on.
archweb=3D> \t
Showing only tuples.
archweb=3D> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 516.558 ms
archweb=3D> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 519.720 ms
archweb=3D> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 533.330 ms
archweb=3D> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 519.095 ms
archweb=3D> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 520.253 ms

archweb=3D> select count(*) from (select * from package_files_int8 where=
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 731.194 ms
archweb=3D> select count(*) from (select * from package_files_int8 where=
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 730.329 ms
archweb=3D> select count(*) from (select * from package_files_int8 where=
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 724.646 ms
archweb=3D> select count(*) from (select * from package_files_int8 where=
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 710.815 ms
archweb=3D> select count(*) from (select * from package_files_int8 where=
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 724.519 ms




64-bit Core2 Quad 2.66 GHz (four core), CPU freq scaling disabled
(performance governor used), 8GB total ram, shared_buffers 128MB,
work_mem 4MB:

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 relation =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0size
----------------------------------+------------
=C2=A0public.package_files_int8 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 245 MB
=C2=A0public.package_files =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 234 = MB
=C2=A0public.package_files_int8_pkey =C2=A0 | 56 MB
=C2=A0public.package_files_pkg_id =C2=A0 =C2=A0 =C2=A0| 56 MB
=C2=A0public.package_files_int8_pkg_id | 56 MB
=C2=A0public.package_files_pkey =C2=A0 =C2=A0 =C2=A0 =C2=A0| 56 MB

dmcgee=3D# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 177.078 ms
dmcgee=3D# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 176.109 ms
dmcgee=3D# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 177.478 ms
dmcgee=3D# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 176.639 ms
dmcgee=3D# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 176.453 ms

dmcgee=3D# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 185.768 ms
dmcgee=3D# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 185.159 ms
dmcgee=3D# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 184.407 ms
dmcgee=3D# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 184.555 ms
dmcgee=3D# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
=C2=A0119325
Time: 184.290 ms


Overall results:

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 i686 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0x86_64
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0int4 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0int8 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0in= t4 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0int8
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0516.558 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 731.194 =C2=A0 =C2=A0 =C2=A0 =C2=A0 177.078 =C2=A0 =C2=A0= =C2=A0 =C2=A0 185.768
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0519.72 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0730.329 =C2=A0 =C2=A0 =C2=A0 =C2=A0 176.109 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 185.159
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0533.33 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0724.646 =C2=A0 =C2=A0 =C2=A0 =C2=A0 177.478 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 184.407
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0519.095 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 710.815 =C2=A0 =C2=A0 =C2=A0 =C2=A0 176.639 =C2=A0 =C2=A0= =C2=A0 =C2=A0 184.555
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0520.253 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 724.519 =C2=A0 =C2=A0 =C2=A0 =C2=A0 176.453 =C2=A0 =C2=A0= =C2=A0 =C2=A0 184.29



Average =C2=A0 =C2=A0 =C2=A0 =C2=A0 521.7912 =C2=A0 =C2=A0 =C2=A0 =C2=A0724= .3006 =C2=A0 =C2=A0 =C2=A0 =C2=A0176.7514 =C2=A0 =C2=A0 =C2=A0 =C2=A0184.83= 58
Stddev =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A06.6040841681 =C2=A0 =C2=A08.153051= 2264 =C2=A0 =C2=A00.5359499044 =C2=A0 =C2=A00.619288059
Ratio =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1.3881= 042839 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A01.0457388173

--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

--000e0ce00a74accee104bff3e5da--