Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rqxWi-00Et4O-5u for pgsql-general@arkaria.postgresql.org; Sun, 31 Mar 2024 15:55:20 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rqxWg-00CrHv-Mt for pgsql-general@arkaria.postgresql.org; Sun, 31 Mar 2024 15:55:18 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rqxWg-00CrHn-3m for pgsql-general@lists.postgresql.org; Sun, 31 Mar 2024 15:55:18 +0000 Received: from mail-oo1-xc32.google.com ([2607:f8b0:4864:20::c32]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rqxWb-007SEv-Kj for pgsql-general@lists.postgresql.org; Sun, 31 Mar 2024 15:55:16 +0000 Received: by mail-oo1-xc32.google.com with SMTP id 006d021491bc7-5a467dae51dso1356335eaf.1 for ; Sun, 31 Mar 2024 08:55:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1711900514; x=1712505314; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=rpW837gNrJz2eazZMJyPcWsV4DXHsmGYKjvlS0LrFlw=; b=C+FTm6VcecivGJZsFNosR71R0UKdmNzH/uPylpCilHWPA8wIsN2d3HQtJxThCli3Jx Cl00mcNAJ5+0l2tnCGQ9EFtEIhObZ1NrcsN4FunaBZ6/UEB5EEvxQaj0AI3jGBrsTxMr +J2l1FAs07UUrDQBCbqtZ0YVEXkYFz2++TRoyuz/S1Pb8VhZx8SzjhzBxlwiQ5dgUWr8 WD+SFXst8LiHRETDVEN//ZJPM3V9fp0vHROfZ3MZAPYNObU6aUOT+7tMlDhw7YdNxh42 v4Ayz3FRDHNFY9JIQHNiAfbZV/Tne/kzAYWQ0IlURVIOvXKPH0Ga3Xd9HoiwhXMm+mPf q9mA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711900514; x=1712505314; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=rpW837gNrJz2eazZMJyPcWsV4DXHsmGYKjvlS0LrFlw=; b=OWWOXGPEXYLWFiL9KAVJlz0Ip1ftPrzil7bi3aDfROE5z/zr23HTRg2rhrGarHLhCJ CH00JmcG53UZUrdOSEMwHGb4/aqekeI7vL9ZwMeNbkGZvv8FdyJkbz1abGFyXdfIUlYi u4VTVd42I1Q4eoxgEPG94q0c8/NPNdMvuwqqKhIQxOJOt/JObPokp8f63FteYW+7LnWy Iscml8qoFVVJVfYRz/OlbIZtiyW+8t4lQxWeYGzkbQDOklGzt56NOGAwqqDr5704G6g+ oafP3LBR8z0M013mBeFVnSgAuhOLcP5ZdMCfnL5ZhtZLAORNfzSZ/FsVnbLuA5CVQiZd Ro2g== X-Gm-Message-State: AOJu0YybNezTfgKpznUEriQDh3h7ha2KtbC8OVKGtT2JXHJuZgn1Y8d0 OMZTOuc29BJMpvEnncsfLI9nfmvjdjtCJnyHcOUxxuNYj19vZYv7znL0D8xALWw5u0wacRm0I73 9UMqdFL64VZMdiFEtCl4FW4KYzxw= X-Google-Smtp-Source: AGHT+IEYyFhRsR/xgWLWkFuAK9YTuCzXNeaNrkjIi3mgKt9UVV5+wIwl427ln9MyzjQ3zCP0XdyovgRvly2mlEuvTnU= X-Received: by 2002:a05:6820:1515:b0:5a7:838b:4bc5 with SMTP id ay21-20020a056820151500b005a7838b4bc5mr7218140oob.1.1711900513991; Sun, 31 Mar 2024 08:55:13 -0700 (PDT) MIME-Version: 1.0 References: <98df75338710b3cb63817f3db532cdf6@postgresql.org> In-Reply-To: From: hector vass Date: Sun, 31 Mar 2024 16:55:02 +0100 Message-ID: Subject: Re: Please recommend postgresql.conf improvements for osm2pgsql loading Europe To: Alexander Farber Cc: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000000703150614f6e6b6" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000000703150614f6e6b6 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, 30 Mar 2024, 10:04 Alexander Farber, wrote: > Thank you, Justin - > > On Sat, Mar 30, 2024 at 4:33=E2=80=AFAM Justin Clift > wrote: > >> On 2024-03-30 05:53, Alexander Farber wrote: >> > I use the following postgresql.conf in my Dockerfile >> > ( the full version at https://stackoverflow.com/a/78243530/165071 ), >> > when loading a 28 GByte large europe-latest.osm.pbf >> >> Not specific conf file improvements, but for an initial data load >> have you done things like turning off fsync(), deferring index >> creating until after the data load finishes, and that kind of thing? >> > > I will try the following commands in my Dockerfile then > and later report back on any improvements: > > RUN set -eux && \ > pg_ctl init && \ > echo "shared_buffers =3D 1GB" >> $PGDATA/postgresql.co= nf > && \ > echo "work_mem =3D 50MB" >> $PGDATA/postgresql.co= nf > && \ > echo "maintenance_work_mem =3D 10GB" >> $PGDATA/postgresql.co= nf > && \ > echo "autovacuum_work_mem =3D 2GB" >> $PGDATA/postgresql.co= nf > && \ > echo "wal_level =3D minimal" >> $PGDATA/postgresql.co= nf > && \ > echo "checkpoint_timeout =3D 60min" >> $PGDATA/postgresql.co= nf > && \ > echo "max_wal_size =3D 10GB" >> $PGDATA/postgresql.co= nf > && \ > echo "checkpoint_completion_target =3D 0.9" >> $PGDATA/postgresql.co= nf > && \ > echo "max_wal_senders =3D 0" >> $PGDATA/postgresql.co= nf > && \ > echo "random_page_cost =3D 1.0" >> $PGDATA/postgresql.co= nf > && \ > echo "password_encryption =3D scram-sha-256" >> $PGDATA/postgresql.co= nf > && \ > echo "fsync =3D off" >> > $PGDATA/postgresql.conf && \ > pg_ctl start && \ > createuser --username=3Dpostgres $PGUSER && \ > createdb --username=3Dpostgres --encoding=3DUTF8 --owner=3D$PGUSER > $PGDATABASE && \ > psql --username=3Dpostgres $PGDATABASE --command=3D"ALTER USER $PGUSE= R > WITH PASSWORD '$PGPASSWORD';" && \ > psql --username=3Dpostgres $PGDATABASE --command=3D'CREATE EXTENSION = IF > NOT EXISTS postgis;' && \ > psql --username=3Dpostgres $PGDATABASE --command=3D'CREATE EXTENSION = IF > NOT EXISTS hstore;' && \ > osm2pgsql --username=3D$PGUSER --database=3D$PGDATABASE --create > --cache=3D60000 --hstore --latlong /data/map.osm.pbf && \ > rm -f /data/map.osm.pbf && \ > pg_ctl stop && \ > echo "fsync =3D on" >> > $PGDATA/postgresql.conf && \ > echo '# TYPE DATABASE USER ADDRESS METHOD' > > $PGDATA/pg_hba.conf && \ > echo "local all postgres peer" >> > $PGDATA/pg_hba.conf && \ > echo "local $PGDATABASE $PGUSER scram-sha-256" >> > $PGDATA/pg_hba.conf && \ > echo "host $PGDATABASE $PGUSER 0.0.0.0/0 scram-sha-256" >> > $PGDATA/pg_hba.conf > > The later fsync =3D on will override the former, right? > > Best regards > Alex > > > 2hrs sounds reasonable for Europe, it's a big place in terms of osm data and osm2pgsql is doing processing to convert to geometry objects prior to doing anything on the Postgresql side. If you examine the --log--sql output for a small test country you can see what it does in terms of the postgresql. osm2pgsql gives options to trim the output to only what you need (so if you don't want waterways, traffic features, parking places or places of worship etc.. why load them) Hopefully you have found the excellent geofabrik https://download.geofabrik.de/ source for osm data. Rather than load this data afresh each update cycle you would be better off simply loading the changes so the .osc files or ... osm osmosis will create the equivalent of a diff file for you Looks like you are already using osm2psql's recommended postgresql.config settings, I'd be surprised if this was way off. Getting as close to tin rather than virtual machines and containers will also help, lots of io going on here. If you are only interested in the geography you might consider geofabrik's shapefile available for many countries, they have already done some of the work for you. Apologies if you are already a long way down this route & just asking about the final stage of loading the osm2pgsql output to Postgresql but however well you do here I would only expect small marginal gains. --0000000000000703150614f6e6b6 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Sat, 30 Mar 2024, 10:04 Alexander = Farber, <alexander.farber@gmail.com> wrote:
Thank you, Justin -

On Sat, Mar 30, 2024 at 4:33=E2=80=AFAM Justin Cli= ft <justin@postgresql.org> wrote:
On 2024-03-30 05:53, Alexander Farber wrot= e:
> I use the following postgresql.conf in my Dockerfile
> ( the full version at https://stackoverfl= ow.com/a/78243530/165071 ),
> when loading a 28 GByte large europe-latest.osm.pbf

Not specific conf file improvements, but for an initial data load
have you done things like turning off fsync(), deferring index
creating until after the data load finishes, and that kind of thing?

I will try the following commands in my Dockerfile then= =C2=A0
and later report back on any improvements:

RUN= set -eux && \
=C2=A0 =C2=A0 pg_ctl init && \
=C2=A0 =C2=A0 echo "shared_buffers =3D 1GB"=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 >> $PGDATA/postgresql.conf = && \
=C2=A0 =C2=A0 echo "work_mem =3D 50MB"=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&g= t;> $PGDATA/postgresql.conf && \
=C2=A0 =C2=A0 echo &q= uot;maintenance_work_mem =3D 10GB"=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&g= t;> $PGDATA/postgresql.conf && \
=C2=A0 =C2=A0 echo &q= uot;autovacuum_work_mem =3D 2GB"=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0>> $PGDATA/postgresql.conf && \
=C2=A0 =C2=A0 ec= ho "wal_level =3D minimal"=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0>> $PGDATA/postgresql.conf && \
=
=C2=A0 =C2=A0 echo "checkpoint_timeout =3D 60min"=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 >> $PGDATA/postgresql.conf && \
=C2=A0 =C2=A0 echo "max_wal_size =3D 10GB"=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>> $PGDATA/postgresql.co= nf && \
=C2=A0 =C2=A0 echo "checkpoint_completion_ta= rget =3D 0.9"=C2=A0 >> $PGDATA/postgresql.conf && \
=C2=A0 =C2=A0 echo "max_wal_senders =3D 0"=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>> $PGDATA/postgresql.co= nf && \
=C2=A0 =C2=A0 echo "random_page_cost =3D 1.0= "=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 >> $PGDATA/pos= tgresql.conf && \
=C2=A0 =C2=A0 echo "password_encry= ption =3D scram-sha-256" >> $PGDATA/postgresql.conf && \=
=C2=A0 =C2=A0 echo "fsync =3D off"=C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 >> $PGDATA/postgresql.conf && \
=C2=A0 =C2=A0 p= g_ctl start && \=C2=A0
=C2=A0 =C2=A0 createuser --usernam= e=3Dpostgres $PGUSER && \
=C2=A0 =C2=A0 createdb --userna= me=3Dpostgres --encoding=3DUTF8 --owner=3D$PGUSER $PGDATABASE && \<= /div>
=C2=A0 =C2=A0 psql --username=3Dpostgres $PGDATABASE --command=3D= "ALTER USER $PGUSER WITH PASSWORD '$PGPASSWORD';" &&a= mp; \
=C2=A0 =C2=A0 psql --username=3Dpostgres $PGDATABASE --comm= and=3D'CREATE EXTENSION IF NOT EXISTS postgis;' && \
<= div>=C2=A0 =C2=A0 psql --username=3Dpostgres $PGDATABASE --command=3D'C= REATE EXTENSION IF NOT EXISTS hstore;' && \
=C2=A0 = =C2=A0 osm2pgsql --username=3D$PGUSER --database=3D$PGDATABASE --create --c= ache=3D60000 --hstore --latlong /data/map.osm.pbf && \
= =C2=A0 =C2=A0 rm -f /data/map.osm.pbf && \
=C2=A0 =C2=A0 = pg_ctl stop && \
=C2=A0 =C2=A0 echo "fsync =3D on&qu= ot;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 >> $PGDATA/postgresql.conf && \
=C2=A0 =C2=A0 echo '# TYPE DATABASE USER ADDRESS METHOD'= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 > $PGDATA/pg_hba= .conf && \
=C2=A0 =C2=A0 echo "local all postgres pe= er"=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0>> $PGDATA/pg_hba.conf && \
=C2=A0 =C2=A0 echo "local $PGDATABASE $PGUSER=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0scram-sha-256" >> $PGDATA/pg_hba.conf &a= mp;& \
=C2=A0 =C2=A0 echo "host=C2=A0 $PGDATABASE $PGUSE= R 0.0.0.0= /0 scram-sha-256" >> $PGDATA/pg_hba.conf

The later fs= ync =3D on will override the former, right?

= Best regards
Alex

=C2=A0

2hrs sounds reasonable for Europe, it's a big place in = terms of osm data and osm2pgsql is doing processing to convert to geometry = objects prior to doing anything on the Postgresql side.
If you examine the --log--sql output for a small test country you can s= ee what it does in terms of the postgresql.
osm2pgsq= l gives options to trim the output to only what you need (so if you don'= ;t=C2=A0want waterways, traffic features, parking places or places of worsh= ip etc.. why load them)
Hopefully you have found the= excellent geofabrik=C2=A0https://download.geofabrik.de/ source for osm data.=C2=A0= =C2=A0
Rather than load this data afresh each update cycle you wo= uld be better off simply loading the changes so the .osc files or ... osm o= smosis will create the equivalent of a diff file for you
Looks like you are already using osm2psql's recommended postgresql= .config settings, I'd be surprised if this was way off.=C2=A0 Getting a= s close to tin rather than virtual machines and containers will also help, = lots of io going on here.=C2=A0=C2=A0
If you are onl= y interested in the geography=C2=A0you might consider geofabrik's shape= file available for many countries, they have already done some of the work = for you.

Apologies if you are already a long way down this route & just a= sking about the final stage of loading the osm2pgsql output to Postgresql b= ut however well you do here I would only expect small marginal gains.
=


--0000000000000703150614f6e6b6--