public inbox for [email protected]
help / color / mirror / Atom feedFrom: Daniel Blanch Bataller <[email protected]>
To: Mariel Cherkassky <[email protected]>
Cc: [email protected]
Subject: Re: performance problem on big tables
Date: Mon, 14 Aug 2017 17:11:55 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <CA+t6e1m06=RkReCDxdyjwAMwVwuKxakV8TKmQ2cnMLFYbOjMCA@mail.gmail.com>
References: <CA+t6e1m06=RkReCDxdyjwAMwVwuKxakV8TKmQ2cnMLFYbOjMCA@mail.gmail.com>
List-Unsubscribe: <mailto:[email protected]?body=unsub%20pgsql-performance>
Hi.
In general using COPY is *much faster* than anything else. You can even split the data load and run it in parallel, start with as many jobs as processors you have. Same with indexes, run them in parallel. With parallel I mean various psql running at the same time.
Tuning postgres will help too, but not as much as using COPY.
https://www.postgresql.org/docs/9.6/static/performance-tips.html <https://www.postgresql.org/docs/9.6/static/performance-tips.html;
https://www.postgresql.org/docs/9.6/static/populate.html <https://www.postgresql.org/docs/9.6/static/populate.html;
https://www.postgresql.org/docs/9.6/static/populate.html#POPULATE-COPY-FROM
Regards,
Daniel Blanch..
> El 14 ago 2017, a las 15:24, Mariel Cherkassky <[email protected]> escribió:
>
> I have performance issues with two big tables. Those tables are located on an oracle remote database. I'm running the quert : insert into local_postgresql_table select * from oracle_remote_table.
>
> The first table has 45M records and its size is 23G. The import of the data from the oracle remote database is taking 1 hour and 38 minutes. After that I create 13 regular indexes on the table and it takes 10 minutes per table ->2 hours and 10 minutes in total.
>
> The second table has 29M records and its size is 26G. The import of the data from the oracle remote database is taking 2 hours and 30 minutes. The creation of the indexes takes 1 hours and 30 minutes (some are indexes on one column and the creation takes 5 min and some are indexes on multiples column and it takes 11 min.
>
> Those operation are very problematic for me and I'm searching for a solution to improve the performance. The parameters I assigned :
>
> min_parallel_relation_size = 200MB
> max_parallel_workers_per_gather = 5
> max_worker_processes = 8
> effective_cache_size = 2500MB
> work_mem = 16MB
> maintenance_work_mem = 1500MB
> shared_buffers = 2000MB
> RAM : 5G
> CPU CORES : 8
> -I tried running select count(*) from table in oracle and in postgresql the running time is almost equal.
>
> -Before importing the data I drop the indexes and the constraints.
>
> -I tried to copy a 23G file from the oracle server to the postgresql server and it took me 12 minutes.
>
> Please advice how can I continue ? How can I improve something in this operation ?
>
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: performance problem on big tables
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox