Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dhH4C-0003ul-O0 for pgsql-performance@arkaria.postgresql.org; Mon, 14 Aug 2017 15:14:08 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1dhH4C-0006H5-Am for pgsql-performance@arkaria.postgresql.org; Mon, 14 Aug 2017 15:14:08 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dhH2C-0001Vd-AN for pgsql-performance@postgresql.org; Mon, 14 Aug 2017 15:12:04 +0000 Received: from mail-wr0-x243.google.com ([2a00:1450:400c:c0c::243]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1dhH28-0001PM-J2 for pgsql-performance@postgresql.org; Mon, 14 Aug 2017 15:12:03 +0000 Received: by mail-wr0-x243.google.com with SMTP id y67so7191500wrb.3 for ; Mon, 14 Aug 2017 08:12:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=l9S0NvYYuPtuqpKAT/OjiAnZ8XzA9P574UOcStVBwrU=; b=lbi7VhQThD1Y8Vmn8Goz+1Eea13PTvSJt7XwgeEYAaM2whOTphFBaD0ISFHoYMiSQh 1/6vjFRDb+2iOZzfiT7ed0BE/lYlTzug0FQAH/rzIDYu9ZgCeRMXIZ8kePAVWswK8Uiy NBSUZKf3sHAXS80xouC1BUpu3GnlJyInLqPQJ+HhEuFVtUwUkfMmBmc+fLsotanhknWl WQQgb+7Oy9zwofi9vMuufPUSbb9pKRD27bCwIoJZSLNwWox2bdI7KblUaNsWsxINF9sk VNliOApDgpB71JxTi1x0yXjAFlTuki/fEqNfC4AKABOglexB1X5VkdX3pdpU+cFzhMxO kLqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=l9S0NvYYuPtuqpKAT/OjiAnZ8XzA9P574UOcStVBwrU=; b=J3Q29Uwrf5yjGNmcYkIkQc2YsldJhXP0jd9ZiwZwelckdQJvtqcHJyxbK1SWepRgQQ dXnbZ7UP6eYJ/yY4Xtyok/E3utbq8znbjiBXZbTbdBgvEEhVKhyZXK94XyT5icYeyWzy 7yYdXNT594CVmNY1nYLy2DHnnhhESIksy2iBSFYgN1i2RClt8eHcC5b1f2xNoYE71eGg ySkc3CGhhLsUNXcPie7CwNNfLPJIZrLGLXKg2EmUMtpuKZP5UE3f4N6Nt8ZBiHYG/4Ox Yp9FZ4S193LjioXWlMkZQ194mzqI/0wkRw5rNwaBk2vUKqMKKXB26sPemm7bWLkHHlhW aRcQ== X-Gm-Message-State: AHYfb5gaxwXFPjwyXAcocHR5IGgI5IaJ8mDB8excgvdWlVmjE+zmLy40 HOLDdZxaJmcHlg== X-Received: by 10.223.170.12 with SMTP id p12mr17459330wrd.238.1502723518761; Mon, 14 Aug 2017 08:11:58 -0700 (PDT) Received: from [192.168.8.100] (204.red-79-149-77.dynamicip.rima-tde.net. [79.149.77.204]) by smtp.gmail.com with ESMTPSA id d8sm6230048wmh.48.2017.08.14.08.11.57 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 14 Aug 2017 08:11:58 -0700 (PDT) From: Daniel Blanch Bataller Message-Id: <8F28ABE4-047E-45CF-B502-480103D8EA97@gmail.com> Content-Type: multipart/alternative; boundary="Apple-Mail=_88D7305F-56B6-4EED-92DA-DA624F92C4B7" Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: performance problem on big tables Date: Mon, 14 Aug 2017 17:11:55 +0200 In-Reply-To: Cc: pgsql-performance@postgresql.org To: Mariel Cherkassky References: X-Mailer: Apple Mail (2.3273) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --Apple-Mail=_88D7305F-56B6-4EED-92DA-DA624F92C4B7 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi. In general using COPY is *much faster* than anything else. You can even = split the data load and run it in parallel, start with as many jobs as = processors you have. Same with indexes, run them in parallel. With = parallel I mean various psql running at the same time. Tuning postgres will help too, but not as much as using COPY. https://www.postgresql.org/docs/9.6/static/performance-tips.html = https://www.postgresql.org/docs/9.6/static/populate.html = = https://www.postgresql.org/docs/9.6/static/populate.html#POPULATE-COPY-FRO= M Regards, Daniel Blanch.. > El 14 ago 2017, a las 15:24, Mariel Cherkassky = escribi=C3=B3: >=20 > I have performance issues with two big tables. Those tables are = located on an oracle remote database. I'm running the quert : insert = into local_postgresql_table select * from oracle_remote_table. >=20 > The first table has 45M records and its size is 23G. The import of the = data from the oracle remote database is taking 1 hour and 38 minutes. = After that I create 13 regular indexes on the table and it takes 10 = minutes per table ->2 hours and 10 minutes in total. >=20 > The second table has 29M records and its size is 26G. The import of = the data from the oracle remote database is taking 2 hours and 30 = minutes. The creation of the indexes takes 1 hours and 30 minutes (some = are indexes on one column and the creation takes 5 min and some are = indexes on multiples column and it takes 11 min. >=20 > Those operation are very problematic for me and I'm searching for a = solution to improve the performance. The parameters I assigned : >=20 > min_parallel_relation_size =3D 200MB > max_parallel_workers_per_gather =3D 5=20 > max_worker_processes =3D 8=20 > effective_cache_size =3D 2500MB > work_mem =3D 16MB > maintenance_work_mem =3D 1500MB > shared_buffers =3D 2000MB > RAM : 5G > CPU CORES : 8 > -I tried running select count(*) from table in oracle and in = postgresql the running time is almost equal. >=20 > -Before importing the data I drop the indexes and the constraints. >=20 > -I tried to copy a 23G file from the oracle server to the postgresql = server and it took me 12 minutes. >=20 > Please advice how can I continue ? How can I improve something in this = operation ? >=20 --Apple-Mail=_88D7305F-56B6-4EED-92DA-DA624F92C4B7 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8
Hi.

In general using COPY is *much faster* = than anything else. You can even split the data load and run it in = parallel, start with as many jobs as processors you have. Same with = indexes, run them in parallel. With parallel I mean various psql running = at the same time.

Tuning postgres will help too, but not as much as using = COPY.


https://www.postgresql.org/docs/9.6/static/populate.html


Regards,

Daniel Blanch..



El 14 ago 2017, a las 15:24, = Mariel Cherkassky <mariel.cherkassky@gmail.com> escribi=C3=B3:

I have performance issues with two big tables. = Those tables are located on an oracle remote database. I'm running the = quert : insert into = local_postgresql_table select * from oracle_remote_table.

The first table has 45M records and its size is = 23G. The import of the data from the oracle remote database is taking 1 = hour and 38 minutes. After that I create 13 regular indexes on the table = and it takes 10 minutes per table ->2 hours and 10 minutes in = total.

The second table has 29M records and its size = is 26G. The import of the data from the oracle remote database is taking = 2 hours and 30 minutes. The creation of the indexes takes 1 hours and 30 = minutes (some are indexes on one column and the creation takes 5 min and = some are indexes on multiples column and it takes 11 min.

Those operation are very problematic for me and = I'm searching for a solution to improve the performance. The parameters = I assigned :

min_parallel_relation_size =3D 200MB
max_parallel_workers_per_gather =3D 5
max_worker_processes =3D 8
effective_cache_size =3D 2500MB
work_mem =3D 16MB
maintenance_work_mem =3D 1500MB
shared_buffers =3D 2000MB
RAM : 5G
CPU CORES : 8

-I tried running = select count(*) from table in oracle and in postgresql the running time = is almost equal.

-Before importing the = data I drop the indexes and the constraints.

-I tried to copy a 23G = file from the oracle server to the postgresql server and it took me 12 = minutes.

Please advice how can I continue ? How can I = improve something in this operation ?


= --Apple-Mail=_88D7305F-56B6-4EED-92DA-DA624F92C4B7--