Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dhYkL-0000LS-MV for pgsql-performance@arkaria.postgresql.org; Tue, 15 Aug 2017 10:06:49 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1dhYkJ-0003f5-Q0 for pgsql-performance@arkaria.postgresql.org; Tue, 15 Aug 2017 10:06:47 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dhYkI-0003cg-LT for pgsql-performance@postgresql.org; Tue, 15 Aug 2017 10:06:46 +0000 Received: from mail-wm0-x22d.google.com ([2a00:1450:400c:c09::22d]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1dhYkF-0006TU-Oe for pgsql-performance@postgresql.org; Tue, 15 Aug 2017 10:06:45 +0000 Received: by mail-wm0-x22d.google.com with SMTP id t138so22621344wmt.1 for ; Tue, 15 Aug 2017 03:06:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=HHE1K8NhuuSjmcF49D8keKqc7ShG3g/16l0V0ExXlIk=; b=D1uVtYNYYYPGuM4m6vMKFmsbgSHSMVZ7OYAmtZBeqzgqvdTNlHgIpOw+6AXYCJc4gu XrjY1xOWfzlyaTUfjlhOJ6gXwjRVAnXWY/IaG+wIcgfHXve70MSnjHAdZxSbik3CZtS6 xdEj5qghC+vkLENeWDh871Kg3KP4AxptVjLllf6ZdWjAUGOipft5jf/n39d4NPy/mHEZ h8H8SO46ygM/MLRs4SujAey1hqZqJPF4roLKGnYz7mEG5qV4oPa+6b01h/QzUM/JQTaP K1mC/i3RixHj5UWq8IJQnOQetWVl93useIMarnc+lhK9sRLv+IjdSC8jTkuGVSgAkvsA 28SQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=HHE1K8NhuuSjmcF49D8keKqc7ShG3g/16l0V0ExXlIk=; b=VFvH+UI+DXTi3brr+9czkPnH/Aoirw0GNNMhWit79FJRmw26j4TG7tJb+2QLhIfIC/ DBI1TajH1JuwxhKSgRVFo3/d6D74mRpcpSnPtjK2bPkizM04yWEQIzAf1K2Jd0Bp45xr p4EhEoSggtYYYF6heHRAgDxpg1NGsxX6zdHzoKlMXnyIHrdMp7XiIXEPX/HkzP28fmQO Bm97BCprcA9ZYO2aqO9wFOKQG6h473J3YbYDxO4CxrkCke3d4ngzGxVKrloSEXNTgmZc 6fjtFdHzb2HyCG5Mzz68PH5Yr9+3DC32x6x4zM3pGlVjweXpm/wskBha1w2B1J6wSFoe tccA== X-Gm-Message-State: AHYfb5ikzwyvTNfBMwzWROOd/QhMV0KtcGTIdPZvXvy3S2ss/1W2QDqr 7KalfhWtyul+34er2pd4r9gkfxjcvI1mewU= X-Received: by 10.28.113.90 with SMTP id m87mr956165wmc.59.1502791601112; Tue, 15 Aug 2017 03:06:41 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.184.85 with HTTP; Tue, 15 Aug 2017 03:06:40 -0700 (PDT) In-Reply-To: References: From: Mariel Cherkassky Date: Tue, 15 Aug 2017 13:06:40 +0300 Message-ID: Subject: Re: performance problem on big tables To: pgsql-performance@postgresql.org Content-Type: multipart/alternative; boundary="001a114620c48e4abf0556c7ef5f" List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a114620c48e4abf0556c7ef5f Content-Type: text/plain; charset="UTF-8" Hi, So I I run the cheks that jeff mentioned : \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1 hour and 35 minutes \copy local_postresql_table from /tmp/tmp with binary - Didnt run because the remote oracle database is currently under maintenance work. So I decided to follow MichaelDBA tips and I set the ram on my machine to 16G and I configured the effective_cache memory to 14G,tshared_buffer to be 2G and maintenance_work_mem to 4G. I started running the copy checks again and for now it coppied 5G in 10 minutes. I have some questions : 1)When I run insert into local_postresql_table select * from remote_oracle_table I insert that data as bulk to the local table or row by row ? If the answer as bulk than why copy is a better option for this case ? 2)The copy from dump into the postgresql database should take less time than the copy to dump ? 3)What do you think about the new memory parameters that I cofigured ? 2017-08-14 16:24 GMT+03:00 Mariel Cherkassky : > I have performance issues with two big tables. Those tables are located on > an oracle remote database. I'm running the quert : insert into > local_postgresql_table select * from oracle_remote_table. > > The first table has 45M records and its size is 23G. The import of the > data from the oracle remote database is taking 1 hour and 38 minutes. After > that I create 13 regular indexes on the table and it takes 10 minutes per > table ->2 hours and 10 minutes in total. > > The second table has 29M records and its size is 26G. The import of the > data from the oracle remote database is taking 2 hours and 30 minutes. The > creation of the indexes takes 1 hours and 30 minutes (some are indexes on > one column and the creation takes 5 min and some are indexes on multiples > column and it takes 11 min. > > Those operation are very problematic for me and I'm searching for a > solution to improve the performance. The parameters I assigned : > > min_parallel_relation_size = 200MB > max_parallel_workers_per_gather = 5 > max_worker_processes = 8 > effective_cache_size = 2500MB > work_mem = 16MB > maintenance_work_mem = 1500MB > shared_buffers = 2000MB > RAM : 5G > CPU CORES : 8 > > *-I tried running select count(*) from table in oracle and in postgresql > the running time is almost equal.* > > *-Before importing the data I drop the indexes and the constraints.* > > *-I tried to copy a 23G file from the oracle server to the postgresql > server and it took me 12 minutes.* > > Please advice how can I continue ? How can I improve something in this > operation ? > --001a114620c48e4abf0556c7ef5f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,
So I I run the = cheks that jeff mentioned : =C2=A0
\copy (select * from oracle_remote_table) to /tmp/tmp with bi= nary - 1 hour and 35 minutes
\copy loc= al_postresql_table from /tmp/tmp with binary - Didnt run because the remote= oracle database is currently under maintenance work.

So I decided to f= ollow MichaelDBA tips and I set the ram on my machine to 16G and I configur= ed the effective_cache memory to 14G,tshared_buffer to be 2G and maintenanc= e_work_mem to 4G.

I started running the copy checks again and for now i= t coppied 5G in 10 minutes. I have some questions :=C2=A0
1)When I run insert into local_postresql_table select= * from remote_oracle_table I insert that data as bulk to the local table o= r row by row ?=C2=A0 If the answer as bulk than why copy is a better option= for this case ?=C2=A0
2)The copy from= dump into the postgresql database should take less time than the copy to d= ump ?
3)What do you think about the ne= w memory parameters that I cofigured ?





2017-08-14 16:24 = GMT+03:00 Mariel Cherkassky <mariel.cherkassky@gmail.com>= :

I have performance issues with two big tables. Those tables a= re located on an oracle remote database. I'm running the quert :=C2=A0<= code style=3D"margin:0px;padding:1px 5px;border:0px;font-style:inherit;font= -variant:inherit;font-weight:inherit;font-stretch:inherit;font-size:13px;li= ne-height:inherit;font-family:Consolas,Menlo,Monaco,"Lucida Console&qu= ot;,"Liberation Mono","DejaVu Sans Mono","Bitstrea= m Vera Sans Mono","Courier New",monospace,sans-serif;vertica= l-align:baseline;background-color:rgb(239,240,241);white-space:pre-wrap">in= sert into local_postgresql_table select * from oracle_remote_table.<= /p>

The first table has 45= M records and its size is 23G. The import of the data from the oracle remot= e database is taking 1 hour and 38 minutes. After that I create 13 regular = indexes on the table and it takes 10 minutes per table ->2 hours and 10 = minutes in total.

The = second table has 29M records and its size is 26G. The import of the data fr= om the oracle remote database is taking 2 hours and 30 minutes. The creatio= n of the indexes takes 1 hours and 30 minutes (some are indexes on one colu= mn and the creation takes 5 min and some are indexes on multiples column an= d it takes 11 min.

Tho= se operation are very problematic for me and I'm searching for a soluti= on to improve the performance. The parameters I assigned :

min_parallel_relation_= size =3D 200MB
max_parallel_workers_per_gather <= span class=3D"gmail-m_4968167882610058030gmail-pun" style=3D"font-family:in= herit;font-style:inherit;font-variant:inherit;font-weight:inherit;white-spa= ce:inherit;font-size:inherit;margin:0px;padding:0px;border:0px;font-stretch= :inherit;line-height:inherit;vertical-align:baseline;color:rgb(48,51,54)">= =3D 5
max_worker_processes =3D 8
effective_cache_size =3D 2500MB
work_mem =3D 16MB
maintenance_work_mem =3D 1500MB
shared_buffers =3D 2000MB
RAM : 5G
CPU CORES : 8

-I tried running select count(*) from table in oracle and in po= stgresql the running time is almost equal.

= -Before importing the data I drop the indexes and the constraints.=

-I tried to copy a 23G file from the oracle server = to the postgresql server and it took me 12 minutes.

Please advice how can I continue ? H= ow can I improve something in this operation ?

=
--001a114620c48e4abf0556c7ef5f--