Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dy4P6-0007k3-Lh for pgsql-performance@arkaria.postgresql.org; Fri, 29 Sep 2017 23:09:08 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1dy4P5-0000O9-I0 for pgsql-performance@arkaria.postgresql.org; Fri, 29 Sep 2017 23:09:07 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dy4NI-0004Qo-6T for pgsql-performance@postgresql.org; Fri, 29 Sep 2017 23:07:16 +0000 Received: from mail-oi0-x22b.google.com ([2607:f8b0:4003:c06::22b]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1dy4NA-0002H8-Gs for pgsql-performance@postgresql.org; Fri, 29 Sep 2017 23:07:14 +0000 Received: by mail-oi0-x22b.google.com with SMTP id l74so1749807oih.1 for ; Fri, 29 Sep 2017 16:07:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=Op2klFC26n2mXww2aV8XVI0pmqErZ7Ldo9HGi0PUCA8=; b=GnOq0JO86phx2UWZtX0LUBY0y/xGlwCr3w8edkQG3xzFcOptCwEqJWkBvWvDFoef5n eHQx6Au3+aV7G3tJ+VyBMMnX3wSyFhn0X2ieMYTqnydX+FsbAt/wlO+f67tZzCGJfEn8 WikMoo9FM40Yz/jHfyvjAORtZ3sKrRyQIX1pCQhqemHjFW/a2+tPUeXN3U5sBntNC3jq 2l9wWEm9bS/pcZqnceDI8wDMSoBywC6n2Jx2EBXfeD8BWbtYoL/FRpsbG4SYGXtHTlff 8spR3ZiQV6Z2sspkKMWMMflM7wexNHMjtrtjAem4eTYECWjZM/0PIfgLsWjXzqCvJm7w CDLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=Op2klFC26n2mXww2aV8XVI0pmqErZ7Ldo9HGi0PUCA8=; b=mb0nGw2LEe9jNn6synOdMiWEF9DYEFLKvzDKru0/SYAHxXv0zDM5zTGISfBPi3+0r3 r76PvDfkQM2jz3mV/ni5dvkNA7siFA+Z2WMAPOS4T/pO9xUonsjucztfHkbZSMSpRFZ0 QkCHnrGZFgOLExeYshIEfR/8xkBZEhnEdJVV3epMpbJnS59F/aaxcYmsufmLrUOmj4xP t4cKjRscr/TqkaT4G+JEkN+vzNa/Hxq1O2UtaBESSL/oXfBQSxgdRK5Tn0GZKutDUcVb jcKeMnFIuKiJE2KWwDn9HYSSUG18QtmNdifbpYYrcbnZY25sB3c7wy0itSTgXPaGhN0j 1WGA== X-Gm-Message-State: AMCzsaWXhwt8O0Hv11jfvoTwBn3RGyRsJtecUATIugh6JbScpq4VWd+O i8VS+uz+p1NjHFhClDEMr8yOYoZ0tB83ASCMOLy+ X-Google-Smtp-Source: AOwi7QDtj9cDVtyBYi392N5Xy1XNCUrV2VU/zun7iGL1y9Kko1zA2KRQQ5fdGAedEaYnaEUxitkpqAFYFzGGro2D7Ms= X-Received: by 10.157.66.172 with SMTP id r41mr3438140ote.306.1506726427326; Fri, 29 Sep 2017 16:07:07 -0700 (PDT) MIME-Version: 1.0 Received: by 10.157.38.161 with HTTP; Fri, 29 Sep 2017 16:07:06 -0700 (PDT) From: milist ujang Date: Sat, 30 Sep 2017 06:07:06 +0700 Message-ID: Subject: BDR, wal sender, high system cpu, mutex_lock_common To: pgsql-performance@postgresql.org Content-Type: multipart/alternative; boundary="94eb2c1c1a42797df3055a5c158c" List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --94eb2c1c1a42797df3055a5c158c Content-Type: text/plain; charset="UTF-8" Hi all, I've an environment 9.4 + bdr: PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit kernel version: 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1 x86_64 GNU/Linux This is consolidation databases, in this machine there are around 250+ wal sender processes. top output revealed high system cpu: %Cpu(s): 1.4 us, 49.7 sy, 0.0 ni, 48.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st profiling cpu with perf: perf top -e cpu-clock Events: 142K cpu-clock 82.37% [kernel] [k] __mutex_lock_common.isra.5 4.49% [kernel] [k] do_raw_spin_lock 2.23% [kernel] [k] mutex_lock 2.16% [kernel] [k] mutex_unlock 2.12% [kernel] [k] arch_local_irq_restore 1.73% postgres [.] ValidXLogRecord 0.87% [kernel] [k] __mutex_unlock_slowpath 0.78% [kernel] [k] arch_local_irq_enable 0.63% [kernel] [k] sys_recvfrom finally get which processes (wal senders) that are using mutexes: perf top -e task-clock -p 55382 Events: 697 task-clock 88.08% [kernel] [k] __mutex_lock_common.isra.5 3.27% [kernel] [k] do_raw_spin_lock 2.34% [kernel] [k] arch_local_irq_restore 2.10% postgres [.] ValidXLogRecord 1.87% [kernel] [k] mutex_unlock 1.87% [kernel] [k] mutex_lock 0.47% [kernel] [k] sys_recvfrom I think bdr is only reading wal file (current state is we behind current wal lsn), so why reading wal file needs mutex? I wonder, is there kernel version has better handling mutexes? -- regards ujang jaenudin | DBA Consultant (Freelancer) http://ora62.wordpress.com http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab --94eb2c1c1a42797df3055a5c158c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi all,


I've an envi= ronment 9.4 + bdr:=C2=A0
PostgreSQL 9.4.4 on x86_64-unknown-linux= -gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit

kernel version:
3.2.0-4-amd64 #1 SMP Debian 3.2.65-1 x86_= 64 GNU/Linux

This is consolidation databases, = in this machine there are around 250+ wal sender processes.

<= /div>
top output revealed high system cpu:
%Cpu(s): =C2= =A01.4 us, 49.7 sy, =C2=A00.0 ni, 48.8 id, =C2=A00.0 wa, =C2=A00.0 hi, =C2= =A00.0 si, =C2=A00.0 st

profiling cpu with pe= rf:

perf top -e cpu-clock

Events: 142K cpu-clock
=C2=A082.37% =C2=A0[kernel] =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0[k] __mutex_lock_common.isra.5
=C2=A0= 4.49% =C2=A0[kernel] =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0[k] do_raw_s= pin_lock
=C2=A0 2.23% =C2=A0[kernel] =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0[k] mutex_lock
=C2=A0 2.16% =C2=A0[kernel] =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0[k] mutex_unlock
=C2=A0 2.12% = =C2=A0[kernel] =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0[k] arch_local_irq_= restore
=C2=A0 1.73% =C2=A0postgres =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0[.] ValidXLogRecord
=C2=A0 0.87% =C2=A0[kernel] =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0[k] __mutex_unlock_slowpath
=C2=A0 0.78% =C2=A0[kernel] =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0[k] a= rch_local_irq_enable
=C2=A0 0.63% =C2=A0[kernel] =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0[k] sys_recvfrom

=
finally get which processes (wal senders)=C2=A0that are usin= g mutexes:

perf top -e task-clock -p 55382

Events: 697 =C2=A0task-clock
=C2=A088.08% =C2= =A0[kernel] =C2=A0[k] __mutex_lock_common.isra.5
=C2=A0 3.27% =C2= =A0[kernel] =C2=A0[k] do_raw_spin_lock
=C2=A0 2.34% =C2=A0[kernel= ] =C2=A0[k] arch_local_irq_restore
=C2=A0 2.10% =C2=A0postgres = =C2=A0[.] ValidXLogRecord
=C2=A0 1.87% =C2=A0[kernel] =C2=A0[k] m= utex_unlock
=C2=A0 1.87% =C2=A0[kernel] =C2=A0[k] mutex_lock
=C2=A0 0.47% =C2=A0[kernel] =C2=A0[k] sys_recvfrom
I think bdr is only reading wal file (current state is we behi= nd current wal lsn),
so why reading wal file needs mutex?

I wonder, is there kernel version has better handling mut= exes?=C2=A0


--
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpre= ss.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab
--94eb2c1c1a42797df3055a5c158c--