Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tjeia-00CCx3-MZ for pgsql-general@arkaria.postgresql.org; Sun, 16 Feb 2025 13:29:56 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tjeiZ-008fKw-DA for pgsql-general@arkaria.postgresql.org; Sun, 16 Feb 2025 13:29:55 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tjeiZ-008fKo-28 for pgsql-general@lists.postgresql.org; Sun, 16 Feb 2025 13:29:55 +0000 Received: from mail-pj1-x1031.google.com ([2607:f8b0:4864:20::1031]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tjeiW-001AMG-2l for pgsql-general@lists.postgresql.org; Sun, 16 Feb 2025 13:29:54 +0000 Received: by mail-pj1-x1031.google.com with SMTP id 98e67ed59e1d1-2f833af7a09so5050954a91.2 for ; Sun, 16 Feb 2025 05:29:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1739712590; x=1740317390; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=TaXjfSA1H8iCSlMSYocXPQPJN5sJD+V8PVVm4ctvN+Y=; b=Btuw8gnbLMsFhbDxIwbGJjgfF3sScDPBl5YdtzPIr4/yjVmz//ZmduKL5ElPZYQVgT U7WjjLPG7C8sVdnrpG5bRb0S0YEOgzZlgWhJx3IEok0WjqdMsR8nmNEs+mU1e7ObSL06 DJ4to3VHMhRavAWulqZUHbuM68A6ucJTBW40cjmbjo/uSAcC8ueIKik34m+BSalgG6s1 rwlR6/xObkXCggwNtkFC4yw0J4tdvhWzIiw51UoomFfxbudgjrxrNnF+oZ3nK1mrn9W+ P7z5vJX+5uIk+ejMKkXLbEWP63JhX+t7WLaMvrwqT1bCHCEaprQQLQ1EFpvIrKGblw68 xWzg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739712590; x=1740317390; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=TaXjfSA1H8iCSlMSYocXPQPJN5sJD+V8PVVm4ctvN+Y=; b=PZtOwEAVOzRy+yaL5GT31+se8Un3D3dkJQydeNzyk6lCiRS1vOwzpq3ylxwWCefI8N BSDUUlhVKMd2KAREEN3WpztEAG/OUVmYdgy6isbFfqBVRiXLlwkfZPn+MSJPaVy3VCZ4 /TK2GE1+Di0+WNgcjQe1o7cqJeDO1yRhmrnfxKRcCIyOo8Igg2SZj2LlJR6QJFn6Js6G Zgy3/FZwgpRzfqWNibRSfafngwj4IrkNBbNkAT812Tzyw1YjBSgAX0iwg6EjO8K3DO7S 6TqxQ0yiy+hCoRa58X2cjsLOeMg1H51Wpltll9a3DxlMalF3dd+Ch4hkdFr7EgFP/rk9 HU9w== X-Gm-Message-State: AOJu0YxN+VnmBidH/F451WUkgoSfpMOFJCnaO4TXAPdEi1LKQOpRUPdh E3kiT+cgQLWBvmXfa+BXVAHYpv5WLahtQlhr1MbjrfFokbU7pKViM6LbSDySHTvGFJw0N8qLSOU /MyN6n1ikCo/r+eHWnLEy/Huwzh7uJt6Y X-Gm-Gg: ASbGnctiMXWR+Qq4HnGYoAm2uRkjCuC6dyvPlDz47ojRAy8xBWLvmUgXFZro5t8RlsW nXlm/y2Q4den9I6G6VhvwS4D5EeyXUi2jeQPZfFptkFT1P+H5i+ry2Q1wyT1AcJnbo4fnEO24O+ gyRsNOBOXNQZ98ntfduj+C05t+Tp4= X-Google-Smtp-Source: AGHT+IFOT6qvfen9B7X/YRS8Xqx7itKko4o8Y6CUg8/QYU881wmi6sy6QjN8ysDWoENB4K4rXLSQDPZ6HTMWmsWYlW4= X-Received: by 2002:a17:90b:3ec6:b0:2ee:9d36:6821 with SMTP id 98e67ed59e1d1-2fc4104502dmr8525672a91.27.1739712590115; Sun, 16 Feb 2025 05:29:50 -0800 (PST) MIME-Version: 1.0 From: sud Date: Sun, 16 Feb 2025 18:59:38 +0530 X-Gm-Features: AWEUYZmmN_KGe0KW7x3ior5WqiI8ccp0mjWQ2uHD2sojcpa5pZ_Q1R6KFhTNkos Message-ID: Subject: Question on Alerts To: pgsql-general Content-Type: multipart/alternative; boundary="000000000000f1ff4a062e4266a5" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f1ff4a062e4266a5 Content-Type: text/plain; charset="UTF-8" Hi, We are asked to have key monitoring or alerting added to our postgres database. And I am thinking of metrics like blocked transactions, Max used transaction Ids, Max Active session threshold, Deadlock, Long running query, replica lag, buffer cache hit ratio, read/write IOPS or latency etc. I have below questions 1)Below are some which i tried writing, can you please let me know if these are accurate? 2)How should we be writing the alerting query for deadlock, max used transaction ids, read/write IOPS and latency? 3)Are there any docs available which have these sample sql queries on the pg_* table for these critical alerts which we can easily configure through any tool? 4)Any other alerts which we should be really having? *****Blocking sessions select distinct blocking_id from ( SELECT activity.pid, activity.usename, activity.query, blocking.pid AS blocking_id, blocking.query AS blocking_query FROM pg_stat_activity AS activity JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(activity.pid)) ) a; **** long running beyond ~1 hours***** SELECT query, datname, pid, now() - state_change AS idle_for FROM pg_stat_activity WHERE state IN ('active', 'idle in transaction') AND pid <> pg_backend_pid() AND xact_start < now() - interval '1 hour' ORDER BY age(backend_xmin) DESC NULLS LAST; **** No of active sessions ****** SELECT count(*) AS active_connections FROM pg_stat_activity WHERE state = 'active'; ***replica lag**** SELECT client_addr, state, sent_location, write_location, flush_location, replay_location, pg_wal_lsn_diff(sent_location, replay_location) AS replica_lag FROM pg_stat_replication; ***buffer cache hit ratio**** SELECT (1 - (blks_read::float / (blks_hit + blks_read))) * 100 AS buffer_cache_hit_ratio FROM pg_stat_database; Regards Yudhi --000000000000f1ff4a062e4266a5 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,
We are asked to have key monitoring or alerting add= ed to our postgres database. And I am thinking of metrics like blocked tran= sactions, Max used transaction Ids,=C2=A0 Max Active session threshold, Dea= dlock, Long running query, =C2=A0replica lag, buffer cache hit ratio, read/= write IOPS or latency etc. I have below questions

1)Below are some w= hich i tried writing, can you please let me know if these are accurate?
= 2)How should we be writing the alerting query for deadlock, max used transa= ction ids, read/write IOPS and latency?
3)Are there any docs available w= hich have these sample sql queries on the pg_* table for these critical ale= rts which we can easily configure through any tool?
4)Any other alerts = which we should be really having?

*****Blocking sessions
select d= istinct blocking_id from
=C2=A0 =C2=A0(
SELECT
=C2=A0 =C2=A0 activ= ity.pid,
=C2=A0 =C2=A0 activity.usename,
=C2=A0 =C2=A0 activity.query= ,
=C2=A0 =C2=A0 blocking.pid AS blocking_id,
=C2=A0 =C2=A0 blocking.q= uery AS blocking_query
FROM pg_stat_activity AS activity
JOIN pg_stat= _activity AS blocking
=C2=A0 =C2=A0 ON blocking.pid =3D ANY(pg_blocking_= pids(activity.pid))
=C2=A0 =C2=A0) a;

**** long running beyond ~1= hours*****
SELECT
=C2=A0 =C2=A0 query,
=C2=A0 =C2=A0 datname,
= =C2=A0 =C2=A0 pid,
=C2=A0 =C2=A0 now() - state_change AS idle_for
FRO= M =C2=A0 =C2=A0pg_stat_activity
WHERE =C2=A0 =C2=A0state IN ('active= ', 'idle in transaction')
=C2=A0 =C2=A0 AND pid <> pg_= backend_pid()
=C2=A0 =C2=A0 AND xact_start < now() - interval '1 = hour'
ORDER BY =C2=A0 =C2=A0age(backend_xmin) DESC NULLS LAST;
**** No of active sessions ******
SELECT count(*) AS active_connection= s
FROM pg_stat_activity
WHERE state =3D 'active';

***r= eplica lag****
SELECT client_addr, state, sent_location, write_location,= flush_location, replay_location,
=C2=A0 =C2=A0 =C2=A0 =C2=A0pg_wal_lsn_= diff(sent_location, replay_location) AS replica_lag
FROM pg_stat_replica= tion;

***buffer cache hit ratio****
SELECT
=C2=A0 =C2=A0(1 - = (blks_read::float / (blks_hit + blks_read))) * 100 AS buffer_cache_hit_rati= o
FROM pg_stat_database;

Regards
Yudhi
--000000000000f1ff4a062e4266a5--