Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qHeBK-0004LG-91 for pgsql-hackers@arkaria.postgresql.org; Fri, 07 Jul 2023 05:39:02 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1qHeBH-0003jo-On for pgsql-hackers@arkaria.postgresql.org; Fri, 07 Jul 2023 05:38:59 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qHeBH-0003jf-Bl for pgsql-hackers@lists.postgresql.org; Fri, 07 Jul 2023 05:38:59 +0000 Received: from mail-io1-xd33.google.com ([2607:f8b0:4864:20::d33]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1qHeB9-002U0x-Do for pgsql-hackers@postgresql.org; Fri, 07 Jul 2023 05:38:58 +0000 Received: by mail-io1-xd33.google.com with SMTP id ca18e2360f4ac-78625caa702so54066539f.1 for ; Thu, 06 Jul 2023 22:38:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=singh-im.20221208.gappssmtp.com; s=20221208; t=1688708330; x=1691300330; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=T5bC/odqNv9c51dFwQo/oDJ/eX3GAoI7lzaKJmrUjyQ=; b=p1NNvnZ4Go13BuRkfJrcyGfE9Sj+1uXZ5uRVOyftV7l3vw15TvGumIItSkDLGX9psJ 3T/FWNzspeqzJ0ICQN/1fpJOwSA4coH9+XjydMfVftcdv7do8PByek1B87L/omVHjznk fKwOmCSsNbcT/iE5P88K4JKHn6acAJ1i+PKwWT0kVMXn2H3p61oq0X3VzGExwAttV+WI NLKzLBjxJjUEDBN4yNTYrGg52KLdn1VgI8s2hFac5Nk9SsdAd/gMIoSJU4iyYn6UbS2j fKF3NKfm/otl0kwT+CwJXibSQPzDlQgDXjeSxrwEB5Qdu5S6WYczGMlUsGfvFIciK+Tf nK1Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688708330; x=1691300330; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=T5bC/odqNv9c51dFwQo/oDJ/eX3GAoI7lzaKJmrUjyQ=; b=C8p1BD4rZD2XPZ3ykMKfGkB9TZEbZSWn4m1BVKq1z8NIhdj6dehvaRy2FIKQxeMbRl V0udOA6nw/q29yOdYMNafAS90s8wo5y69PFzvffbVLFsa6RhyLMswuIvYl72QK/GEWgA XV/QP/3GVd1v5E8W5iB3jFDLMk4zYAd1m+9yX74wxBvS6oX0oj96gRT59Kw6xXw/y8Gq tl1ecZaM3nICJI2+HZv4f9sBGz28r4olbbfkp8abnhbxAe2raiqb84ueLcChNW4MHa4+ 36XYseGEADe/RzDGATwCTtDJM9tzgrzQIuFlj5VESeGBUehvUpPALWFlIQIdxabm90F5 Oz1A== X-Gm-Message-State: ABy/qLYNPFDuJdyXDx9RxI6C7Mp4k9TFQ1JcdKLBk9nJgyJpnijwmm6x RF3IveUzCt6PnUwrwQk5OPGg7MMEQhLA+cxI/+c= X-Google-Smtp-Source: APBJJlHz7T615y4Ex/8Rf7pi1hfR4wdRcqVetKPgqueKX6Okyu+2FTkO2TZQjyrjjJzAtp199Wl+JoTYBZIfRlXz2no= X-Received: by 2002:a6b:6106:0:b0:784:314f:8d5b with SMTP id v6-20020a6b6106000000b00784314f8d5bmr4699314iob.11.1688708330283; Thu, 06 Jul 2023 22:38:50 -0700 (PDT) MIME-Version: 1.0 References: <89742024-d51a-c66b-90b9-67f837072cd2@joeconway.com> In-Reply-To: From: Gurjeet Singh Date: Thu, 6 Jul 2023 22:38:33 -0700 Message-ID: Subject: Re: RFC: pg_stat_logmsg To: Masahiko Sawada , Joe Conway Cc: PostgreSQL-development Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Jul 6, 2023 at 12:37=E2=80=AFAM Masahiko Sawada wrote: > On Sat, Jul 1, 2023 at 8:57=E2=80=AFAM Joe Conway wr= ote: > > > > The basic idea is to mirror how pg_stat_statements works, except the > > logged messages keyed by filename, lineno, and elevel are saved with a > > aggregate count. The format string is displayed (similar to a query > > jumble) for context, along with function name and sqlerrcode. > > > > > > Part of the thinking is that people with fleets of postgres instances > > can use this to scan for various errors that they care about. > > Additionally it would be useful to look for "should not happen" errors. > I'm concerned that displaying the format string could not be helpful > in some cases. For example, when raising an ERROR while reading WAL > records, we typically write the error message stored in > record->errormsg_buf: > > in ReadRecord(): > if (errormsg) > ereport(emode_for_corrupt_record(emode, xlogreader->EndRe= cPtr), > (errmsg_internal("%s", errormsg) /* already > translated */ )); > > In this case, the error message stored in pg_stat_logmsg is just '%s'. > The filename and line number columns might help identify the actual > error but it requires users to read the source code and cannot know > the actual error message. I believe that the number of such error messages, the ones with very little descriptive content, is very low in Postgres code. These kinds of messages are not prevalent, and must be used when code hits obscure conditions, like seen in your example above. These are the kinds of errors that Joe is referring to as "should not happen". In these cases, even if the error message was descriptive, the user will very likely have to dive deep into code to find out the real cause. I feel that the unique combination of file name and line number is useful information, even in cases where the format string not very descriptive. So I believe the extension's behaviour in this regard is acceptable. In cases where the extension's output is not descriptive enough, the user can use the filename:lineno pair to look for fully formed error messages in the actual log files; they may have to make appropriate changes to log_* parameters, though. If we still feel strongly about getting the actual message for these cases, perhaps we can develop a heuristic to identify such messages, and use either full or a prefix of the 'message' field, instead of 'message_id' field. The heuristic could be: strlen(message_id) is too short, or that message_id is all/overwhelmingly format specifiers (e.g. '%s: %d'). The core benefit of this extension is to make it easy for the user to discover error messages affecting their workload. The user may be interested in knowing the most common log messages in their server logs, so that they can work on reducing those errors or warnings. Or they may be interested in knowing when their workload hits unexpected/serious error messages, even if they're infrequent. The data exposed by pg_stat_logmsg view would serve as a starting point, but I'm guessing it would not be sufficient for their investigation. On Fri, Jun 30, 2023 at 4:57=E2=80=AFPM Joe Conway wro= te: > I would love to get > feedback on: > > 1/ the general concept > 2/ the pg_stat_statement-like implementation > 3/ contrib vs core vs external project +1 for the idea; a monitoring system trained at this view can bubble up problems to users that may otherwise go unnoticed. Piggybacking on, or using pg_stat_statments implementation as a model seems fine. I believe making this available as a contrib module hits the right balance; not all installations may want this, hence in-core, always collecting data, seems undesirable; if it's an external project, it won't be discoverable, I see that as a very high bar for someone to trust it and begin to use it. Best regards, Gurjeet http://Gurje.et