From: Greg Sabino Mullane
Date: Wed, 17 Jul 2024 13:21:44 -0400
Subject: Planet Postgres and the curse of AI
To: pgsql-general

I've been noticing a growing trend of blog posts written mostly, if not entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise this issue. I considered a blog post, but this mailing list seemed a better forum to generate a discussion.

The problem is two-fold as I see it.

First, there is the issue of people trying to game the system by churning out content that is not theirs, but was written by an LLM. I'm not going to name specific posts, but after a while it gets easy to recognize things that are written mostly by AI.

These blog posts are usually generic, describing some part of Postgres in an impersonal, mid-level way. Most of the time the facts are not wrong, per se, but they lack the nuances that a real DBA would bring to the discussion, and often leave important things out. Code examples are often wrong in subtle ways. Places where you might expect a deeper discussion are glossed over.

So the first problem is that this is polluting the Postgres blogs with overly bland, moderately helpful posts that are not written by a human, and do not really bring anything interesting to the table. There is a place for posts that describe basic Postgres features, but the ones written by humans are much better. (yeah, yeah, "for now" and all hail our AI overlords in the future).

The second problem is worse, in that LLMs are not merely gathering information, but have the ability to synthesize new conclusions and facts. In short, they can lie. Or hallucinate. Whatever you want to call it, it's a side effect of the way LLMs work. In a technical field like Postgres, this can be a very bad thing. I don't know how widespread this is, but I was tipped off about it over a year ago when I came across a blog suggesting using the "max_toast_size configuration parameter". For those not familiar, I can assure you that Postgres does not have, nor will likely ever have, a GUC with that name.

As anyone who has spoken with ChatGPT knows, getting small but important details correct is not its forte. I love ChatGPT and actually use it daily. It is amazing at doing certain tasks. But writing blog posts should not be one of them.

Do we need a policy or a guideline for Planet Postgres? I don't know. It can be a gray area. Obviously spelling and grammar checking is quite okay, and making up random GUCs is not, but the middle bit is very hazy. (Human) thoughts welcome.

Cheers,
Greg