MIME-Version: 1.0
From: Greg Sabino Mullane <htamfids@gmail.com>
Date: Wed, 17 Jul 2024 13:21:44 -0400
Message-ID: <CAKAnmm+tbPMdP8ccrJ-o_LVgC6ADdOEoh2=J+zyWNLab6B3+_Q@mail.gmail.com>
Subject: Planet Postgres and the curse of AI
To: pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000006b4194061d74b474"
Archived-At: <https://www.postgresql.org/message-id/CAKAnmm%2BtbPMdP8ccrJ-o_LVgC6ADdOEoh2%3DJ%2BzyWNLab6B3%2B_Q%40mail.gmail.com>
Precedence: bulk

--0000000000006b4194061d74b474
Content-Type: text/plain; charset="UTF-8"

I've been noticing a growing trend of blog posts written mostly, if not
entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise
this issue. I considered a blog post, but this mailing list seemed a better
forum to generate a discussion.

The problem is two-fold as I see it.

First, there is the issue of people trying to game the system by churning
out content that is not theirs, but was written by an LLM. I'm not going to
name specific posts, but after a while it gets easy to recognize things
that are written mostly by AI.

These blog posts are usually generic, describing some part of Postgres
in an impersonal, mid-level way. Most of the time the facts are not
wrong, per se, but they lack nuances that a real DBA would bring to the
discussion, and often leave important things out. Code examples are often
wrong in subtle ways. Places where you might expect a deeper discussion are
glossed over.

So the first problem is that these posts pollute the Postgres blogs with
overly bland, moderately helpful content that is not written by a human and
does not really bring anything interesting to the table. There is a place
for posts that describe basic Postgres features, but the ones written by
humans are much better. (Yeah, yeah, "for now" and all hail our AI
overlords in the future.)

The second problem is worse, in that LLMs are not merely gathering
information, but have the ability to synthesize new conclusions and facts.
In short, they can lie. Or hallucinate. Whatever you want to call it, it's a
side effect of the way LLMs work. In a technical field like Postgres, this
can be a very bad thing. I don't know how widespread this is, but I was
tipped off about this over a year ago when I came across a blog suggesting
using the "max_toast_size configuration parameter". For those not
familiar, I can assure you that Postgres does not have, nor will likely
ever have, a GUC with that name.
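
A claim like that is cheap to check. On a live server, SHOW max_toast_size
fails with an "unrecognized configuration parameter" error, and the
pg_settings view has no row with that name. As a rough sketch of the same
check applied to a draft post, assuming the parameter names were exported
beforehand (the short KNOWN_GUCS sample, the underscore-based regex
heuristic, and the unknown_parameters helper are all illustrative, not part
of any Postgres tooling):

```python
import re

# A small sample of real parameter names, as would be returned by
#   psql -Atc "SELECT name FROM pg_settings"
# Assumption: a real check would load the full list from the server
# version the post is written about.
KNOWN_GUCS = {
    "default_toast_compression",
    "maintenance_work_mem",
    "max_connections",
    "shared_buffers",
    "work_mem",
}

# Hypothetical heuristic: treat any lowercase word containing an
# underscore as a candidate parameter name.
CANDIDATE = re.compile(r"\b[a-z]+(?:_[a-z0-9]+)+\b")


def unknown_parameters(text, known=KNOWN_GUCS):
    """Return candidate names found in text that are not in the known set."""
    return sorted(set(CANDIDATE.findall(text)) - set(known))


draft = "Tune work_mem and set the max_toast_size configuration parameter."
print(unknown_parameters(draft))
```

With only a sample list this will flag real parameters too, so anything it
reports still needs a human to look it up in the documentation.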

As anyone who has spoken with ChatGPT knows, getting small important
details correct is not its forte. I love ChatGPT and actually use it daily.
It is amazing at doing certain tasks. But writing blog posts should not be
one of them.

Do we need a policy or a guideline for Planet Postgres? I don't know. It
can be a gray area. Obviously spelling and grammar checking are quite
okay, and making up random GUCs is not, but the middle bit is very hazy.
(Human) thoughts welcome.

Cheers,
Greg

--0000000000006b4194061d74b474--