MIME-Version: 1.0
References: <CAOYmi+kThkM9Z87u=R_Wi7fCor2i+UZKAyq0UCyprzCwTQvqgA@mail.gmail.com>
 <20240610200411.byj6sv2vpgol6wcf@awork3.anarazel.de>
In-Reply-To: <20240610200411.byj6sv2vpgol6wcf@awork3.anarazel.de>
From: Jacob Champion <jacob.champion@enterprisedb.com>
Date: Tue, 11 Jun 2024 07:28:23 -0700
Message-ID: <CAOYmi+niQdwFdX7srOiD8zdme_rxEp2m4JGdqzK=+dS6dpV2Og@mail.gmail.com>
Subject: Re: RFC: adding pytest as a supported test framework
To: Andres Freund <andres@anarazel.de>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://www.postgresql.org/message-id/CAOYmi%2BniQdwFdX7srOiD8zdme_rxEp2m4JGdqzK%3D%2BdS6dpV2Og%40mail.gmail.com>
Precedence: bulk

On Mon, Jun 10, 2024 at 1:04 PM Andres Freund <andres@anarazel.de> wrote:
> Just for context for the rest the email: I think we desperately need to move
> off perl for tests. The infrastructure around our testing is basically
> unmaintained and just about nobody that started doing dev stuff in the last 10
> years learned perl.

Okay. Personally, I'm going to try to stay out of discussions around
subtracting Perl and focus on adding Python, for a bunch of different
reasons:

- Tests aren't cheap, but in my experience, the maintenance-cost math
for tests is a lot different than the math for implementations.
- I don't personally care for Perl, but having tests in any form is
usually better than not having them.
- Trying to convince people to get rid of X while adding Y is a good
way to make sure Y never happens.

> On 2024-06-10 11:46:00 -0700, Jacob Champion wrote:
> > 4. It'd be great to split apart client-side tests from server-side
> > tests. Driving Postgres via psql all the time is fine for acceptance
> > testing, but it becomes a big problem when you need to test how
> > clients talk to servers with incompatible feature sets, or how a peer
> > behaves when talking to something buggy.
>
> That seems orthogonal to using pytest vs something else?

Yes, I think that's fair. It's going to be hard not to talk about
"things that pytest+Python don't give us directly but are much easier
to build" in all of this (and I tried to call that out in the next
section, maybe belatedly). I think I'm going to have to convince both
a group of people who want to ask "why pytest in particular?" and a
group of people who ask "why isn't what we have good enough?"

> > == Why pytest? ==
> >
> > From the small and biased sample at the unconference session, it looks
> > like a number of people have independently settled on pytest in their
> > own projects. In my opinion, pytest occupies a nice space where it
> > solves some of the above problems for us, and it gives us plenty of
> > tools to solve the other problems without too much pain.
>
> We might be able to alleviate that by simply abstracting it away, but I found
> pytest's testrunner pretty painful. Oodles of options that are not very well
> documented and that often don't work because they are very specific to some
> situations, without that being explained.

Hm. There are a bunch of them, but I've never needed to go through the
oodles of options. Anything in particular that caused problems?

> > Problem 1 (rerun failing tests): One architectural roadblock to this
> > in our Test::More suite is that tests depend on setup that's done by
> > previous tests. pytest allows you to declare each test's setup
> > requirements via pytest fixtures, letting the test runner build up the
> > world exactly as it needs to be for a single isolated test. These
> > fixtures may be given a "scope" so that multiple tests may share the
> > same setup for performance or other reasons.
>
> OTOH, that's quite likely to increase overall test times very
> significantly. Yes, sometimes that can be avoided with careful use of various
> features, but often that's hard, and IME is rarely done rigorously.

Well, scopes are pretty front and center when you start building
pytest fixtures, and the complicated longer setups will hopefully
converge correctly early on and be reused everywhere else. I imagine
no one wants to build cluster setup from scratch.
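
To make the scope point concrete, here is a minimal sketch of how a
shared setup and a per-test setup can sit side by side. FakeCluster and
both fixture names are hypothetical stand-ins I made up for
illustration; nothing like them exists in the tree today.

```python
import pytest


class FakeCluster:
    """Hypothetical stand-in for a server that is expensive to start."""

    starts = 0  # counts how many times a cluster was "started"

    def __init__(self):
        FakeCluster.starts += 1
        self.tables = set()

    def stop(self):
        self.tables.clear()


@pytest.fixture(scope="session")
def cluster():
    # Built once per test session and shared by every test that asks for it.
    c = FakeCluster()
    yield c
    c.stop()


@pytest.fixture
def scratch_table(cluster):
    # Default (function) scope: set up and torn down around each test.
    cluster.tables.add("scratch")
    yield "scratch"
    cluster.tables.discard("scratch")


def test_insert(cluster, scratch_table):
    assert scratch_table in cluster.tables


def test_cleanup_is_isolated(cluster):
    # Does not request scratch_table, so it never sees that table.
    assert "scratch" not in cluster.tables
```

Either test can be run on its own and pytest will build only the
fixtures it names, while a full run pays for the cluster startup once.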

On a slight tangent, is this not a problem today? I mean... part of my
personal long-term goal is increasing test hygiene, which is going
to take some shifts in practice. As long as review keeps the quality
of the tests fairly high, I see the inevitable "our tests take too
long" problem as a good one. That's true no matter what framework we
use, unless the framework is so bad that no one uses it and the
runtime is trivial. If we're worried that people will immediately
start exploding the runtime and no one will notice during review,
maybe we can have some infrastructure flag how much a patch increased
it?
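
As a rough sketch of what such a flag could look like, assuming we can
get per-test durations in seconds for a baseline run and a patched run
(the function name, the data shape, and the 10% threshold are all made
up for illustration):

```python
def flag_slowdowns(baseline, patched, threshold=0.10):
    """Report tests that are new or grew by more than `threshold`.

    Both arguments map a test name to its runtime in seconds. Returns a
    list of (name, before, after) tuples; `before` is None for tests that
    did not exist in the baseline run.
    """
    flagged = []
    for name, after in sorted(patched.items()):
        before = baseline.get(name)
        if before is None:
            # A new test always adds runtime, so always report it.
            flagged.append((name, None, after))
        elif after > before * (1 + threshold):
            flagged.append((name, before, after))
    return flagged
```

Where the durations come from and where the report gets posted are
separate questions that this sketch doesn't try to answer.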

> > Problem 2 (seeing what failed): pytest does this via assertion
> > introspection and very detailed failure reporting. If you haven't seen
> > this before, take a look at the pytest homepage [1]; there's an
> > example of a full log.
>
> That's not really different than what the perl tap test stuff allows. We
> indeed are bad at utilizing it, but I'm not sure that switching languages will
> change that.

Jelte already touched on this, but I wanted to hammer on the point: If
no one, not even the developers who chose and like Perl, is using
Test::More in a way that's maintainable, I would prefer to use a
framework that does maintainable things by default so that you have to
try really hard to screw it up. It is possible to screw up `assert
actual == expected`, but it takes more work than doing it the right
way.
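
For anyone who hasn't seen the pattern, a minimal sketch follows. The
helper being tested is hypothetical and exists only so the test has
something to check.

```python
def parse_startup_options(raw):
    """Hypothetical helper: turn 'k=v k2=v2' into a dict."""
    return dict(item.split("=", 1) for item in raw.split())


def test_parse_startup_options():
    actual = parse_startup_options("user=alice dbname=postgres")
    expected = {"user": "alice", "dbname": "postgres"}
    # A plain assert is the whole test. When run under pytest, a mismatch
    # is reported with both values and the differing items.
    assert actual == expected
```

There is no separate description string to keep in sync and no choice
of comparison helper to get wrong; the default is the informative
version.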

> I think part of the problem is that the information about what precisely
> failed is often much harder to collect when testing multiple servers
> interacting than when doing localized unit tests.
>
> I think we ought to invest a bunch in improving that, I'd hope that a lot of
> that work would be largely independent of the language the tests are written
> in.

We do a lot more acceptance testing than internal testing, which came
up as a major complaint from me and others during the unconference.
One of the reasons people avoid writing internal tests in Perl is
because it's very painful to find a rhythm with Test::More. From
experience test-driving the OAuth work, I'm *very* happy with the
development cycle that pytest gave me.

Other languages _could_ do that, sure. It's a simple matter of programming...

> Ugh, I think this is actually python's weakest area. There's about a dozen
> package managers and "python distributions", that are at best half compatible,
> and the documentation situation around this is *awful*.

So... don't support the half-compatible stuff? I thought this
conversation was still going on with Windows Perl (ActiveState?
Strawberry?) but everyone just seems to pick what works for them and
move on to better things to do.

Modern CPython includes pip and venv. Done. If someone comes to us
with some horrible Anaconda setup wanting to know why their duct tape
doesn't work, can't we just tell them no?

> > When it comes to third-party packages, which I think we're
> > probably going to want in moderation, we would still need to discuss
> > supply chain safety. Python is not as mature here as, say, Go.
>
> What external dependencies are you imagining?

The OAuth pytest suite makes extensive use of
- psycopg, to easily drive libpq;
- construct, for on-the-wire packet representations and manipulation; and
- pyca/cryptography, for easy generation of certificates and manual
crypto testing.

I'd imagine each would need considerable discussion, if there is
interest in doing the same things that I do with them.

> I think somewhere between 1 and 4 a *substantial* amount of work would be
> required to provide a bunch of the infrastructure that Cluster.pm etc
> provide. Otherwise we'll end up with a lot of copy pasted code between tests.

Possibly, yes. I think it depends on what you want to test first, and
there's a green-field aspect of hope/anxiety/ennui, too. Are you
trying to port the acceptance-test framework that we already have, or
are you trying to build a framework that can handle the things we
can't currently test? Will it be easier to refactor duplication into
shared fixtures when the language doesn't encourage an infinite number
of ways to do things? Or will we have to keep on top of it to avoid
pain?

--Jacob