Date: Wed, 27 May 2026 14:10:46 -0400
From: Andres Freund
To: Nazir Bilal Yavuz
Cc: Jelte Fennema-Nio, Thomas Munro, pgsql-hackers@postgresql.org
Subject: Re: Heads Up: cirrus-ci is shutting down June 1st
References: <3ydjipcr7kbss57nvi67noplncqhesl5eyb6wgol4ccjxynspv@yatlykpribmm>

Hi,

> Here is the v2, I took Jelte's patch and reviewed & merged it with my
> patch. Updates and questions are:
>
> 1- I continued to use Jelte's container method (Linux tasks only for
> now, BSD tasks will be included in the future) because I think that is
> the future-proof way since we might want to generate our container
> images in the future. Also, up-to-date Debian images can be tested
> with this way; otherwise we would need to use Ubuntu 24.04.

Good.

> 2- io_uring tests work on the Linux Meson task.

Is there a reason to not just do that for all the tasks?

> 3- I didn't put commands to helper scripts for now. I think it is a
> good thing to have a helper script but it would be better to have this
> helper script after the first version is committed since it can extend
> the timeline. Also, I found that having all commands in one file makes
> debugging easier.

Hm. I'm a bit worried about this getting pretty unmaintainable, due to the
repetition. I think at least we need to use yaml anchors to deduplicate some
steps.

> 4- FreeBSD task has these options:
>
> PG_TEST_INITDB_EXTRA_OPTS: >-
>   -c debug_copy_parse_plan_trees=on
>   -c debug_write_read_parse_plan_trees=on
>   -c debug_raw_expression_coverage_test=on
>   -c debug_parallel_query=regress
>
> Since we won't have FreeBSD for the first version. I put these options
> to the MacOS task but I couldn't decide where to put
> 'PG_TEST_PG_UPGRADE_MODE: --link'.

Makes sense.

> Also, I am planning to work on back patches when we agree on the
> upstream one. Does that sound good?

Yep.

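On point 3, a minimal, untested sketch of what deduplicating a step with a
yaml anchor could look like. The job name linux-meson and the anchor name
checkout are placeholders rather than names from the patch, and the sketch
assumes that GitHub Actions accepts plain anchors and aliases in workflow
files (merge keys may well not be supported):

```yaml
jobs:
  sanity-check:
    runs-on: ubuntu-latest
    steps:
      # Define the step once and attach an anchor to it.
      - &checkout
        uses: actions/checkout@v6
        with:
          fetch-depth: ${{ env.CLONE_DEPTH }}

  linux-meson:   # hypothetical job name, for illustration only
    runs-on: ubuntu-latest
    steps:
      # Reuse the identical step through an alias.
      - *checkout
```

An alias repeats the step verbatim, so anything that differs per job (such as
a cache key) would have to come from a variable, for example an env var
holding the job name.
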
> diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
> new file mode 100644
> index 00000000000..6d20068727c
> --- /dev/null
> +++ b/.github/workflows/ci.yml
> @@ -0,0 +1,1125 @@
> +# GitHub Actions CI configuration for PostgreSQL
> +
> +name: Github Actions CI
> +
> +on:
> +  push:
> +    branches: [ "*" ]
> +
> +# Default to the minimum privilege the jobs need (just reading the repo
> +# contents during checkout). Individual jobs override this when they need
> +# more, e.g. `cancel-previous` needs `actions: write` to cancel runs.
> +permissions:
> +  contents: read

I'm not sure I like that we ever need more than that. I'd expect that
postgresql-cfbot will explicitly disable write permissions for runs.

> +# NB: intentionally NO workflow-level `concurrency:` block. The native
> +# concurrency mechanism makes a new run wait for the previous one to fully
> +# cancel before it starts — which can take a while. Instead the
> +# `cancel-previous` job below fires a cancel API call asynchronously,
> +# so the new run gets going immediately. On master the cancel job is skipped,
> +# so every push runs to completion.

Is this really worth having our own code? Seems like it'd not be that frequent
to push if there are already running runs? What kind of delays are we talking
about?

> +  # To avoid unnecessarily spinning up a lot of VMs / containers for entirely
> +  # broken commits, have a minimal task that all others depend on.
> +  #
> +  # SPECIAL:
> +  # - Builds with --auto-features=disabled and thus almost no enabled
> +  #   dependencies
> +  sanity-check:
> +    name: SanityCheck
> +    needs: setup
> +    if: needs.setup.outputs.sanitycheck == 'true'
> +    runs-on: ubuntu-latest
> +    timeout-minutes: 15
> +    container:
> +      image: ${{ needs.setup.outputs.linux_ci_image }}
> +    env:
> +      BUILD_JOBS: 8
> +      TEST_JOBS: 8
> +      CCACHE_DIR: ${{ github.workspace }}/ccache_dir
> +      # no options enabled, should be small
> +      CCACHE_MAXSIZE: "150M"
> +    steps:
> +      - uses: actions/checkout@v6
> +        with:
> +          fetch-depth: ${{ env.CLONE_DEPTH }}
> +
> +      - name: Restore ccache
> +        uses: actions/cache@v5

Seems like this is used by every task. Can we move this into a yaml anchor or
such, by using a variable representing the job name?

> +        with:
> +          path: ${{ env.CCACHE_DIR }}
> +          key: ccache-sanitycheck-${{ github.run_id }}
> +          restore-keys: ccache-sanitycheck-

Why is the key here the run id? Doesn't that mean that we will never have a
precise cache match and that we will keep multiple versions of the cache
around? That seems like a waste of cache space?

For efficiency, particularly on cfbot, it seems like it could be useful to
populate the cache of branches with the cache of the master branch. For that
we'd need the branch name in the key. Which I think would also be good for
postgres/postgres, as we currently have a lot of interference between runs on
the main and the REL_XY_STABLE branches.

> +      - name: Prepare workspace
> +        run: |
> +          whoami
> +          useradd -m postgres
> +          chown -R postgres:postgres .
> +          mkdir -p "$CCACHE_DIR"
> +          chown -R postgres:postgres "$CCACHE_DIR"
> +          # Can't change the container's kernel.core_pattern; the postgres
> +          # user can't write to / normally. Make / writable.
> +          chown root:postgres /
> +          chmod g+rwx /

Why not just always use a privileged container?

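For the ccache key question above, an untested sketch of a branch-aware key.
It rests on several assumptions that would need checking: that actions/cache
never overwrites an entry under an existing key (which would explain the run
id suffix), that restore-keys are tried in order as prefix matches, and that
a branch may restore caches created on the default branch. The key layout
itself is made up for illustration:

```yaml
- name: Restore ccache
  uses: actions/cache@v5
  with:
    path: ${{ env.CCACHE_DIR }}
    # Still unique per run, since an existing key is assumed to be immutable.
    key: ccache-sanitycheck-${{ github.ref_name }}-${{ github.run_id }}
    # Prefer the newest cache of the same branch, then fall back to master.
    restore-keys: |
      ccache-sanitycheck-${{ github.ref_name }}-
      ccache-sanitycheck-master-
```

This would only address the interference between branches. It does nothing
about one cache entry accumulating per run.
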
> +      - name: Configure
> +        run: |
> +          su postgres <<-'EOF'
> +            set -e
> +            meson setup \
> +              --buildtype=debug \
> +              --auto-features=disabled \
> +              -Ddefault_library=shared \
> +              -Dtap_tests=enabled \
> +              build
> +          EOF
> +
> +      - name: Build
> +        run: |
> +          su postgres <<-EOF
> +            set -e
> +            ninja -C build -j${BUILD_JOBS} ${MBUILD_TARGET}
> +          EOF

Should we have an explicit cache upload step here? Or are upload steps run
unconditionally?

> +      # Run a minimal set of tests. The main regression tests take too long
> +      # for this purpose. For now this is a random quick pg_regress style
> +      # test, and a tap test that exercises both a frontend binary and the
> +      # backend.
> +      - name: Test
> +        run: |
> +          su postgres <<-EOF
> +            set -e
> +            ulimit -c unlimited
> +            meson test ${MTEST_ARGS} --suite setup
> +            meson test ${MTEST_ARGS} --num-processes ${TEST_JOBS} \
> +              cube/regress pg_ctl/001_start_stop
> +          EOF
> +
> +      - name: Core backtraces
> +        if: failure()
> +        run: |
> +          mkdir -m 770 /tmp/cores
> +          find / -maxdepth 1 -type f -name 'core*' -exec mv '{}' /tmp/cores/ \;
> +          src/tools/ci/cores_backtrace.sh linux /tmp/cores
> +
> +      - name: Upload logs
> +        if: failure()
> +        uses: actions/upload-artifact@v7
> +        with:
> +          name: sanitycheck-logs-${{ github.run_id }}
> +          path: |
> +            build*/testrun/**/*.log
> +            build*/testrun/**/*.diffs
> +            build*/testrun/**/regress_log_*
> +            build*/meson-logs/*.txt
> +          if-no-files-found: ignore

I think this really should be in a yaml anchor, we have a few somewhat
different versions of this now.

It's pretty annoying that the output of the failures isn't visible in the UI.
Maybe we ought to print a few of the failures out or something?

> +
> +  # SPECIAL:
> +  # - Uses address sanitizer (sanitizer failures are typically printed in
> +  #   the server log)
> +  # - Configures postgres with a small segment size
> +  #
> +  # Enable a reasonable set of sanitizers. Use the linux task for that, as
> +  # it's one of the fastest tasks (without sanitizers). Also several of the
> +  # sanitizers work best on linux.
> +  #
> +  # The overhead of alignment sanitizer is low, undefined behaviour has
> +  # moderate overhead. Test alignment sanitizer in the meson task, as it
> +  # does both 32 and 64 bit builds and is thus more likely to expose
> +  # alignment bugs.
> +  #
> +  # Address sanitizer in contrast is somewhat expensive. Enable it in the
> +  # autoconf task, as the meson task tests both 32 and 64bit.

I wonder if we should split the meson task into two, one for 32bit and one for
64bit. The concurrency limits for public repos are high enough for that to
seem like a reasonable tradeoff? There's no work, other than the repo
checkout, shared between them.

> +  # disable_coredump=0, abort_on_error=1: for useful backtraces in case of crashes
> +  # print_stacktraces=1,verbosity=2, duh
> +  # detect_leaks=0: too many uninteresting leak errors in short-lived binaries
> +  linux-autoconf:
> +    name: Linux - Debian Trixie - Autoconf
> +    needs: [setup, sanity-check]
> +    if: |
> +      !cancelled() &&
> +      needs.setup.outputs.linux == 'true' &&
> +      needs.sanity-check.result != 'failure'
> +    runs-on: ubuntu-latest
> +    timeout-minutes: 60
> +    container:
> +      image: ${{ needs.setup.outputs.linux_ci_image }}
> +      # Share the host PID + IPC namespaces. 017_shm.pl rapidly creates,
> +      # kill9's, and restarts postgres; with the container's small PID
> +      # space a new postgres can recycle the dead postmaster's PID before
> +      # pg_ctl's postmaster.pid check notices, producing spurious "node X
> +      # is already running" failures. SysV shm in the test also relies on
> +      # host-like IPC behavior.
> +      #
> +      # --ulimit raises memlock and core dump size. Memlock is needed for
> +      # running the AIO tests.
> +      #
> +      # --privileged is needed so the prepare step can write to sysctls
> +      # under /proc/sys (it's mounted read-only without it). We use it to
> +      # set kernel.core_pattern.
> +      options: --pid=host --ipc=host --ulimit memlock=-1:-1 --privileged
> +    env:
> +      BUILD_JOBS: 4
> +      TEST_JOBS: 8
> +      CCACHE_DIR: /tmp/ccache_dir
> +      DEBUGINFOD_URLS: "https://debuginfod.debian.net"
> +
> +      SANITIZER_FLAGS: -fsanitize=address
> +      UBSAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2
> +      ASAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0:detect_stack_use_after_return=0
> +      CFLAGS: -Og -ggdb -fno-sanitize-recover=all -fsanitize=address
> +      CXXFLAGS: -Og -ggdb -fno-sanitize-recover=all -fsanitize=address
> +      LDFLAGS: -fsanitize=address
> +      CC: ccache gcc
> +      CXX: ccache g++

There's a fair bit of stuff shared between the meson/autoconf linux tasks.
Previously they used a matrix to reduce that a *bit*. But now it's entirely
duplicated, including stuff that doesn't apply to the current job
(e.g. UBSAN_OPTIONS/ASAN_OPTIONS).

And blocks like the following:

> +      - name: Prepare workspace
> +        run: |
> +          useradd -m postgres
> +          chown -R postgres:postgres .
> +          mkdir -p "$CCACHE_DIR"
> +          chown -R postgres:postgres "$CCACHE_DIR"
> +          mkdir -m 770 /tmp/cores
> +          chown root:postgres /tmp/cores
> +          sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
> +
> +          # Hosts for the load balance test
> +          cat >> /etc/hosts <<-EOF
> +          127.0.0.1 pg-loadbalancetest
> +          127.0.0.2 pg-loadbalancetest
> +          127.0.0.3 pg-loadbalancetest
> +          EOF

> +      # Install dependencies via Homebrew rather than Macports. On stock
> +      # GH runners macports requires a heavy bootstrap, and the relevant
> +      # Postgres deps are all available in brew.

What does "heavy bootstrap" mean?

> +      - name: Install dependencies
> +        run: |
> +          brew update
> +          brew install \
> +            ccache meson openldap python@3.12 tcl-tk
> +          # IPC::Run via cpanm (system perl)
> +          sudo cpan -T -i IPC::Run IO::Tty

We do spend ~95s on this every run, that's not nothing. And doing that every
run puts a bunch of load onto brew's mirrors.

> +      - name: Test world
> +        run: |
> +          ulimit -c unlimited
> +          ulimit -n 1024
> +          meson test ${MTEST_ARGS} --num-processes ${TEST_JOBS}

I'd re-add the comments that were in .cirrus.yml about this.

> +  windows-vs:
> +    name: Windows - Server 2022, VS 2022 - Meson & ninja
> +    needs: [setup, sanity-check]
> +    if: |
> +      !cancelled() &&
> +      needs.setup.outputs.windows == 'true' &&
> +      needs.sanity-check.result != 'failure'
> +    runs-on: windows-2022
> +    timeout-minutes: 60
> +    env:
> +      TEST_JOBS: 8
> +      # Avoid port conflicts between concurrent tap tests
> +      PG_TEST_USE_UNIX_SOCKETS: 1
> +      PG_REGRESS_SOCK_DIR: 'c:\pgsock\'

At least my editor gets confused by the \', thinking it's escaping the '. As
everything just works without the trailing \, I'd go that way.

> +      # The TAP tests build an initdb template under build/tmp_install and
> +      # then `robocopy` it into per-test data directories. Robocopy with the
> +      # default /COPY:DAT flag doesn't copy ACLs — destinations inherit from
> +      # their parent dir. On GitHub-hosted Windows runners the workspace's
> +      # inherited ACL grants Administrators:(F) and Users:(RX) but does NOT
> +      # grant the runner user (runneradmin) directly. That matters because
> +      # pg_ctl on Windows uses CreateRestrictedProcess to drop admin
> +      # privileges from postmaster, so the postmaster process has the user
> +      # SID in its token but no longer the Administrators group — leaving it
> +      # with only "Users:(RX)" on pg_control and friends, which causes
> +      # "PANIC: could not open file global/pg_control: Permission denied".
> +      #
> +      # Fix it once on the workspace dir with (OI)(CI) inheritance flags so
> +      # every file/dir created underneath gets an explicit grant for the
> +      # current user.
> +      - name: Grant workspace ACL to runner user
> +        shell: pwsh
> +        run: |
> +          icacls "${{ github.workspace }}" /grant "${env:USERNAME}:(OI)(CI)F" /Q | Out-Null
> +          Write-Host "Granted Full Control to $env:USERNAME on ${{ github.workspace }}"

Perhaps it would be better to fix this by changing the robocopy flags?

> +      # postgres' plpython3u loads python3.dll (the stable-ABI forwarder)
> +      # which in turn loads whichever python3NN.dll the Windows loader finds
> +      # first on PATH. On windows-2022 `C:\Program Files\Mercurial\` ships
> +      # its own python3.dll + python39.dll and appears on PATH *before* the
> +      # hostedtoolcache Python 3.12 — so without intervention the backend
> +      # ends up running Python 3.9 while postgres' stdlib search uses 3.12,
> +      # producing `ImportError: cannot import name 'text_encoding' from
> +      # 'io'` (the 3.12 `io.py` calling into 3.9's `_io`).
> +      #
> +      # Pin PYTHONHOME to the Python 3.12 prefix, and prepend that prefix
> +      # to PATH so its python3.dll wins the DLL search.
> +      - name: Pin Python prefix on PATH and PYTHONHOME
> +        shell: pwsh
> +        run: |
> +          $prefix = (python -c "import sys; print(sys.prefix)").Trim()
> +          Add-Content $env:GITHUB_ENV "PYTHONHOME=$prefix"
> +          Add-Content $env:GITHUB_PATH $prefix
> +          Write-Host "PYTHONHOME=$prefix"
> +          Write-Host "Prepended $prefix to PATH"

GRJGJKLJKJDFJKDF.

> +      - name: Install dependencies
> +        shell: pwsh
> +        run: |
> +          choco install -y --no-progress --limitoutput diffutils winflexbison
> +          # meson + ninja aren't preinstalled on windows-2022. Install via pip
> +          python -m pip install --upgrade meson ninja
> +
> +          # OpenSSL 1.1 via the slproweb installer (pinned to match the
> +          # version used elsewhere in postgres CI).
> +          curl.exe -fsSL -o openssl-setup.exe https://slproweb.com/download/Win64OpenSSL-1_1_1w.exe
> +          Start-Process -Wait -FilePath ./openssl-setup.exe `
> +            -ArgumentList '/DIR=c:\openssl\1.1\ /VERYSILENT /SP- /SUPPRESSMSGBOXES'
> +          # The slproweb installer puts libcrypto-1_1-x64.dll / libssl-1_1-x64.dll
> +          # in c:\openssl\1.1\bin\ and updates the system PATH. GH Actions
> +          # snapshots PATH at job start though, so the running job won't
> +          # see those DLLs and initdb.exe would crash silently at runtime.
> +          # Push the bin dir onto GITHUB_PATH so it persists for later steps.
> +          Add-Content $env:GITHUB_PATH "c:\openssl\1.1\bin"

I don't like that much, but I'm not sure we have a better alternative
short-term.

> +  windows-mingw:
> +    name: Windows - Server 2022, MinGW64 - Meson
> +    needs: [setup, sanity-check]
> +    if: |
> +      !cancelled() &&
> +      needs.setup.outputs.mingw == 'true' &&
> +      needs.sanity-check.result != 'failure'
> +    runs-on: windows-2022
> +    timeout-minutes: 60
> +    env:
> +      TEST_JOBS: 4 # higher concurrency causes occasional failures
> +      PG_TEST_USE_UNIX_SOCKETS: 1
> +      PG_REGRESS_SOCK_DIR: 'c:\pgsock\'
> +      TAR: "c:/windows/system32/tar.exe"
> +      # for mingw plpython to find its installation
> +      PYTHONHOME: D:/a/_temp/msys64/ucrt64
> +
> +      MSYS: winjitdebug
> +      CHERE_INVOKING: 1
> +      MESON_FEATURES: >-
> +        -Dnls=disabled

Missing comments from .cirrus.tasks.yml.

Thanks for working on this!

Greetings,

Andres Freund