Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tZTcY-009xhH-CX for pgsql-general@arkaria.postgresql.org; Sun, 19 Jan 2025 11:37:38 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tZTcW-006geU-0o for pgsql-general@arkaria.postgresql.org; Sun, 19 Jan 2025 11:37:36 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tZTcV-006geM-Mi for pgsql-general@lists.postgresql.org; Sun, 19 Jan 2025 11:37:36 +0000 Received: from mail.hjp.at ([212.17.106.138] helo=rorschach.hjp.at) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tZTcS-000MHN-2J for pgsql-general@lists.postgresql.org; Sun, 19 Jan 2025 11:37:35 +0000 Received: by rorschach.hjp.at (Postfix, from userid 1000) id 6A4727FF14; Sun, 19 Jan 2025 12:37:01 +0100 (CET) Date: Sun, 19 Jan 2025 12:37:01 +0100 From: "Peter J. Holzer" To: pgsql-general@lists.postgresql.org Subject: Re: Design of a reliable task processing queue Message-ID: <20250119113701.tnipxsn2gs4247zu@hjp.at> Mail-Followup-To: pgsql-general@lists.postgresql.org References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="aetkqwtmrrmchpdy" Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --aetkqwtmrrmchpdy Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2025-01-18 19:44:07 +0900, Alex Burkhart wrote: > REQUIREMENTS >=20 > 1. Pending actions are persisted to a database. There's a trace once they= are > done. > 2. Application workers pick actions one by one. At any given time, each a= ction > can be assigned to at most one worker (transaction). > 3. If multiple actions have same "lock_id", only one of them is processed= at > the time. That has to be action with smallest id. >=20 > MY ATTEMPT >=20 > I got something approximate working with the following setup. >=20 > =3D=3D=3D BEGIN DATABASE QUERY =3D=3D=3D >=20 > DROP TABLE IF EXISTS actions; >=20 > CREATE TABLE actions ( > =A0 id SERIAL PRIMARY KEY, > =A0 lock_id BIGINT, > =A0 action VARCHAR(255), > =A0 done BOOLEAN DEFAULT false > ); >=20 [... removed Go code ...] > This code generates transaction like this. >=20 > =3D=3D=3D BEGIN TRANSACTION =3D=3D=3D >=20 > BEGIN; >=20 > -- Lock one row in "actions" table. >=20 > SELECT id, lock_id, action > FROM actions > WHERE done =3D false > ORDER BY id > LIMIT 1 > FOR NO KEY UPDATE > SKIP LOCKED; >=20 > -- Lock other transactions that process same lock_id. >=20 > SELECT pg_advisory_xact_lock(%lock_id); >=20 > -- Work on the action... mark it done at the end. >=20 > UPDATE actions SET done =3D true WHERE id =3D %d; >=20 > COMMIT; >=20 > =3D=3D=3D END TRANSACTION =3D=3D=3D >=20 > Which almost does the job. By running the Go client concurrently, it picks > actions one by one and processes only one transaction with same "lock_id"= at a > time. >=20 > However I'm worried about the gap between the row lock and "lock_id" lock= =2E This > leaves a room for requirement "That has to be action with smallest id" to= be > unsatisfied. I'd be worried about that, too. Let's assume you have two entries in your queue: 123, 26, 'Create instance 26', false 124, 26, 'Update instance 26', false And also two worker threads. Then this scenario is possible: Worker A executes the SELECT ... SKIP LOCKED query and gets the record with ID 123 Worker B executes the SELECT ... SKIP LOCKED query. Since record 123 is already locked it gets the record with ID 124 Worker B executes the SELECT pg_advisory_xact_lock(26) query successfully and can continue Worker A executes the SELECT pg_advisory_xact_lock(26) query, but blocks and has to wait for worker B to commit (or rollback). So in this case the update would happen before the create, which isn't good. > QUESTION >=20 > Is there a way to improve this attempt and close the gap? I'm not sure if I like this solution (if feels a bit hacky), but the following should work, IMHO: 1) After the first select, do another on all records with the same lock_id: SELECT id, lock_id, action =A0 =A0 =A0FROM actions =A0 =A0 =A0WHERE lock_id =3D %d AND done =3D false =A0 =A0 =A0ORDER BY id =A0 =A0 =A0FOR NO KEY UPDATE 2) Check whether the id you got first is the smallest of all. 3) If it isn't, rollback and start over. 4) If it is, you have now locked all the rows with the same lock_id and can continue. The advisory lock isn't needed then. hp --=20 _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" --aetkqwtmrrmchpdy Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAmeM49cACgkQ8g5IURL+ KF0eEBAAp+sdisvEdEKlH1LyEm0Jkl+XasB6j1dy00yblPvdu8QrXyhkLRuG8h7P csm/87PvlvOCe1r1hGWVVAyBNSKTZHN4oWoSXFjRCiMF9n0fjcx2vcMpT0k9J3+f ipiGr6GxzqXqumHULFfcZC//idT7zz0NqSGd9gG5JGY7ifSlANl89aeUrwlb82CX LeU5Um6cL6l9oImdefDL1c/gQ9icngv5HBzmkin7G3hMUz4UwB9YYgJFVSgOWTbQ UOVy4frk/b0ZByaVaUvO1lRe9WHgQayJbZGP0ObM8EWMEBwMOHMOzpQqoghqTP1P pR/cITy+rTL9CCbbfN7ik1Kky3FTSSlcOz3ohmXzXpcQwBN2myxk4KV/4qbfDfrR 7o+tF1YYAIB5Mp1gfsmDhrKxrW+wkphzKXIdWa0CkZr1B5sHIkeOdYHXX56CqeMU 63lQ2CrZOtm31hGI+Lhr8JoihFqk4I9EYeS7Y0Vbg2p9y2h5FGKZP7WV2OSSBWBY P/24P3nKJsBbDBeWtN6jly0tf9Uzs1C3Ti3qLyAEQJDTOchsbNWDPwxRpZRO4tc2 rn/qRKPf5xLRSy7cKsXWohBFs5CuNaVCg/roHjkcZy4GUXzvc7cfuv+kDLNBbV8b O+5Aydkn9WWL8NJd6/VWIVRXgHZlAlEZIX4fulbiR8t38PD9xrM= =o1XP -----END PGP SIGNATURE----- --aetkqwtmrrmchpdy--