Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t0iMT-00Cph2-Hg for pgsql-general@arkaria.postgresql.org; Tue, 15 Oct 2024 14:17:21 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t0iMR-007GpK-Ss for pgsql-general@arkaria.postgresql.org; Tue, 15 Oct 2024 14:17:20 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t0iMR-007Gp8-Ha for pgsql-general@lists.postgresql.org; Tue, 15 Oct 2024 14:17:19 +0000 Received: from mail-ot1-x32c.google.com ([2607:f8b0:4864:20::32c]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t0iMP-00112e-ED for pgsql-general@lists.postgresql.org; Tue, 15 Oct 2024 14:17:18 +0000 Received: by mail-ot1-x32c.google.com with SMTP id 46e09a7af769-716a601d931so2852296a34.2 for ; Tue, 15 Oct 2024 07:17:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729001836; x=1729606636; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=6bW9GUUqmj3011FnAXIXjRqHZ0Ub77Hk4U+zfNsf9Ro=; b=OVppMIjmC0n0Uar50oB/dHfnCGGTF5aeqws2PqG3pN8f7HcdvxkQmFDcg6Pq3NDTnA lRvwh2arDzKlzy6Z3mBlIDgbWVHlT4NcXFyPwB+vdD9VUf3M+9mSpt9Qk2xXkUZZ7OJX ki9o5yG27+TKnKa5xFKNQzAorf26qjshZwwkIB37VoWgnnyoJ/2fnQ6erLYNNxHrKDfe eGG/fdLX9/6n4r/b183IT0GJtocfl0JPEb4T/jte4KZqvbHvsU5s95hcKQZgSztVd3qj MzWVictwpqsTPb0ilpylGdmDt1ahPxGNB01Cvh1+qA8ChXymTVlSqOUuvNOFUFG6KhXA zC0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729001836; x=1729606636; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=6bW9GUUqmj3011FnAXIXjRqHZ0Ub77Hk4U+zfNsf9Ro=; b=JV1nJ1gCboM6AAHa4vmzjXyHnp0vinzsVqIoKKJ9Wy/2l326oDsn73e5F+EYNDt3bV f1bJZC0w3H3wThgS4GDAsAWLgzQTOJwTFE1/Me9THAgoy5RxXVJIgNx4Hq0XODH+oV/1 EnNgU2pzmuqPnWpu6lH4Z8jMNzBBJmdJDuKW1ahofXK1EnAaQ8rSQBsP6ucJ6gknhxoA 8866J1R4/crjussCOF4cKnCtwwlT7mXASv83k9K3A/cJreVb2MhWCLlizyI9/XqMlDMX Jea50P824vFUnnuv5kPoBOaIQTFoi5Ur+WtEFX4PavjlD2/bDV13SEx3BS2Yb+UYqCzi H19A== X-Gm-Message-State: AOJu0Yz16bzWAmM3WmseKxjFEvQDfoUFKEEBUph4mJvvX8sfZp8ntP7G pNC9i2ek322yQ7bUPkQxzM6DqYFigLW892A7zAAtafjsfwXMjkTaw4upvrzapXVAbW36tgP1x3e bE9aHaQ23g0S5RHri4O6nVqpjcPmYFA== X-Google-Smtp-Source: AGHT+IFz1dKjDa+R3pc2782VpL2tMPLzEFprsVcV9jLoq0JGYOysxJsffTy/s2k8+04H4OQ6EYollJJiDrYuaOn8AVw= X-Received: by 2002:a05:6830:2b0f:b0:716:ab1b:6460 with SMTP id 46e09a7af769-717d63e97a8mr11800419a34.5.1729001836227; Tue, 15 Oct 2024 07:17:16 -0700 (PDT) MIME-Version: 1.0 From: Dominique Devienne Date: Tue, 15 Oct 2024 16:17:04 +0200 Message-ID: Subject: Advice on efficiently logging outputs to PostgreSQL To: pgsql-general@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk I have an existing heavy ETL that serially loads tons of data to PostgreSQL. This is done using a CLI tool, processing one project after another. I'd like to parallelize / distribute the work, which I could do from my CLI tool, but 1) that would be confined to a single machine, and 2) we'd like to provide a web UI to report progress and logs from the ETL workers / processes. I mentioned in the past I have a basic LISTEN-NOTIFY-based work queue in PostgreSQL, that would do nicely for this, but I'm wondering how to best to deal with the logs. The ETL process is quite chatty, and can generate quite a bit of logs. Which for the Web UI to see (developped independently, by a separate team), I'd want in PostgreSQL as well. The ETL worker wants to write them, almost continuously. The Web UI wants to read them concurrently, to show them, and ideally in a timely fashion. My experience with CI tools like Jenkins is that logs are "laggy", and arrive in big chunk, dozens of seconds appart, which is a subpar experience. That's not PostgreSQL related, but just to illustrate what I'd like to avoid. Has anyone done anything along these lines? Given the way MVCC works in PostgreSQL, updating (by appending to) a "file-like" bytea or text[] seems like a bad idea. So each log line should be it's own row? Or lines with close timestamps aggregated together, to limit rows generated? If granular at the line level, will tons of small transactions be a problem? And blow-up our "Oid budget" too rapidly? Am I worrying too much? :) I'd appreciate any advise or experience-based story around this use-case, please. Thanks, --DD