Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1scRW7-007KR7-N7 for pgsql-general@arkaria.postgresql.org; Fri, 09 Aug 2024 15:26:59 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1scRW6-0043vy-Bm for pgsql-general@arkaria.postgresql.org; Fri, 09 Aug 2024 15:26:58 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1scRW5-0043tk-VP for pgsql-general@lists.postgresql.org; Fri, 09 Aug 2024 15:26:58 +0000 Received: from mail-ot1-x330.google.com ([2607:f8b0:4864:20::330]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1scRW2-003u22-Sw for pgsql-general@lists.postgresql.org; Fri, 09 Aug 2024 15:26:56 +0000 Received: by mail-ot1-x330.google.com with SMTP id 46e09a7af769-70949118d26so1517707a34.0 for ; Fri, 09 Aug 2024 08:26:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ilm.com; s=google; t=1723217214; x=1723822014; darn=lists.postgresql.org; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=4xwFXLYtTpQruCyIhjzyc7yqpseDcwDJVhHw/3NzxOo=; b=bgNjdzW5l2yyuTh5fixaK9DE7zuGOKVMGX7e7vlwt8yIY5jwcAVjUY4DUpcb8QBN0g cv8mYF4fo0QubBrgPm6DcpXkTnyIZjVDLhj7h0Exi/UTGCzj8Ow6jNdh1yFwsnXKvA8h P+f1KD2ygPQyOG0UBgoG3XT7lrNWBmTAUJ3OQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723217214; x=1723822014; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4xwFXLYtTpQruCyIhjzyc7yqpseDcwDJVhHw/3NzxOo=; b=Z1NEywwZ/nvWYmkZ41VJ/eTpcjrMppKi8/58/BuNPJdti2CYOUIfMp5mhExHPG0Piv YDJ5e8gwQqyA1SzdcPYKx/vP+Ns/cVJRazc0yfKg4kyqDQeclAdRSEjloPdpgHSa2Jnh FnTSAKZkGuCh2xhQAUOMEkRC/7poKKS4TXenWPitHmqGfV5RuaVvt3QfjJXa4UB+yMzR bhRg3XQwGEW36+LtukEskLhLRtw9GVNfaPUXtM3YR4m6hg/HShHpa83bxCaXOF1UjeHg /b4IPht7BXNnM2rcY+vEAjLBwSFDM5IaJOwmLkE8gQnblSQ5Gaand78zWvZGpn/RzDin QUjw== X-Gm-Message-State: AOJu0Yz+T39FvSxuGvZ1JgVfwePiFPuJ1eBDTUtzAGko/esLc3hFfv4d VGbK+LTO2ruu+rFrIDAQyvTkcVNKrxvlelJekSmRO0J4xCYLLcGDUdUFpSyCup+o6LnGn69gSvq KAAPmw3FZBjIRmLFNfXhS6fp2Hl13xX4yRMeZ2Pe8mkR5W9JI X-Google-Smtp-Source: AGHT+IGKDtyr2mGktm9oNjb5JNGSbipt+E8tEBhP0KJxfPvFHKhASue+M70ZgvI2JvYvIRt0zUHWS/7zkyhWptebw9U= X-Received: by 2002:a05:6830:348e:b0:703:b0e9:d951 with SMTP id 46e09a7af769-70b6b301a0dmr2457850a34.6.1723217213728; Fri, 09 Aug 2024 08:26:53 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Jim Vanns Date: Fri, 9 Aug 2024 16:26:42 +0100 Message-ID: Subject: Re: Trouble understanding how to avoid/manage ERROR: multixact "members" limit exceeded To: pgsql-general@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk To clear up any confusion, I tried the reductions as hinted in the error message and after a few iterations still faced the same error. Also, re. the sample configuration I provided - this is the result of said iterations after initially using the stock values from the PGDG RPM. Jim On Fri, 9 Aug 2024 at 11:26, Jim Vanns wrote: > > Hi pggen community! > > I am struggling with this error almost daily now and despite various effo= rts, not succeeding in avoiding or dealing with it; > > ERROR: multixact "members" limit exceeded > DETAIL: This command would create a multixact with 2 members, but the re= maining space is only enough for 0 members. > HINT: Execute a database-wide VACUUM in database with OID 16467 with red= uced vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age = settings. > > Runtime details follow below, but before that; I am now seeing the above = error almost daily after approximately 12 hours of normal or expected behav= iour and throughput. Then it hits and all writes are blocked etc. and the s= ervice is largely unusable/unable to recover. Restarting PG does allow auto= vacuum processes to kick in with aggressive vacuuming to handle the multixa= ct freezing, but that isn't a suitable solution! Although having read sourc= es that now explain why multixact XIDs exist and when they're used, I am no= t able to properly figure out how to plan for it or configure postgresql ap= propriately to handle it given our workload. > > My questions are; > > 1) How should I be managing this? Although not new to PG, I am new to thi= s particular problem. > 2) How can I confirm what processes/tables are contributing to this multi= xact "members" limit? > 3) What are the units for vacuum_multixact_freeze_min_age/vacuum_multixac= t_freeze_table_age and how should I be > setting them appropriately for my rates etc.? I can't really find any= thing that explains this clearly. > 4) How can I check that autovacuum workers are specifically able to freez= e multixact XIDs and thus avoid this? > 5) Can I check if autovacuum is actually succeeding in its work? > > Schema (brief): > 10 tables > 1 table (the 'hottest) partitioned by 2h ranges; so an extra 12x partitio= ns/relations a day, though only the most recent one might be considered 'ac= tive' > > System (brief): > PG: 15.5 w/ TimescaleDB 2.14 extension > Write heavy workload; > Mean Txn/s (low): 8k > Mean Txn/s (high): 10k > Mean rows/s: 100k > Concurrency: 32 threads (local socket sessions) for 'upserts' via primary= service plus auxiliary processes (background workers/autovacuum workers et= c.) > > Pattern (brief): > COPY (binary) to session-local temporary tables (ON COMMIT DELETE) > INSERT FROM tt TO main_table(s) > UPDATE FROM tt TO main_table(s) > VACUUM tt (every 100k txns) > > Config (excerpt): > # - Transactions - (based on a period mean of ~8k txn/s) > # See/ref; > # www.postgresql.org/docs/15/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND > # www.percona.com/blog/tuning-autovacuum-in-postgresql-and-autovacuum-int= ernals > # blog.sentry.io/transaction-id-wraparound-in-postgres > # https://blog.danslimmon.com/2023/12/11/concurrent-locks-and-multixacts-= in-postgres > # https://pganalyze.com/blog/5mins-postgres-multiXact-ids-foreign-keys-pe= rformance > # What unit is 'age' here? No. of. transactions? > vacuum_freeze_min_age =3D 28800000 # 1h @ 8k/s > vacuum_multixact_freeze_min_age =3D 28800000 # 1h @ 8k/s > autovacuum_freeze_max_age =3D 157600000 # 2h @ 8k/s > autovacuum_multixact_freeze_max_age =3D 57600000 # 2h @ 8k/s > vacuum_multixact_freeze_table_age =3D 115200000 # 4h @ 8k/s > > # We don't UPDATE or DELETE often; we never DELETE and only perform > # UPDATE operations every 6h due to internal cache(s). So we set this to > # zero so the formula doesn't consider it a major factor > autovacuum_vacuum_threshold =3D 0 # In tuples/rows UPDATEd or DELETEd > autovacuum_vacuum_insert_threshold =3D 5000 # In tuples/rows INSERTed > > autovacuum_vacuum_scale_factor =3D 0.1 # 10% > autovacuum_analyze_scale_factor =3D 0.1 # 10% > autovacuum_vacuum_insert_scale_factor =3D 0.1 # 10% > > autovacuum_naptime =3D 60 > autovacuum_max_workers =3D 8 > > # Give autovacuum more credits to ensure a better chance at scanning > autovacuum_vacuum_cost_limit =3D 2000 # 10x the default > > Thanks for your help and any guidance/knowledge you can share! > > Jim > > -- > Jim Vanns > Principal Production Engineer > Industrial Light & Magic, London --=20 Jim Vanns Principal Production Engineer Industrial Light & Magic, London