Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sA3Fg-007DeA-LO for pgsql-general@arkaria.postgresql.org; Thu, 23 May 2024 07:52:42 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sA3Fg-009XhV-3M for pgsql-general@arkaria.postgresql.org; Thu, 23 May 2024 07:52:40 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sA3Ff-009XhN-Mw for pgsql-general@lists.postgresql.org; Thu, 23 May 2024 07:52:39 +0000 Received: from mail-ed1-x535.google.com ([2a00:1450:4864:20::535]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sA3Fa-000LOq-9J for pgsql-general@lists.postgresql.org; Thu, 23 May 2024 07:52:39 +0000 Received: by mail-ed1-x535.google.com with SMTP id 4fb4d7f45d1cf-57230faeb81so6736008a12.0 for ; Thu, 23 May 2024 00:52:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec-at.20230601.gappssmtp.com; s=20230601; t=1716450752; x=1717055552; darn=lists.postgresql.org; h=mime-version:user-agent:content-transfer-encoding:autocrypt :references:in-reply-to:date:to:from:subject:message-id:from:to:cc :subject:date:message-id:reply-to; bh=TVxXDOEsuVEgK5gfO94BSgjygI60Gw6tx2ylkud3Sv0=; b=zVKwE5mPKXwcHMoqLboXv9v6uRwwM6TqS3qNE7D0Y1BaYBgAQll7B3qBekoKO2zgD1 mr8OVvlxzUEZ0/gACa8PnQlZZMnArpsx1Z4IH1RVEk9MrzOHVutaGBRpZLFEAGgtwF9w x8ZAGFnTKpNU1zCJyXNJYU6nBKhpHogzfYB1PsC4xQdz1R3dmKOMu0tIdbFvs15f1AzN PlAxVwL7aY4DKPyqQmNFEXnczOeCvFX1P4T0EAMD/8efcF+DPkN5yAQ8b2c3vpi+FRDC 4EYBobX3d6oa5HyYf/JqCsZOG1emtXap+BKdFZcQmvF/Hl8RhXpEJo9xL2nfo+s0TByX fgJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716450752; x=1717055552; h=mime-version:user-agent:content-transfer-encoding:autocrypt :references:in-reply-to:date:to:from:subject:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TVxXDOEsuVEgK5gfO94BSgjygI60Gw6tx2ylkud3Sv0=; b=tMpSmeTeRWxwKFho9ELOggsrT886xyOEbZXILO5dG+YVBzAKRZfzopYNOLeTmjUb4a x+wzZ5UYADUHt9FIvIxJrOo863zXrnLF01cW4QMI6GCxxtKfbAzOsEq5GL6dSGgPyfSd dMMFw+KeK1/xDf9tJ0vgjiao/vCfWsl14PbG/bQvNgZcqHNCBRvUhzyBVucaErQgIQcR v5Q+GCTSlDGVEcWLaLEcqRSiZbGRVZqU9rnu6pj6awgnrJ2NEn96aV/0GtbSIN0Lv3Q+ rraadubPG10S4wM78fHEtniwGDHLTBWGY29rgrQctiPdfYzQx/qTLv9TSAPOBTKSUNdJ CQJw== X-Forwarded-Encrypted: i=1; AJvYcCWshEOorq09WObZ/KO+M26JxMqSc/7mLtrvWkrZ2eRYfAcdT//JfFq+xZ44wTD8/8RYfFOQ25JEB3JkMAzTL3H6FEBBoTY30W2VJ/Lzcb290MTr X-Gm-Message-State: AOJu0YyasHF5dJOMigIG9YiN73Jxvb7jimcVioUzaJBRlChT5Wt6iDWk 4rZLYdnyd3LP1iywh1qupsreKfj6k+fJNjjLKSeZNOJSvIxqOqh0x0/E5Y+ABN8= X-Google-Smtp-Source: AGHT+IG3+yOF7XfT+aCkP2rZasHuDorPHXqkCPFgYwIAiBv3ZDpYqmh6iyRpjfO8BN/u5mdg645VPQ== X-Received: by 2002:a50:8a93:0:b0:574:ecec:1887 with SMTP id 4fb4d7f45d1cf-57832c334dfmr3050927a12.32.1716450752424; Thu, 23 May 2024 00:52:32 -0700 (PDT) Received: from localhost.localdomain ([2001:871:5e:e895:ad22:deaa:5d3c:705d]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5733c34fed1sm19328211a12.96.2024.05.23.00.52.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 00:52:32 -0700 (PDT) Message-ID: Subject: Re: Long running query causing XID limit breach From: Laurenz Albe To: sud , pgsql-general Date: Thu, 23 May 2024 09:52:31 +0200 In-Reply-To: References: Autocrypt: addr=laurenz.albe@cybertec.at; prefer-encrypt=mutual; keydata=mQINBGGDwAQBEADgbWy5cKXQld3N2mF+DFyiNFbi2oBl2T+XgxpPF8wTRw2D/u4bBKXP0SYSE/lA86jIVNWWU0gf1KODIkVvgJm2w4vH2VBV1b7ddVViGl1Iu+9zaRnv9wulhnH42KefepXnoean6UT1EzLM0opF/Ik0j+40TxdRtobkBprkQUyHDXWlHc2ffPs3SipyFEP9AVLf7ejRC46CXWDnsqjOBSMEW8Z4HiK/8RrPZBsKLts8dJxKF4pygOdJb0CWk8k/X1jbcfdxo+zOLjOMvJcSJ2pFdJmQHU+JufB3rePziqQ2S9Ur6sccr9XnTC1GVBWN4Lf5VHq+vf+bFJjVwg+2hrySZnAVfcOrxoqFLErr7ug1zN2nM1kcpgA4VWn4gxlJtYNYYq+9WxX5dtvnNANlG3ZCrRKQzl8lxtzoF6Zo7LUhEqPaHDwn7Rvs+IdbOn41lF5UDTJGqmC4gS/bZydW2Fy3YWm4aSaN9fgFf8D+PVkrlKAZB7gBLz1TyHjbcRf85cYF+GKKrDld5SzMB/V60VX3oP/Eo8ikFpyWaqiz1f9X7MBot3/PjJkY+wDzp3nmb19QEcOBuQiSQ4xds2r0HewbuHTAR68u8jNNMGmpm2j4x+g09Jd/WQDjqlTBZ/jEltH41fYCCPWMfljXTOOXu2eLNGdfi7ETZogtwjM9oTtSPQARAQABtCdMYXVyZW56IEFsYmUgPGxhdXJlbnouYWxiZUBjeWJlcnRlYy5hdD6JAk4EEwEIADgWIQR0CqhbZGGABqoaSbdi8bhXA2EdmAUCYYPABAIbAwULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAKCRBi8bhXA2EdmM/6EADK232JCwmBzhlj8h7U9CjG6kx0JHP3uJGv+XfsHtHAlmY/RCwF1BHMEsRlk bT5UrLvJ2jb99bA9QARzhFaxzyn0F/BUKzuIjRGNs/n6d5dNUFA0kOt8sX+TacmC GEyjEBCrVCm4ranBiUyePn9NhHNWnaex7pJyqvMLLdwW9BEMJx0Fqo+DN8ukbXmYRsmhEtd3ue+x/luYmOmJnaGtzInaY5aOJYbW9XqoRIZkZvOCgbi1FfvNmoqWa+3oVxTOgw9RafjJDyW0lTHzKGjbGI5ofMU98l+/hKJFYJqWUF6VpFJY5YIcN/1lf4ZICMwDl+MPIVo/tpq8L10seJL28nLlvw3K+cI+TVW8IW/qL/LyVoDofI3USeOORuYmhpWRhik8JXX6xf3v6GrRilJIPWNFIJbxm1ZblQiQnOw3IOW7T+8nAmPin1HKqM3VrOrJQ2VtShsefNBibNAsr1oFaqcDBkn3yGG8i6CTW+FyO4PZ+/EwNxMVgktxbYdy5AT1/lpXr5tB+phhLIyVfiBvrWs5EThxYMQ/L8Y85c3GMsAy1l/x4h3jqySIYy3SCU9+jc5UVuNnXljbvkEzJ+NLWJ6C1rACFWrMszgPdh5tCrlRY9PpmYll4JbCgb8BtxEIUmR+xr50/ZElEK5iml7Q00KUekCcDt+36PsyGFTXBzNOrkCDQRhg8AEARAAzOZ2tLHlI4rrhG411h6cdCFjBZxuljaFCxFyHn3m6wbGLqwBUWC5k8UrRqjHMz88KcTSaNO7XGAmCqPdWd2SeflPZRnNTbjsVpw7mLdffsBm4JX7kki2Pvk5h0NtYeidXT1PSpc2ri4DutYXuT9uD8RAm1wUDCE5HQNUihT/WH6opt+hskHW21uHao0+y822tG0QQcGMqdQR5Vxdxj89wiEPdqW+HpU/oOZIhrf2E7prduAppxixjHy/o1rcnoznnJvc8D3+YgI9O0LrBMij89dM55pRGbLovTR1oGR3U74sX774+0xmSzeIKwZfiMUz7Atlvfk5SHOsRUFPN2Ux9kaXiiBibQpHFxt7b lDrT4wxdLJ/XCdbPPAyl+lZtOLsaHEEZvYNyTXwZc35dVf3R4/oz20HoG6s7ct8e1 AQygj43XAERzty9SkWgxs8+grp1PrGx6FHVSYRqBM8dS/ZR6yRVwOwJXPyaSSqfIF21DkE4j1y4n+ItSewPGoRp8K/yWCikt6qlkVkO2ASNIiX04fAbtzwVOaNn8ZMRNqyvLc1fED4sr49onE4cAIcBLjcC3KL+w9DUGRQCdziROj5H2Yl/sXGPdMciUHo/Uz2rggc+2th3bQiMhrHWSsBpUkDQp0yWewemstPpPgBL3h2fHKaX8B9oH5Qu/H1IgrOuX8AEQEAAYkCNgQYAQgAIBYhBHQKqFtkYYAGqhpJt2LxuFcDYR2YBQJhg8AEAhsMAAoJEGLxuFcDYR2YuPwQAMkpGtR80pQ1gVsONhdkqj0H2eU66efP/gO3CoyaoIcvrpKYj7C2HipVSmkt1gpByL0X4AMQ/vKuknUz3wd28Ba+G1dCfbVs/Xiusq+SmpUj5rTwmYqdSjWMuCo1R6oS5hdJMdUUJYGMT0QkVlm1KnW8jkmCTl9GzjDxOAsN9O6/6lPzaGFtk9XF+34Bry/N4HKiJkqpC4+UTd0AprPfzJ2jdT64e1F0+W88X8y1bTTgNrHwK4mDiLnlE4SKRuEm54lNhJz//ar86Or5BErzNpM6TL7lk44QS06hwsMrEdKIy8J/SYJPjfzR8tIUnKscclVpOgjKaBqC+0iFiVaRqAgfOlIEiezX6kMh5Q2FIUfqs46qWhhXjRrdKOEoStYAaikdLu5ZXr7vfb0ZaDh+ZwTQtbSMFolyOkecwI81MCdbMfT/1TqIGTOdAj5as9fAakk0jb2pXgUYQ8X1DVTR8ahSDVEaw9VTmWiSvTxvguVJ1Mb7gG4Gmh6aviDTJhfXtH4rPUNXhDLqrTH8JkJjyKROOMakIF68Hjse5vUfUxreBEOtb5r1Coa2Fe7ncJayaSE7ryrDbFqpZ 36UMAx4ulWMyqJajLNGY0DdG8qIsR5nxRhrnK/mrCidZ8F9/D3bWAl4rjtHlsztN59 +AnW5l0HsQcY9ntFL/zEBOaonjdJf Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.50.4 (3.50.4-1.fc39) MIME-Version: 1.0 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 2024-05-23 at 02:46 +0530, sud wrote: > It's RDS postgres version 15.4. We suddenly saw the "MaximumUsedTransacti= onIDs" > reach to ~1.5billion and got alerted by team members who mentioned the da= tabase > is going to be in shutdown/hung if this value reaches to ~2billion and wo= n't be > able to serve any incoming transactions. It was a panic situation. >=20 > I have heard of it before , because of the way postgres works and the XID= being > a datatype of length 32 bit integer can only represent (2^32)/2=3D~2 bill= ion > transactions. However, as RDS performs the auto vacuum , we thought that = we need > not worry about this issue. But it seems we were wrong. And we found one = adhoc > "SELECT '' query was running on the reader instance since the last couple= of > days and when that was killed, the max xid (MaximumUsedTransactionIDs) dr= opped > to 50million immediately. This has nothing to do with autovacuum running. PostgreSQL won't freeze any rows above the xmin horizon (see the "backend_x= min" column in "pg_stat_activity"). > So I have few questions, >=20 > 1)This system is going to be a 24/7 up and running system which will proc= ess > ~500million business transactions/day in future i.e. ~4-5billion rows/d= ay > inserted across multiple tables each day. And as I understand each row = will > have XID allocated. So in that case , does it mean that, we will need > (5billion/24)=3D~200million XID/hour and thus , if any such legitimate > application "SELECT" query keeps running for ~10 hours (and thus keep t= he > historical XID alive) , then it can saturate the "MaximumUsedTransactio= nIDs" > and make the database standstill in 2billion/200million=3D~10hrs. Is th= is > understanding correct? Seems we are prone to hit this limit sooner goin= g forward. Yes, that is correct. You cannot run such long-running queries with a transaction rate like that. > 2)We have some legitimate cases where the reporting queries can run for 5= -6hrs. > So in such cases if the start of this SELECT query happen at 100th XID = on > table TAB1, then whatever transactions happen after that time, across a= ll > other tables(table2, table3 etc) in the database won't get vacuum until= that > SELECT query on table1 get vacuumed(as database will try to keep that s= ame > 100th XID image) and the XID will just keep incrementing for new transa= ction, > eventually reaching the max limit. Is my understanding correct here? Again correct. PostgreSQL cannot tell which tables the query will use in t= he future, so VACUUM cannot clean up old row versions in any table in the data= base. This is irrelevant for transaction ID wraparound, though: you'll get into t= rouble even if only a single table holds an unfrozen row that is old enough. > 3)Although RDS does the auto vacuum by default. but should we also consid= er > doing manual vacuum without impacting ongoing transactions? That won't help. Autovacuum is running, but cannot freeze the rows, and a = manual VACUUM cannot do it either. > 4)Had worked in past in oracle database [...] Oracle implements all this radically differently. > =C2=A0And in that case, It seems we have to mandatorily set "statement_ti= meout" to > some=C2=A0value e.g. 4hrs(also i am not seeing a way to set it for any sp= ecific > user level, so it will be set for all queries including application level= ) > and also "idle_in_transaction_session_timeout" to 5minutes, even on all t= he > prod and non prod databases, to restrict the long running transactions/qu= eries > and avoid such issues in future. Correct me if I'm wrong. That looks right. One thing you could consider is running the long-running queries on a stand= by server. Replication will get delayed, and you have to keep all the WAL around for the standby to catch up once the query is done, but it should wo= rk. You'd set "max_streaming_standby_delay" to -1 on the standby. Yours, Laurenz Albe