Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sgOTw-001cNg-9i for pgsql-hackers@arkaria.postgresql.org; Tue, 20 Aug 2024 13:01:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sgOTu-00HEgb-6K for pgsql-hackers@arkaria.postgresql.org; Tue, 20 Aug 2024 13:01:02 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sgOTt-00HEfM-Sq for pgsql-hackers@lists.postgresql.org; Tue, 20 Aug 2024 13:01:02 +0000 Received: from mail-pj1-x102f.google.com ([2607:f8b0:4864:20::102f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sgOTq-000cEk-TS for pgsql-hackers@postgresql.org; Tue, 20 Aug 2024 13:01:01 +0000 Received: by mail-pj1-x102f.google.com with SMTP id 98e67ed59e1d1-2d439572aeaso1483218a91.1 for ; Tue, 20 Aug 2024 06:00:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1724158857; x=1724763657; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=4UxHbN+NaygGZRRuuiFwtMux0OmGTPRBKQSY+GXQqqM=; b=RlunAQGJYB8LoiJf24+QjwQsNc4fzykoKiGI7adYstj3Qq8VcAo/OV0MDuBQirpGj6 quYqx/h6p2JyjtgcajZqD6Tb2BemAGOrjUMfXf/8VKWDEe/jXaMi98tCto4V6m5jJVSP m0j++1izhswwgo8k2H6LznBRg6FUJOwb0RzpA+5ofxpC5a6iQcNZnUlfIovukItbucW3 sZdouZvpezOsydUzcIo3cVqBtLK8O3UqoZhlVO7PySBQusseG5xPukzn8KI9jt+2CVUz ZI7dgzpACl4I2SENgg+7VZkzehuPVxWMRq9oruHeLcBLGOkj5wlz2h5x7a5tK3p128v4 nqvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724158857; x=1724763657; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4UxHbN+NaygGZRRuuiFwtMux0OmGTPRBKQSY+GXQqqM=; b=BdGaWH/+pFyoPhRAW9pVlfBulcDfAjmWzSCzqgw4JC7WLOAJbD0NN5Fe/ewhP1ScLn gvdtF2pP2CISolQ2087DDwp8NiM0E00fh2NFU6nPjWyApdocxL3azE8whK30cIoAS2hc 6bUIY6/ZHnWVQVTqDe+oUVoX1tXvCKcaEdCi25pS5L5ontpFy6nNXDZC6EKv3aC5izeV ssNqAqEIAKAlNXWM4zLWD3wXoPzkhS41fug38+CImfhjo104xNXPW0iTwkzftS/1kZt9 18aSDUD2MURmMPs/xYMGpICSFeye2dzuTJjlBersLCuJjGgeewjStURly7ro6HFAGy9H rp9g== X-Forwarded-Encrypted: i=1; AJvYcCWZHy+8rlq3v80TZUS4fn2yDV0uKjuKjMluIIdgVMrSq/VBvbaHH9BjMBYPkrjVOCUrUoz9yn/fDGYvgAVIFI3bRMZ4p88j+xIna5C4 X-Gm-Message-State: AOJu0Yycl/ioGdzZHOoMl06RaROZKSdFTa+O2cjkMUAk3Wn9NxrAZ4Og b4iqnYuhJu0N6hNwLdDAFB42t5ei3QymQqKHGW03kmsdivkqHCxS138XerroGfyniiEVbhrw9gp V9oltrTqiOWbKYrMAk0wverB3k0o= X-Google-Smtp-Source: AGHT+IF9SV+fd6skM7bliKpR3U6FOHA/wgDowOr3YuarHFQRwj1rsx8CR0a431fTcGG9x1sPjphM4ajXJDJVBu1+vkU= X-Received: by 2002:a17:90b:1809:b0:2c3:34f4:5154 with SMTP id 98e67ed59e1d1-2d4731ec39dmr4340758a91.1.1724158855781; Tue, 20 Aug 2024 06:00:55 -0700 (PDT) MIME-Version: 1.0 References: <202406191709.jbvpf7d7hl6g@alvherre.pgsql> In-Reply-To: From: Amit Langote Date: Tue, 20 Aug 2024 22:00:38 +0900 Message-ID: Subject: Re: generic plans and "initial" pruning To: Robert Haas Cc: Alvaro Herrera , Andres Freund , Daniel Gustafsson , David Rowley , PostgreSQL Hackers , Thom Brown , Tom Lane Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Aug 20, 2024 at 1:39=E2=80=AFAM Robert Haas = wrote: > On Fri, Aug 16, 2024 at 8:36=E2=80=AFAM Amit Langote wrote: > > So it is possible for the executor to try to run a plan that has > > become invalid since it was created, so... > > I'm not sure what the "so what" here is. I meant that if the executor has to deal with broken plans anyway, we might as well lean into that fact by choosing not to handle only the cached plan case in a certain way. Yes, I understand that that's not a good justification. > > One perhaps crazy idea [1]: > > > > What if we remove AcquireExecutorLocks() and move the responsibility > > of taking the remaining necessary locks into the executor (those on > > any inheritance children that are added during planning and thus not > > accounted for by AcquirePlannerLocks()), like the patch already does, > > but don't make it also check if the plan has become invalid, which it > > can't do anyway unless it's from a CachedPlan. That means we instead > > let the executor throw any errors that occur when trying to either > > initialize the plan because of the changes that have occurred to the > > objects referenced in the plan, like what is happening in the above > > example. If that case is going to be rare anway, why spend energy on > > checking the validity and replan, especially if that's not an easy > > thing to do as we're finding out. In the above example, we could say > > that it's a user error to create a rule like that, so it should not > > happen in practice, but when it does, the executor seems to deal with > > it correctly by refusing to execute a broken plan . Perhaps it's more > > worthwhile to make the executor behave correctly in face of plan > > invalidation than teach the rest of the system to deal with the > > executor throwing its hands up when it runs into an invalid plan? > > Again, I think this may be a crazy line of thinking but just wanted to > > get it out there. > > I don't know whether this is crazy or not. I think there are two > issues. One, the set of checks that we have right now might not be > complete, and we might just not have realized that because it happens > infrequently enough that we haven't found all the bugs. If that's so, > then a change like this could be a good thing, because it might force > us to fix stuff we should be fixing anyway. I have a feeling that some > of the checks you hit there were added as bug fixes long after the > code was written originally, so my confidence that we don't have more > bugs isn't especially high. This makes sense. > And two, it matters a lot how frequent the errors will be in practice. > I think we normally try to replan rather than let a stale plan be used > because we want to not fail, because users don't like failure. If the > design you propose here would make failures more (or less) frequent, > then that's a problem (or awesome). I think we'd modify plancache.c to postpone the locking of only prunable relations (i.e., partitions), so we're looking at only a handful of concurrent modifications that are going to cause execution errors. That's because we disallow many DDL modifications of partitions unless they are done via recursion from the parent, so the space of errors in practice would be smaller compared to if we were to postpone *all* cached plan locks to ExecInitNode() time. DROP INDEX a_partion_only_index comes to mind as something that might cause an error. I've not tested if other partition-only constraints can cause unsafe behaviors. Perhaps, we can add the check for CachedPlan.is_valid after every table_open() and index_open() in the executor that takes a lock or at all the places we discussed previously and throw the error (say: "cached plan is no longer valid") if it's false. That's better than running into and throwing into some random error by soldiering ahead with its initialization / execution, but still a loss in terms of user experience because we're adding a new failure mode, however rare. --=20 Thanks, Amit Langote