Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6WTD-0006Ko-5n for pgsql-general@arkaria.postgresql.org; Mon, 13 May 2024 14:16:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s6WTD-000ZyY-6U for pgsql-general@arkaria.postgresql.org; Mon, 13 May 2024 14:16:03 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6WTC-000ZyP-RO for pgsql-general@lists.postgresql.org; Mon, 13 May 2024 14:16:02 +0000 Received: from mail-lj1-x22f.google.com ([2a00:1450:4864:20::22f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1s6WT9-0000cL-U7 for pgsql-general@lists.postgresql.org; Mon, 13 May 2024 14:16:01 +0000 Received: by mail-lj1-x22f.google.com with SMTP id 38308e7fff4ca-2e6792ea67dso30737551fa.3 for ; Mon, 13 May 2024 07:15:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1715609758; x=1716214558; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=oN48r5BF+IWqOf+6dzE0sD0HNXDY4WIatSYntxmEoN0=; b=B3evIrcAOvuDZEQYMBQgB7/gpfHC2BszuFQhQtqfz42ysQSM/3d5MQf4TlnXqKmNpM jZnKlZIY8JqOKEptcBTwNULT+gZRD8JDZgibjH2QEd1kS9jmirC0eFWKd0b85HWlInqS GwF9tbEvVsm+kErdSEIL2P6Ka1T4DBoF4CvVZ7+0zy+Spc+na2Q1wyzw+jcQAyioXb1t epGTAXOaavm7Jsc4cQMo1dFBkSFEqj82CZl4jeOn7MrekkvsiVFJIcJj2rBne1roxjWE dXDrnKpGLNFkiOYoPiNgYRXasNZMEx3B8t1CnWVRTscglhq/wbTRNLzIUuaq4rSnqO7A UdEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715609758; x=1716214558; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=oN48r5BF+IWqOf+6dzE0sD0HNXDY4WIatSYntxmEoN0=; b=uqwyfnRip77Bcae5uaSo2eMObU9UyKEhHqjam0SIXLejxXCQlmIBCP6C6q4zVWYHCP HaIl6HWa5RHupQSRKWc8dMoESrFcdzh5yb9vhmNff65AatRFtv934vxhSKlzXAswNUet zDnFctCuUkZwgejHWn+TNWLwM1YsrvMOnSFxvfH+obkYPsmSeSQ5M+5olmYK83gjpOlu seui36SFFN9pRVEkxLui0I9UHxe4mw6Sm7QOYTquSGSCfAmnkFBY+NqUEKxsErx+9KKX vOL3kCQls3TZlZl3GEnjZwpDnlvC1xSWwjOaMfqubT1eg6PVSK5kvIFkv9qA96Xycqza fSLQ== X-Forwarded-Encrypted: i=1; AJvYcCU7dam6IrzBLqHi57oRzE7YslvrNOTZyH8lP+ZXikWdoifBo97AD8r82W190e3fGQ6WCtnbgWKlazCQjfYPRllxPtCDi2m5goCIBEHcCblRkax/ X-Gm-Message-State: AOJu0YyUELF5GQauZzkBhmFg7wPNKQ91cLZQ4w14WgOHFuCtJfRrjV/8 /tc+dtmNi+z1+njrCX1FoC/Azm4BVEOK0suRJJdCS72N9tP+njQn8ViBujVQ5xiXNwa9AgVSPGA cbEcn7rNWDUoXm1jhxSZ/QeWWs+Q= X-Google-Smtp-Source: AGHT+IH9h8PD0D+SYXb38RPAgWVHCpzV/a9iXpCsvDvnaFRjs27oi2kFjfHrEyJO1GSLDR+xEI4CZohJZeSm57SDyvw= X-Received: by 2002:a05:651c:1251:b0:2e2:72a7:8440 with SMTP id 38308e7fff4ca-2e52028a885mr75022061fa.41.1715609758013; Mon, 13 May 2024 07:15:58 -0700 (PDT) MIME-Version: 1.0 References: <7886a68f-b466-2131-1747-f69f0fb71a37@gmx.net> <69077f15-4125-2d63-733f-21ce6eac4f01@gmx.net> <559b0e40-63e6-fa9a-6b03-d1eba10f30f8@gmx.net> <1629463.1715372568@sss.pgh.pa.us> <6a2f3906-3d7a-6924-7403-8f77d57a18e4@gmx.net> <3efce60b-48c0-160d-0444-474b02f76739@gmx.net> <98c715d1-22f4-0fc1-1997-6236873c13de@gmx.net> In-Reply-To: <98c715d1-22f4-0fc1-1997-6236873c13de@gmx.net> From: David Rowley Date: Tue, 14 May 2024 02:15:45 +1200 Message-ID: Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions To: Dimitrios Apostolou Cc: Tom Lane , pgsql-general@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, 14 May 2024 at 01:52, Dimitrios Apostolou wrote: > > On Tue, 14 May 2024, David Rowley wrote: > > The query does contain an ORDER BY, so if the index is not chosen to > > provide pre-sorted input, then something has to put the results in the > > correct order before the LIMIT is applied. > > The last query I tried was: > > SELECT DISTINCT workitem_n FROM test_runs_raw LIMIT 10; I was looking at the original query. In that case, we have 2 ways to remove duplicate rows with DISTINCT, "Hash Aggregate" and "Sort" -> "Unique". Both of these will consume all of their input rows before outputting any rows. DISTINCT with LIMIT is a special case that we don't have a good operator for. In theory, we could have some "Hash Distinct" node type that was less eager to consume all of its input rows. When invoked "Hash Distinct" could consume input rows until it found one that didn't exist in the hash table. I've no idea how that would work when we exceed work_mem. However, most queries with a LIMIT will have an ORDER BY, so such a node likely wouldn't get much use. David