Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1raG0m-008rfc-Ru for pgsql-hackers@arkaria.postgresql.org; Wed, 14 Feb 2024 14:13:21 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1raG0l-006C8R-GF for pgsql-hackers@arkaria.postgresql.org; Wed, 14 Feb 2024 14:13:19 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1raG0l-006C8I-6K for pgsql-hackers@lists.postgresql.org; Wed, 14 Feb 2024 14:13:19 +0000 Received: from mail-wm1-x329.google.com ([2a00:1450:4864:20::329]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1raG0h-007AIz-TG for pgsql-hackers@lists.postgresql.org; Wed, 14 Feb 2024 14:13:18 +0000 Received: by mail-wm1-x329.google.com with SMTP id 5b1f17b1804b1-411e71d8a2bso5123255e9.1 for ; Wed, 14 Feb 2024 06:13:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=enterprisedb.com; s=google; t=1707919994; x=1708524794; darn=lists.postgresql.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=sJQEIoEsp8ELHbNosMWusnn1NE+jNBjoU1/jb67dPZY=; b=RPTJ6C7vOoVVpYAEZdn1Vgx0I8S+l0qRQhdTFqT7uGADnRhJJGp23plVLB2W/evEa0 Q12MLKktVpm7jkzrFatW5t9qZ2XDXtOB2q1GUGPJT0hRKd6Jw1bZhM2T3gKDGARMaZVN E19V89d/JvjCzdJwRDVn9FE554f7Lq6ichUjwjyMAbs9+Yhw4ZJOn+rZVyGmH2HqVJVn ez57Y35I/GaZcZX3kGKVizrKqEOB8/ypmzdeoGEk2X9SUApxHae7BQQff9EKKg8/JnFl 3JgI/sfACR5eLNN17U2dSCqSj9ca9Aw7QOK4a77NFg8GENjNiOKYpNcZoqPnJEs4fVxJ lNwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707919994; x=1708524794; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=sJQEIoEsp8ELHbNosMWusnn1NE+jNBjoU1/jb67dPZY=; b=hS1kMxfNR+veiKcBX0vqtouIj640A+Vh4wxmdmMsKfKfEDsGM69KSOdtq3bp9ECE+G YHjzQ1+3Pk0Fhy65dEzfj52nAy5pMXNi+oIuJlbpo0EkOC2DAn8QL5AbPT7McNfc2ujH ILoTNitVsWRxCTfVIMgJqskLzVP+YPtS51eO5irullV1K2sWdp/xxcn7EKIW2nHMWqo4 teY+G5Z3+MYbLEcmgfZao5/22H9E8rovaaYpTS9nE8gUjrfrLiGsWC0hEA9DNxczrNBo BxgI8ZnC7UW+eMAX4LbnjJFo6+3YE9DFClZZOhXHFvGXzQcvggEYOYuQBgnnqIQdc6GR qgNw== X-Forwarded-Encrypted: i=1; AJvYcCWjEHqd0TaIzal3bJLsGMwYq/TDXlHnYRzaoFSQFGkD6T0TozCix40PGr+a+lsKEYGymFfZhYurXACxAbpX3w9Pr8HJCOlpWsbga+ZXFWP8+QLs X-Gm-Message-State: AOJu0YzKyoDj6Ebrm0CfVq9KXA+55VXxQ1GXllPUKhXewQoyzL5q2KE4 Dgx6Vz3qZyCpPmgg333PT+jWIWA1XZrwJkoM5p9zTYYZ3Ojnty4Xi5AVGcjWnQ== X-Google-Smtp-Source: AGHT+IFKna7SSzuQG+dNNXww8TVo3hMUpUHmqd5hpmbE1vX+8LVH8+XdfpbnvachLTWMVAuy2G5MsA== X-Received: by 2002:a05:600c:450a:b0:411:c5fc:617b with SMTP id t10-20020a05600c450a00b00411c5fc617bmr2381830wmo.3.1707919994061; Wed, 14 Feb 2024 06:13:14 -0800 (PST) X-Forwarded-Encrypted: i=1; AJvYcCU4d8njahBFZ0KeVFTlvwD/v6xOLl0IZV1qMCNDas60x0GVrMwAOjpEXfqE6cmGIy2MktiOS0hx1Lky23eRuPLfjq6GNax1fok9ba7FE+0MIwWUPIH0EtrdncFOyHrsHCeyFSMe+KyN9ClagCCfjuknpVXeYxD5P1KIDPCPrksxk5wo849J/rsLOuOpznw6JE1F8dEeXhrHN/ySMJoPnSLl1/iF2YCzVSSWagTRbFNs687QNoqQj3wno9wGDUHgdywh9H4a3jQhGSscoA== Received: from [10.137.0.18] (ip-86-49-229-30.bb.vodafone.cz. [86.49.229.30]) by smtp.gmail.com with ESMTPSA id o19-20020a05600c379300b00411c3c2fc55sm2031861wmr.45.2024.02.14.06.13.13 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 14 Feb 2024 06:13:13 -0800 (PST) Message-ID: <5fc8f150-f350-4cb3-b6b7-ffcc21ed7bc2@enterprisedb.com> Date: Wed, 14 Feb 2024 15:13:12 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: index prefetching Content-Language: en-US To: Robert Haas , Melanie Plageman Cc: Andres Freund , PostgreSQL Hackers , Georgios , Thomas Munro , Konstantin Knizhnik , Dilip Kumar References: <8ec36f51-b863-60e3-20e2-b9c981c5ce5e@enterprisedb.com> <98ba4b25-fae8-c1f4-1597-8093375a1986@enterprisedb.com> <20231221134314.wf2rs62d37u62j7t@alap3.anarazel.de> <20231221154352.ijtg6wloa3nowivh@alap3.anarazel.de> <482ec3ff-52ad-415d-96fd-f3832a894023@enterprisedb.com> <56176b8d-956c-487e-ab09-310db4581c07@enterprisedb.com> <4867452a-b853-4813-a6da-9bb06a336f8b@enterprisedb.com> From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 2/14/24 08:10, Robert Haas wrote: > On Thu, Feb 8, 2024 at 3:18 AM Melanie Plageman > wrote: >> - kill prior tuple >> >> This optimization doesn't work with index prefetching with the current >> design. Kill prior tuple relies on alternating between fetching a >> single index tuple and visiting the heap. After visiting the heap we >> can potentially kill the immediately preceding index tuple. Once we >> fetch multiple index tuples, enqueue their TIDs, and later visit the >> heap, the next index page we visit may not contain all of the index >> tuples deemed killable by our visit to the heap. > > Is this maybe just a bookkeeping problem? A Boolean that says "you can > kill the prior tuple" is well-suited if and only if the prior tuple is > well-defined. But perhaps it could be replaced with something more > sophisticated that tells you which tuples are eligible to be killed. > I don't think it's just a bookkeeping problem. In a way, nbtree already does keep an array of tuples to kill (see btgettuple), but it's always for the current index page. So it's not that we immediately go and kill the prior tuple - nbtree already stashes it in an array, and kills all those tuples when moving to the next index page. The way I understand the problem is that with prefetching we're bound to determine the kill_prior_tuple flag with a delay, in which case we might have already moved to the next index page ... So to make this work, we'd need to: 1) keep index pages pinned for all "in flight" TIDs (read from the index, not yet consumed by the index scan) 2) keep a separate array of "to be killed" index tuples for each page 3) have a more sophisticated way to decide when to kill tuples and unpin the index page (instead of just doing it when moving to the next index page) Maybe that's what you meant by "more sophisticated bookkeeping", ofc. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company