Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vK6L1-00Fmcm-1G for pgsql-hackers@arkaria.postgresql.org; Sat, 15 Nov 2025 02:48:30 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vK6Jy-008jwQ-1F for pgsql-hackers@arkaria.postgresql.org; Sat, 15 Nov 2025 02:47:26 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vK6Jy-008jw9-0G for pgsql-hackers@lists.postgresql.org; Sat, 15 Nov 2025 02:47:26 +0000 Received: from mail-qt1-x844.google.com ([2607:f8b0:4864:20::844]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vK6Jv-007CPv-39 for pgsql-hackers@lists.postgresql.org; Sat, 15 Nov 2025 02:47:24 +0000 Received: by mail-qt1-x844.google.com with SMTP id d75a77b69052e-4ede4a30a84so15784851cf.1 for ; Fri, 14 Nov 2025 18:47:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1763174843; x=1763779643; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=3LgSWyAEdevBf76QwONsJtTbt3xm8/v3Z3YIjovebIU=; b=TJWi3kD3Oe2an0ZcXlDgA5Y3ZsnmNTy2JtWXB6cA0mxlmFnFKsj5M9HPBjvr/pHa4b dRoW9H1A7Im4H67KHw7pPndpmny6vPNH/OmdUgVm5QBQe+zXCv8CJjs6Wikj4CA+EHrN 7OO1hXxaVSIl07HVZYtmSlhSHjr3gBW4k3YeDT6P9Jc0UvcHb3XC8kVwqiFzjX1i1zu+ AUQ3vBM0N+Hg9+ywdjIGEGNOI36b65Q2Z1zP4+erwRH3Dalf2GLeHHoiF3gc+nGeMACp chqRYNGORKvUauLrw5IDqQV6wEi1mb0snA/tjcHR1E+GL/+zIWlv+5AOEI9d9Bf1PQXg PoxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763174843; x=1763779643; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=3LgSWyAEdevBf76QwONsJtTbt3xm8/v3Z3YIjovebIU=; b=Ivnbi3Sx4pGUAo50bIiauhW9mQASSo14lh/4Hf3KntZBoTgciqpP7ocB0+JsAjdEJJ wScIoPEN9X04X6n0aUuYUT9zHhN82X0M/2/jrbNMc/sYJNdLFkta0O9qX7Y4ELr332Jx uXFUmTJBlLyyF83/M51srZhyJnrf4SjNRmyrB3/FyFpWY1u68WfyeRqkK3QoFB8ODiXR 6ezpc32Ft7BmWBH9DAIaxSFyu5rHRrj+Bdkv2aHpFMv3YoSKhsKnHxo3PLz1DNPZi3HV L6lzn9Cu5u7icj+QFAaQaaMYefz9TmrN8X65hgDGcODgTzffdYFEj2FFYPi3lwgYJqen JIEQ== X-Gm-Message-State: AOJu0YxFPjCs9OxtIbwQiojM/I+JG18VhhOaYQdw+OtdBkJ1Q9WZ10ep q/fw7xLEa6d2/1NwIL5XHXKHPgy0QNqYp1GYrmRdBef/FEbmpI9Q+hj54nbGL2UXLRUJYXrNmW+ +1JmbwYCcVwUjdLE0Ytg4LxOzr/ARSjA= X-Gm-Gg: ASbGncsKIYVPmqGvrVB6DqJ+0WG92Kpj5vchDZO/ZyN0lpeNGVdFSoHDZ5UxiNGeDvh 8Ogtn0lXYTMNzP3pWF9+MoU4eykZmyicVOXuYwghjiJ9rmlz7EWz2UIPeJio32JiiQ5pmsf5S8Y bIFNqz3ZC/ZvClEFOYd+l7sEXQuKjfhX/P3EWUDxjQlxycd7kimdHc7vzIuZO5IPAMmfLaqzJGT 9wlF/72FOo9fHnfQkVaVFJ+5+ydgLAuFXBC/F8z/1xIAgNotAslYDT5q2Daw7/2erMX3OvzoO/E oGCg0PyaDdZtrRWFJ6sCcpQz8wChsGdxDbPaWMzsnSzfWCcPnvOpYVBV6LKRlumhbp6FLZWqk4M sEajs2k7eO7Lp/L0= X-Google-Smtp-Source: AGHT+IEN6fq4YCFjpVUEZ8T2Ob7J953DoGpNy+RZPDpb6la8xKExSdZ3coqVZneg7zgIc0NNKkO1GBPPy7ALu8VWPMQ= X-Received: by 2002:a05:622a:38f:b0:4e8:846e:2d8b with SMTP id d75a77b69052e-4edf2068f59mr70248011cf.28.1763174843003; Fri, 14 Nov 2025 18:47:23 -0800 (PST) MIME-Version: 1.0 References: <97492f51-51cc-45cd-aeea-8ed6f48eccc2@gmail.com> In-Reply-To: <97492f51-51cc-45cd-aeea-8ed6f48eccc2@gmail.com> From: John Naylor Date: Sat, 15 Nov 2025 09:47:11 +0700 X-Gm-Features: AWmQ_bnEnMJG9o74Gy52bcQBcsFOn6anNOt1GjIF4ORCCkUtU3oN0qpbfxy0m2c Message-ID: Subject: Re: tuple radix sort To: David Geier Cc: PostgreSQL Hackers , Peter Geoghegan Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sat, Nov 15, 2025 at 1:05=E2=80=AFAM David Geier w= rote: > I understand that you want to make progress with the use case at hand > but I feel like we're missing out on a lot of opportunity where the > introduced code would also be very beneficial. The patch is independently beneficial, but is also just a stepping stone toward something larger, and I don't yet know exactly how it's going to look. Premature abstractions are just going to get in the way. I'd be open to hear proposals for possible wider application after the dust settles, but that's not going to happen during the PG19 cycle. --=20 John Naylor Amazon Web Services On Sat, Nov 15, 2025 at 1:05=E2=80=AFAM David Geier w= rote: > > Hi John! > > On 13.11.2025 05:01, John Naylor wrote: > > If that's the case then I suggest first seeing if dfd8e6c73ee made > > things any worse. A simpler possible improvement is to use a similar > > normalization step for the chars, if needed, then do the sort and > > quinique with a specialization for unsigned chars. (We don't yet > > specialize qunique, but that can be remedied). If you're interested, > > please start a separate thread for that. > > It did but only a bit. I worked around it by having two sort > specializations, one for signed and one for unsigned. I also wanted to > try to use a hash table to filter out duplicates and then only sort the > remaining unique trigams, which are, most of the times, a lot less. > > Generally speaking, the GIN code is death by a thousand cuts. I've got a > patch coming up that cuts CREATE INDEX runtime in half for columns with > relatively short strings and yields even better results for columns with > longer strings. But that's not only changing the sort but requires a few > changes in a couple of places. More details in the upcoming thread. > > I thought qunique() is already pretty optimal because it's defined in a > header file. I believe that even the comparator gets inlined. What would > be useful though is if qunique() used an equality comparator which only > returns true/false instead of a sort comparator. In the GIN code this > also shaved off a few percent. I'll take a closer look at qunique() at > open a thread with the findings / ideas for changes. > > Anyways. In this context GIN was just one example for where a generic > radix sort would be useful and there are certainly more. > > > > > That's moving the goalposts too far IMO. I want to get to a place > > where I feel comfortable with the decisions made, and that already > > requires a lot of testing. Also, I don't want to risk introducing > > abstractions that make future improvements to tuplesort more > > cumbersome. > > On a quick glance it looks like you didn't specialize much. So the > testing seems related to if the new algo introduces regressions, not if > the abstraction would cause problems. So it should be possible to > extract out the code fairly easily without invalidating your existing > benchmark results. > > I understand that you want to make progress with the use case at hand > but I feel like we're missing out on a lot of opportunity where the > introduced code would also be very beneficial. Beyond that we could > nicely test the new sort code in the spirit of test_rbtree.c and > friends. Maybe you want to give it a 2nd thought. > > -- > David Geier > --=20 John Naylor Amazon Web Services