Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vOaRY-00E16l-2h for pgsql-hackers@arkaria.postgresql.org; Thu, 27 Nov 2025 11:45:48 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vOaRW-005v6w-2C for pgsql-hackers@arkaria.postgresql.org; Thu, 27 Nov 2025 11:45:47 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vOaRW-005v6l-1F for pgsql-hackers@lists.postgresql.org; Thu, 27 Nov 2025 11:45:46 +0000 Received: from mail-qt1-x843.google.com ([2607:f8b0:4864:20::843]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vOaRU-001joE-1M for pgsql-hackers@lists.postgresql.org; Thu, 27 Nov 2025 11:45:45 +0000 Received: by mail-qt1-x843.google.com with SMTP id d75a77b69052e-4ee553ba7b4so3111831cf.1 for ; Thu, 27 Nov 2025 03:45:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1764243944; x=1764848744; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=vyfTec2OQs/SMb6bLyw/J6VNufDQOdd6zloUZzyxzVw=; b=DGVrIdSY6XN52aZX0he4ZL6o1wbOkxMwguFNzh9vvP15TtUcpmufA4+kW/TBR4sZ94 M1Y7t8U/RI818rfS0d02H/YrOb8DZuTh/eVdmInGgwW1luyK/svaCoVN44krPMK2Mc7x LGsfy9wvktI9Zmb1G2uEW+bIpPcebxSUHKFuFrFyk4jlGAwy4nE5xkndOJkmzfWtVn1V OztxVYXMGJp2Pt9ZBWi8aAy7naLZknOrE+Dekx7hpvqaAujY2UWFkiPKOT6247hFzWkQ yxpwwrg08SAN7IEFcFY6cn+aCcqYqcIbQpcnwsykA3jelwJBqUY1Up+uuQdzP7N3QslO esvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1764243944; x=1764848744; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=vyfTec2OQs/SMb6bLyw/J6VNufDQOdd6zloUZzyxzVw=; b=lpRBL+bHtLmBZueAwCZkG6eyEp2j0ixaSk3Osi4ewZgsaJC30XOcotrZPmHyQp/H1g lGeesEiWXeFjCN8hhO5zOJUOezQOLaVK+kxVd7zsguFVlqSHXeOTRB95KgSdgZAij4Ju 2yzOiBOqFJH+b7Q2LtOXoqs/s9I1IxFeeIY6GG8XQKpvlwiR2+WfVOsTTZLvE6UTx7cb BLOVkD2WcviuN+LQ7ou8QdCugk2fbQcgejRbeFek9kKx5CVAXq9aThAMd14JP+V2BaRO dkw4jmJaa4q7q3liVpRYKNC1NruCK0SIvgS0DzUNU74FbUq9cBwenNu1KkKQWYhGYIbm BPZQ== X-Forwarded-Encrypted: i=1; AJvYcCW3m8WpvJRZN0MJimgF3FLecyW6Rv9Yb+qS8pkSPYLbX5ksFSfbJwbP2Mf1eqZGPnlW7viT+rs4w9S78co6@lists.postgresql.org X-Gm-Message-State: AOJu0Yw+39zoSokHtFfuuwvggauJmyYbU48jaTEbdYDxKi+/+9+I31po 4vIYOrYewY+PMHSuOjt6FvULDtOnsfSKDN13jiNWvkMHtRVBI/GwvY7dvM/DeTTY1DaOyoNLr9/ 7+mJKsUUSezfds0gRiwF748md8rS0Z4c= X-Gm-Gg: ASbGncvfjCZiOHUJEaLN5ipQoIqxKxHh/i0xB/VyDA4VJzTG/V1EzgRTJaz+5VqryDm 77gVpqKZvz8zMqTLHrzw1fCabdeczsvowZO2iJSvrpwrG9fiHFnKDU+FNYXdxq0AvBg+CEzAEZ3 D7Kzb4QMEi3s6JJmoaSJZSqBng1iRQIU/nqEeuqScTKCt5zQNMJeWqfryUw9FGN4d4ysKbNtQb4 ISQPlHz2MvfZpFhuddvvhWXick5hUizYpM2RGDco9ppZ/QomdokGQUWbs0fMm5eG/dq8J4xtUlw Z063zFG++sKHh2DHafGZFAyyd1X8i7trOf6PskTdN8KkisBqtUl4PYmCDZEt5h+tfAttJ4zsojO BUvQ+yqFCkwZu7U0= X-Google-Smtp-Source: AGHT+IEUwYNGuOCrtYb89GEL8D9GOIfml9wJ3MrBblcUmqYOzMPMxFMKu2KFxMrG+TSUXQ/PzIXaWArdJYPHttRfSmQ= X-Received: by 2002:ac8:5d54:0:b0:4ed:b83f:78a3 with SMTP id d75a77b69052e-4efbda90566mr122552231cf.47.1764243943836; Thu, 27 Nov 2025 03:45:43 -0800 (PST) MIME-Version: 1.0 References: <269A8FB9-6D43-43CF-A6FE-52D28CBDB8A9@Outlook.com> In-Reply-To: From: John Naylor Date: Thu, 27 Nov 2025 18:45:32 +0700 X-Gm-Features: AWmQ_blmhwGO22J_EAlCP3Dk_soz4ReRNlabYwB8ca-pIxTM1KrMOERvh38c5iE Message-ID: Subject: Re: tuple radix sort To: Chao Li Cc: Chengpeng Yan , "pgsql-hackers@lists.postgresql.org" , Peter Geoghegan Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Nov 27, 2025 at 2:49=E2=80=AFPM Chao Li wr= ote: > I did an initial test before, but I didn=E2=80=99t read the code at the t= ime. Today, I spent time reviewing 0001. Overall, I believe the implementat= ion is solid. I just got a few comments/suggestions. Thanks for looking! > > On Nov 26, 2025, at 21:11, John Naylor wrote: > > > We recompute normalize_datum(tup->datum1, ssup) for every tuple in every = level, why don=E2=80=99t cache the result in SortTuple. As we have cached c= urrent_byte in SortTuple, it shouldn=E2=80=99t be a big deal to add one mor= e field to it. Actually it is a big deal, because the memtuples array counts against work_= mem. I saw two ways to store the full normalized without increasing the size, and I've already rejected them: - share space with isnull1/srctape -- v3 did this, and I already explained the reason for changing when I shared v4. - share space with datum1 -- that would require additional code to restore the original datum and makes it more difficult to verify correctness There is also a proposal upthread to not store anything, and that's still up in the air. > 2 - 0001 > ``` > + while (num_remaining > 1) > + { > + /* start the count over */ > + num_remaining =3D num_partitions; > ``` > > The swap loop always start the count over, so that sorted partitions are = re-scanned as well. I think we can do an optimization like: > > ``` > num_active =3D num_partitions; > while (num_active > 1) > { > for (int i =3D 0; i < num_active; ) > { > uint8 idx =3D remaining_partitions[i]; > // do the swaps for the partition =E2=80=A6 > > if (partitions[idx].offset =3D=3D partitions[idx].next_offset) > { > remaining_partitions[i] =3D remaining_partitions[num_active -= 1]; > num_active--; > } > else > i++; > } > } > ``` > > This way we move out sorted partitions, so they will not be re-scanned. I don't think that's going to work without additional bookkeeping, so I'm not sure it's worth it. See the recursion step. > 3 - 0001 > > In sort_byvalue_datum, we can add a fast-path for all NULL and all NOT NU= LL cases, so that they won=E2=80=99t need to run the branchless cyclic perm= utation and the two =E2=80=9Cfor=E2=80=9D loops of assertions. Something li= ke: This is too clever and yet doesn't go far enough. There is already one fast path, which happens to cover the common ASC + all NOT NULL case. The right way to skip the permutation step would be to add a second loop that starts from the end and stops at the first tuple that needs to swap. That should work not just for all NULL and all NOT NULL, but also where there is a mix of the two and some (or all) are already in place. All these cases can be treated the same and they will continue to the exact same paths. I haven't yet bothered to try harder, but it may be necessary in order to avoid introducing regressions, so I'll look into it. -- John Naylor Amazon Web Services