Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nbXP5-0003qN-M7 for pgsql-sql@arkaria.postgresql.org; Tue, 05 Apr 2022 00:50:39 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1nbXP3-00076h-Q3 for pgsql-sql@arkaria.postgresql.org; Tue, 05 Apr 2022 00:50:37 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nbXP3-00076X-7I for pgsql-sql@lists.postgresql.org; Tue, 05 Apr 2022 00:50:37 +0000 Received: from mail-ed1-x532.google.com ([2a00:1450:4864:20::532]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1nbXOw-0002Nl-H6 for pgsql-sql@lists.postgresql.org; Tue, 05 Apr 2022 00:50:36 +0000 Received: by mail-ed1-x532.google.com with SMTP id g20so13078117edw.6 for ; Mon, 04 Apr 2022 17:50:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=6GrtT+4hE1FGeTwp2Ge1hHoCwj/JWnFDlKeJ5re2KB8=; b=lzQWiBXeezICt4fdRQHNK0oduRqZ4aknRsmev7DrhH6l9Ighw5ethktaR1tkMXcEZJ R0Oo72mm9x294+Mgp2RcWDP07zrIW5qFlKEevtHWFaid2z1d0SS2eMuFfZZIW59lX1de RZWQ7MMxmqAf8RLRco+EaU+zWGHunvfctnCaD1ZdtpIbArnIZ794TPMjNi6slDPp1akn aI/ioMsDRXFkQC+3Vln6F8c3d/FhVSMZwf/z2O3gJWE8MQuXt2fJSJaRxCg1CRCVWAAb byuvXze7h2KwNFpf1pC9+IbJdmAEQWWeK/+0e5JKPrthonpX+rYjA/ejquI/64eU6oTx jjvQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=6GrtT+4hE1FGeTwp2Ge1hHoCwj/JWnFDlKeJ5re2KB8=; b=fnrISPr9ZG/c4e5okVLJi+MFFGoTu8JSd0Qzi0ktUd8KXlHderZ1ajmUGPzmiEf6wa dR6zWlokVw7QtsNJpRVvTDtVPPz1hreutZ/APpQyUP6PGLfmfFkHcuZrBqOn1BIdOR2x bB27jU22OkBic0r9g/WRUUSOOA9sIIpXy8IBzl+ibGpH9v6fiXeZTZcck+3Dk8KUXdCW VtAaWZiw4B5PnzqJtw5g7owmK1QFwuPuMvF5vrdi9nzCESxbkTs0Eg5DgQvsxDhkj3H+ 2/59cWEvcd7XHUzt9ISasA4yfWoHphJo4G1h2FCO179qrUnr2oL79sIGghQ0fAjwwrGE aAaA== X-Gm-Message-State: AOAM5311S+q69VWpirqJePg5E/L8/BkwoLYBFQ+fDuns1xqwvLGSGEPx 7f4B0nTvsCn8WA+teP9KhGmZ1Q4bHjBVf1uZQj4= X-Google-Smtp-Source: ABdhPJxvnsIBVEoFufNcnPh+z44OK0yDVsyHJhwxuPzn2XloX+N/104JK+hwhtEaAxIu3KmSyOS+sysQEtufl6JInUQ= X-Received: by 2002:a50:954b:0:b0:41a:c9cb:8778 with SMTP id v11-20020a50954b000000b0041ac9cb8778mr868759eda.165.1649119828888; Mon, 04 Apr 2022 17:50:28 -0700 (PDT) MIME-Version: 1.0 References: <3817c56b-458d-5295-e8bc-1001231dc5c8@gmail.com> In-Reply-To: From: "David G. Johnston" Date: Mon, 4 Apr 2022 17:50:09 -0700 Message-ID: Subject: Re: How to just get the last in a recursive query To: Steve Midgley Cc: Shaozhong SHI , Rob Sargent , pgsql-sql Content-Type: multipart/alternative; boundary="00000000000097ce5705dbdda1aa" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000097ce5705dbdda1aa Content-Type: text/plain; charset="UTF-8" On Mon, Apr 4, 2022 at 5:00 PM Steve Midgley wrote: > On Mon, Apr 4, 2022 at 4:32 PM David G. Johnston < > david.g.johnston@gmail.com> wrote: > >> On Mon, Apr 4, 2022, 16:21 Shaozhong SHI wrote: >> >>> That is not the most efficient in this case. >> >> >> Can you prove that statement? Provide a query that is more efficient. >> > > Just to share the SQL from that example > > WITH RECURSIVE walk_network(id, segment) AS ( > SELECT id, segment > FROM network > WHERE id = 6 > UNION ALL > SELECT n.id, n.segment > FROM network n, walk_network w > WHERE ST_DWithin( > ST_EndPoint(w.segment), > ST_StartPoint(n.segment),0.01))SELECT idFROM walk_network > > > David J (kind of off-topic): There's no *order by *in the original query, > so I could imagine that adding any order by clause at all would make the > query less efficient. But maybe it could become more efficient if the > planner picks a better index as a result? > > David (OP): My main point is that in this example, since no order by > clause is provided, it is meaningless to talk about a "last" or "first" > item. SQL, afaik, is not required to produce the results in any order > whatsoever, when no order by clause is provided (corrections welcome if > that's not accurate). So while you might grab the last item somehow this > time, it might not be the last item, the next time you run the query. So > I'd say you should add an appropriate order by query, and then you can > measure "ASC" vs "DESC" with "LIMIT 1" to see if either one is less > efficient. (I'm in David J's camp that it's unlikely to make any difference) > > Reading the query now... I get the desire of the OP - walk the graph and just output the final node that you fall upon. The nature of a recursive CTE is that it can/does have a natural order if the graph produced is linear - thus it has a well defined last node. The CTE, however, captures the entire path. The main query, which only cares about the final node, then has to reverse the path and then select the first node. That is the solution that was provided. There is no way to get the final node in a path, giving a starting node, without walking the path. Or rather, if there is for a given set of data, a recursive CTE is not the best way to get at it. I'll admin the presence of PostGIS makes this a bit harder for me to personally reason about, but the concept is valid and turning the CTE into a subquery, reversing the output (which should have an index indicating node order, in addition to the node id), and doing limit 1 gives the desired answer. Is there a better way to get that answer for this particular query - I have no clue - but absent a better answer the OP needs to just take the one suggestion that does provide a valid answer and assume it is the best possible. Since no other suggestion exists the one answer is by definition also the best answer. David J. --00000000000097ce5705dbdda1aa Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Mon, Apr 4, 2022 at 5:00 PM Steve Midgley <science@misuse.org> wrote:
=
On Mon, Apr 4, 2022 at 4:32= PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Apr 4, 2022, 16:21 Shaozhong SHI <shishaozhong@gmail.com>= ; wrote:
That is= not the most efficient in this case.
<= br>
Can you prove that statement?=C2=A0 Provide a qu= ery that is more efficient.

Jus= t to share the SQL from that example
WITH RECURSIVE walk_network(id, segment<=
span style=3D"box-sizing:border-box">) AS (
  SELECT =
id, segment=20
    FROM =
network=20
    WHERE=
 id =3D 6
  UNION <=
span style=3D"box-sizing:border-box;color:rgb(0,102,153)">ALL
  SELECT =
n.id, n=
.segment
    FROM =
network n, walk_network w
    WHERE=
 ST_DWithin(
      ST_EndPoint(w<=
/span>.segment),
      ST_StartPoint(n<=
/span>.segment),0.01)
)
SELECT id
FROM walk_network

David J (kind of off-topic): There's no order by in the original query, so I could imagine that adding any order by clause= at all would make the query less efficient. But maybe it could become more= efficient if the planner picks a better index as a result?

<= /div>
David (OP): My main point is that in this example, since no order= by clause is provided, it is meaningless to talk about a "last" = or "first" item. SQL, afaik, is not required to produce the resul= ts in any order whatsoever, when no order by clause is provided (correction= s welcome if that's not accurate). So while you might grab the last ite= m somehow this time, it might not be the last item, the next time you run t= he query. So I'd say you should add an appropriate order by query, and = then you can measure "ASC" vs "DESC" with "LIMIT 1= " to see if either one is less efficient. (I'm in David J's ca= mp that it's unlikely to make any difference)


Reading the query now...

=
I get the desire of the OP - walk the graph and just output the fi= nal node that you fall upon.=C2=A0 The nature of a recursive CTE is that it= can/does have a natural order if the graph produced is linear - thus it ha= s a well defined last node.=C2=A0 The CTE, however, captures the entire pat= h.=C2=A0 The main query, which only cares about the final node, then has to= reverse the path and then select the first node.=C2=A0 That is the solutio= n that was provided.

There is no way to get the fina= l node in a path, giving a starting node, without walking the path.=C2=A0 O= r rather, if there is for a given set of data, a recursive CTE is not the b= est way to get at it.=C2=A0 I'll admin the presence of PostGIS makes th= is a bit harder for me to personally reason about, but the concept is valid= and turning the CTE into a subquery, reversing the output (which should ha= ve an index indicating node order, in addition to the node id), and doing l= imit 1 gives the desired answer.

Is there a better way= to get that answer for this particular query - I have no clue - but absent= a better answer the OP needs to just take the one suggestion that does pro= vide a valid answer and assume it is the best possible.=C2=A0 Since no othe= r suggestion exists the one answer is by definition also the best answer.

David J.



--00000000000097ce5705dbdda1aa--