Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t2YrT-004z8S-2z for pgsql-general@arkaria.postgresql.org; Sun, 20 Oct 2024 16:32:59 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t2YrQ-00H08d-KC for pgsql-general@arkaria.postgresql.org; Sun, 20 Oct 2024 16:32:56 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t2YrQ-00H08V-4P for pgsql-general@lists.postgresql.org; Sun, 20 Oct 2024 16:32:56 +0000 Received: from mail-lj1-x231.google.com ([2a00:1450:4864:20::231]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t2YrN-001rlp-MN for pgsql-general@lists.postgresql.org; Sun, 20 Oct 2024 16:32:55 +0000 Received: by mail-lj1-x231.google.com with SMTP id 38308e7fff4ca-2fb3110b964so32514991fa.1 for ; Sun, 20 Oct 2024 09:32:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729441971; x=1730046771; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=z0W2CKdxhuJM6nzzgqB7d9nKSu76qVqJ/1LddLm82to=; b=SEsSlq/NmJANr4cQIls5fZ7Uv4UbtXyLklCV2AyCJaRqJi/B4UTi95D57MtNHrkTc2 qOa0USd/IKxY1EOK900twsCrECT+qxoqebCgK18Tp+mmdla/n++l2NRiP3vCPQYoshG8 RhpfRHw8+VA6mcyL3nbc6csujFn5lXJA7wlpkKxuxsu4CTlRHyo9/nmjowAvrNkCFf7w VpNtvbOwPMR2TBElmwROkAMaQlIim3eqNoIiZOQ/qZSxKWltVbpOdbQkN1wuZPwnPGA3 4LMjL0V0wIQ+8rWb4pcLyEpwwnYcPFrxeAgsxaVhAEjRCGURQl3GLMqykNXf1N1P4Ry9 jMmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729441971; x=1730046771; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=z0W2CKdxhuJM6nzzgqB7d9nKSu76qVqJ/1LddLm82to=; b=h3+g/5TWVZxH/N14uc80ZF7S3bT6JREqX0F7ycTqjlv0cIkacTYIxl7EPVRuc7BVJ0 xFg3DGKSySJNSvsV4LS0nH/qu9A3EIy7KbSsw9VIJ+ovce8B/A7h4gOONHLc+DgTQaoq 7WgnGZMmmJ4boL1OsmR+mTIIoFfllyjEsH545RB4wbN+5+re9ZnF65EyNsRSXUaJSa0b IFJ/Z09AvAhlonjU6tPSAmi47DNetxbND/QoQW0ZvmlWG+lFlSuczbrTm7RcdvYw0lB5 qIk5xky8fnuKFjPdFRN9eNSK/9mEcrqWCe9rwPkO3ONoxBEzg3YP3brK0bjX4H/3OYYr peRg== X-Gm-Message-State: AOJu0Yy7SRAFX+DmX1wTh3XdXCzLWsRKDlWtNEzxtyRVKe1qIsDNiekn 4z3NvFDYNdx7m+mMKX/4QNtncbqkmL1EzpReV3etBw94iMJL7FDqX2a2wSOKVrmHnoMaYSBfBJD uEwK73YwWvtQUpzAe/3v5fREhKoxMJ5TI X-Google-Smtp-Source: AGHT+IE27e0YIp7nsS+fPM8VtOo874GxhF+n5Fb1zTY+9x9rEUpD9u73G0fVQ9OC6GkFkpJwgiGa7TJPyL2IchD+s5k= X-Received: by 2002:a2e:b8cd:0:b0:2f3:eeab:7f17 with SMTP id 38308e7fff4ca-2fb8320b288mr36548481fa.41.1729441970301; Sun, 20 Oct 2024 09:32:50 -0700 (PDT) MIME-Version: 1.0 From: Michel Pelletier Date: Sun, 20 Oct 2024 09:32:13 -0700 Message-ID: Subject: Using Expanded Objects other than Arrays from plpgsql To: pgsql-general Content-Type: multipart/alternative; boundary="0000000000004cb6ea0624eb1625" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000004cb6ea0624eb1625 Content-Type: text/plain; charset="UTF-8" Hello! I'm working on the OneSparse Postgres extension that wraps the GraphBLAS API with a SQL interface for doing graph analytics and other sparse linear algebra operations: https://onesparse.github.io/OneSparse/test_matrix_header/ OneSparse wraps the GraphBLAS opaque handles in Expanded Object Header structs that register ExpandedObjectMethods for flattening and expanding objects from their "live" handle that can be passed to the SuiteSparse API, and their "flat" representations are de/serialized and get written as TOAST values. This works perfectly. However during some single source shortest path (sssp) benchmarking I was getting good numbers but not as good as I expected, and noticed some sublinear scaling as the problems got bigger. It seems my objects are getting constantly flattened/expanded from plpgsql during the iterative phases of an algorithm. As the solution grows the result vector gets bigger and the expand/flatten cost increases on each iteration. I found this thread from the original path implementation from Tom Lane in 2015: https://www.postgresql.org/message-id/E1Ysvgz-0000s0-DP%40gemulon.postgresql.org In this initial implementation, a few heuristics have been hard-wired > into plpgsql to improve performance for arrays that are stored in > plpgsql variables. We would like to generalize those hacks so that > other datatypes can obtain similar improvements, but figuring out some > appropriate APIs is left as a task for future work. Sure enough looking at the code I see this condition: https://github.com/postgres/postgres/blob/master/src/pl/plpgsql/src/pl_exec.c#L549 This is a showstopper for me as I can't see a good way around it, I tried to "fake" an array but didn't get too far down that approach but I may still pull it off as GraphBLAS objects are very much array-like, but I figured I'd also open the discussion on how we can fix this permanently so that future extensions don't run into this penalty. My first thought was to add a flag to CREATE TYPE like "EXPANDED = true" or some other better name that indicates that the object can be safely taken ownership of in its expanded state and not copied. The GraphBLAS is specific in its API in that the object handle holder is the owner of the reference, so that would work fine for me. Another option I guess is some kind of whitelist or blacklist telling plpgsql which types can be kept expanded. And then there is just removing the existing restriction on arrays only. Is any other expanded object out there really interested in being flattened/expanded over and over again? Thanks, -Michel --0000000000004cb6ea0624eb1625 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hello!

I'm working= on the OneSparse Postgres extension that wraps the GraphBLAS API with a SQ= L interface for doing graph analytics and other=C2=A0sparse=C2=A0linear alg= ebra operations:


OneS= parse wraps the GraphBLAS opaque handles in Expanded Object Header structs= =C2=A0that register ExpandedObjectMethods for flattening and expanding obje= cts from their "live" handle that can be passed to the SuiteSpars= e API, and their "flat" representations are de/serialized and get= written as TOAST values.=C2=A0 This works perfectly.

<= div>However during some single source shortest path (sssp) benchmarking I w= as getting good numbers but not as good as I expected, and noticed some sub= linear scaling as the problems got bigger.=C2=A0 It seems my objects are ge= tting constantly flattened/expanded from plpgsql during the iterative phase= s of an algorithm.=C2=A0 As the solution grows the result vector gets bigge= r and the expand/flatten cost increases on each iteration.

I found this thread from the original path implementation from Tom= Lane in 2015:


In = this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve perf= ormance for arrays that are stored in
plpgsql variables. We would like to generalize those hac= ks so that
other dat= atypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a tas= k for future work.

Sure enoug= h looking at the code I see this condition:


This is a showstop= per for me as I can't see a good way around it, I tried to "fake&q= uot; an array but didn't get too far down that approach but I may still= pull it off as GraphBLAS objects are very much array-like, but I figured I= 'd also open the discussion on how we can fix this permanently so that = future extensions don't run into this penalty.=C2=A0=C2=A0
My first thought was to add a flag to CREATE TYPE like "E= XPANDED =3D true" or some other better name that indicates that the ob= ject can be safely taken ownership of in its expanded state and not copied.= =C2=A0 The GraphBLAS is specific in its API in that the object handle holde= r is the owner of the reference, so that would work fine for me.=C2=A0 Anot= her option I guess is some kind of whitelist or blacklist telling plpgsql w= hich types can be kept expanded.

And then there is= just removing the existing restriction on arrays only.=C2=A0 Is any other = expanded object out there really interested in being flattened/expanded ove= r and over again?

Thanks,

-Michel
--0000000000004cb6ea0624eb1625--