public inbox for [email protected]  
From: Michel Pelletier <[email protected]>
To: pgsql-general <[email protected]>
Subject: Using Expanded Objects other than Arrays from plpgsql
Date: Sun, 20 Oct 2024 09:32:13 -0700
Message-ID: <CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com>

Hello!

I'm working on the OneSparse Postgres extension that wraps the GraphBLAS
API with a SQL interface for doing graph analytics and other sparse linear
algebra operations:

https://onesparse.github.io/OneSparse/test_matrix_header/

OneSparse wraps the opaque GraphBLAS handles in Expanded Object Header
structs that register ExpandedObjectMethods.  In their "live" expanded
form the objects hold a handle that can be passed directly to the
SuiteSparse API.  The methods flatten them into a serialized "flat"
representation that gets written out as a TOAST value, and they are
deserialized and expanded again when read back.  This works perfectly.
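To make the flatten/expand cycle concrete, here is a self-contained C mock of the pattern. It is not the PostgreSQL API. In a real extension the header is an ExpandedObjectHeader set up with EOH_init_header(), and the ExpandedObjectMethods struct from utils/expandeddatum.h supplies get_flat_size and flatten_into callbacks that produce a varlena. The dense vector payload, the [n][values] flat layout, and every name below (ExpandedVec, MockExpandedMethods, vec_*) are hypothetical stand-ins for a GraphBLAS handle and its serialized form.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct ExpandedVec ExpandedVec;

/* Mock of the two callbacks ExpandedObjectMethods declares: report the
 * flat size, then serialize into a caller-allocated buffer. */
typedef struct MockExpandedMethods {
    size_t (*get_flat_size)(const ExpandedVec *ev);
    void (*flatten_into)(const ExpandedVec *ev, void *result, size_t allocated_size);
} MockExpandedMethods;

/* Hypothetical "live" object: a dense vector standing in for a GraphBLAS handle. */
struct ExpandedVec {
    const MockExpandedMethods *methods;
    size_t n;
    double *values;
};

/* Flat layout (hypothetical): a uint64 element count followed by n doubles. */
static size_t vec_get_flat_size(const ExpandedVec *ev)
{
    return sizeof(uint64_t) + ev->n * sizeof(double);
}

static void vec_flatten_into(const ExpandedVec *ev, void *result, size_t allocated_size)
{
    uint64_t n = (uint64_t) ev->n;

    assert(allocated_size >= vec_get_flat_size(ev));
    memcpy(result, &n, sizeof n);
    if (ev->n > 0)
        memcpy((char *) result + sizeof n, ev->values, ev->n * sizeof(double));
}

static const MockExpandedMethods vec_methods = {vec_get_flat_size, vec_flatten_into};

/* Build a live object by copying n values (malloc error handling omitted). */
static ExpandedVec *vec_new(size_t n, const double *values)
{
    ExpandedVec *ev = malloc(sizeof *ev);

    ev->methods = &vec_methods;
    ev->n = n;
    ev->values = n > 0 ? malloc(n * sizeof(double)) : NULL;
    if (n > 0)
        memcpy(ev->values, values, n * sizeof(double));
    return ev;
}

/* "Expand": rebuild a live object from its flat bytes (a full copy). */
static ExpandedVec *vec_expand(const void *flat)
{
    uint64_t n;

    memcpy(&n, flat, sizeof n);
    return vec_new((size_t) n, (const double *) ((const char *) flat + sizeof n));
}

/* "Flatten" through the method table, as the core code would (also a full copy). */
static void *vec_flatten(const ExpandedVec *ev, size_t *size_out)
{
    size_t size = ev->methods->get_flat_size(ev);
    void *flat = malloc(size);

    ev->methods->flatten_into(ev, flat, size);
    *size_out = size;
    return flat;
}

static void vec_free(ExpandedVec *ev)
{
    free(ev->values);
    free(ev);
}
```

Each flatten followed by an expand copies every element twice, which is harmless for one-off storage but adds up when it happens inside a loop.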

However, during some single-source shortest path (SSSP) benchmarking I was
getting good numbers, but not as good as I expected, and noticed some
sublinear scaling as the problems got bigger.  It seems my objects are
constantly being flattened and expanded by plpgsql during the iterative
phases of an algorithm.  As the solution grows, the result vector gets
bigger, and the expand/flatten cost increases on each iteration.
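A rough cost model shows why a per-iteration round trip hurts more as the problem grows. It assumes, hypothetically, that the result vector gains `growth` entries per iteration and that each iteration performs exactly one flatten and one expand, each touching every entry. Under those assumptions the total work is quadratic in the iteration count, while an object kept expanded across iterations copies nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: after iteration i the vector holds i * growth entries,
 * and one flatten plus one expand each copy all of them. */
static uint64_t roundtrip_copies(uint64_t iterations, uint64_t growth)
{
    uint64_t total = 0;

    for (uint64_t i = 1; i <= iterations; i++) {
        uint64_t size = i * growth;   /* entries after iteration i */
        total += 2 * size;            /* flatten + expand */
    }
    return total;                     /* equals growth * k * (k + 1) */
}
```

For example, roundtrip_copies(10, 100) returns 11000 and roundtrip_copies(20, 100) returns 42000: doubling the iterations roughly quadruples the copying. This only illustrates the shape of the overhead; it is not a measurement of OneSparse.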

I found this thread from Tom Lane's original patch implementing this, from
2015:

https://www.postgresql.org/message-id/E1Ysvgz-0000s0-DP%40gemulon.postgresql.org

> In this initial implementation, a few heuristics have been hard-wired
> into plpgsql to improve performance for arrays that are stored in
> plpgsql variables. We would like to generalize those hacks so that
> other datatypes can obtain similar improvements, but figuring out some
> appropriate APIs is left as a task for future work.


Sure enough, looking at the code, I see this condition:


https://github.com/postgres/postgres/blob/master/src/pl/plpgsql/src/pl_exec.c#L549

This is a showstopper for me, as I can't see a good way around it.  I tried
to "fake" an array but didn't get very far with that approach.  I may
still pull it off, since GraphBLAS objects are very much array-like, but I
figured I'd also open the discussion on how we can fix this permanently so
that future extensions don't run into this penalty.

My first thought was to add a flag to CREATE TYPE, like "EXPANDED = true" or
some better name, indicating that plpgsql can safely take ownership of the
object in its expanded state instead of copying it.  The GraphBLAS API is
specific that the holder of an object handle owns the reference, so that
would work fine for me.  Another option, I guess, is some kind of whitelist
or blacklist telling plpgsql which types can be kept expanded.

And then there is simply removing the existing arrays-only restriction.
Is any other expanded object out there really interested in being
flattened and expanded over and over again?
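To compare the options above side by side, here is a hypothetical decision sketch in C. The three outcomes roughly follow the array special case at the linked pl_exec.c location: take ownership of a read/write expanded object, leave a read-only one alone, and expand a flat array. Everything else is an assumption: TypeInfo, the flag_expanded field standing in for a CREATE TYPE ... EXPANDED = true option, and the Policy and Action enums are illustrative names, not plpgsql code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-type information plpgsql might consult. */
typedef struct TypeInfo {
    bool is_array;        /* what the current heuristic checks */
    bool flag_expanded;   /* proposed CREATE TYPE flag (hypothetical) */
} TypeInfo;

/* Form of the incoming value. */
typedef enum { VAL_FLAT, VAL_EXPANDED_RO, VAL_EXPANDED_RW } ValueForm;

/* The options discussed: today's rule, an opt-in type flag, or no restriction. */
typedef enum { POLICY_ARRAYS_ONLY, POLICY_TYPE_FLAG, POLICY_ANY_EXPANDED } Policy;

/* What the variable ends up holding. */
typedef enum {
    KEEP_AS_IS,       /* store the datum unchanged */
    TAKE_OWNERSHIP,   /* commandeer a read/write expanded object, no copy */
    EXPAND_NOW        /* convert a flat value to expanded form */
} Action;

static bool policy_allows(const TypeInfo *t, Policy p)
{
    switch (p) {
    case POLICY_ARRAYS_ONLY:  return t->is_array;
    case POLICY_TYPE_FLAG:    return t->is_array || t->flag_expanded;
    case POLICY_ANY_EXPANDED: return true;
    }
    return false;
}

static Action choose_action(const TypeInfo *t, ValueForm v, Policy p)
{
    if (!policy_allows(t, p))
        return KEEP_AS_IS;
    switch (v) {
    case VAL_EXPANDED_RW: return TAKE_OWNERSHIP;
    case VAL_EXPANDED_RO: return KEEP_AS_IS;
    case VAL_FLAT:
        /* Only arrays have a known expand routine in this model; other
         * types would need one registered somehow (not modeled here). */
        return t->is_array ? EXPAND_NOW : KEEP_AS_IS;
    }
    return KEEP_AS_IS;
}
```

Under POLICY_ANY_EXPANDED a type that never produces expanded objects is unaffected, because its values never arrive as VAL_EXPANDED_RW; the flag variant differs only in requiring the type to opt in.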

Thanks,

-Michel

