public inbox for [email protected]
help / color / mirror / Atom feedFrom: Michel Pelletier <[email protected]>
To: Tom Lane <[email protected]>
Cc: Pavel Stehule <[email protected]>
Cc: [email protected]
Subject: Re: Using Expanded Objects other than Arrays from plpgsql
Date: Sun, 22 Dec 2024 19:52:08 -0800
Message-ID: <CACxu=vKjvJSftCgBPa78tbdpBnJOq9DCrtV8x=o0_cuCOoxLbA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com>
<[email protected]>
<CACxu=vLXvpzN4X3k+9jsMt6ujuOvFVUSkA80t_cROSsF4y2jQQ@mail.gmail.com>
<[email protected]>
<CACxu=vKEF8Qa-OaADFxf0uMg-xw6gH_CNCWd2s+xaqh-gY4=xg@mail.gmail.com>
<[email protected]>
<[email protected]>
<CACxu=v++HNmss59yGUDkRny7g=M8tZ2YXF07AUXqKVGqcSfxGQ@mail.gmail.com>
<[email protected]>
<[email protected]>
<CACxu=v+dn37zr8gx5xNP-EZY3OLtGLTHrbx_ZkCQc40HpyMLKA@mail.gmail.com>
<[email protected]>
<CACxu=vL7i_U_iSNpREe8eCAMEyKHuPpk9THRpBhk+ar0U1EdOw@mail.gmail.com>
<CACxu=vKLc6f5N8_DR58LKkE1eohWSxTvThTeGsLm7p7QH1aFBA@mail.gmail.com>
<CACxu=vJf2S=ysun_h=zmYNu6oUM47+egbpX5mMC0X9BJK=EQwQ@mail.gmail.com>
<CACxu=vK+S6BXN8ZYyBvqQBWrcwHXqtue1-ZuKO3+XtHGBYcDUQ@mail.gmail.com>
<CAFj8pRCd6xcH-AYEyHFdGdU89O9JjZ-v-pyQnOwd9zNJkCEdhQ@mail.gmail.com>
<[email protected]>
<[email protected]>
<CACxu=vKCSSqO6y0ES4SJnFSMBa8szANOykQkG3L1LsLnN2v0JA@mail.gmail.com>
<CACxu=vL4g2pPgEg66F-ttZ5jVuMba8K31vsoEYKbTJ7q6jk0hg@mail.gmail.com>
<[email protected]>
On Wed, Dec 18, 2024 at 12:22 PM Tom Lane <[email protected]> wrote:
> Michel Pelletier <[email protected]> writes:
> > My bad, sorry for the long confusing email, I figured out that I was
> > calling the wrong macro when getting my matrix datum and inadvertently
> > expanding RO pointers as well, I've fixed that issue, and everything is
> > working great! No extra expansions and my support functions are working
> > well, I need to go through a few more places in the API to add more
> support
> > but otherwise the fixes Tom has put into plpgsql have worked perfectly
> and
> > the library now appears to be behaving optimally! I can get down to
> doing
> > some benchmarks and head-to-head with the C and Python bindings to
> compare
> > against.
>
> So, just to clarify where we're at: you are satisfied that the current
> patch-set does what you need?
>
I have some updates on this thread based on some graph algorithms I've
ported from the Python/C graphblas libraries.
All of the plpgsql expanded object optimizations so far are working well, I
can minimize object expansion in most cases, there are a couple I haven't
been able to work around but I'm still getting excellent benchmarking
numbers on some large test graphs:
LiveJournal Orkut
Nodes 3,997,962 3,072,441
Edges 34,681,185 117,185,037
Triangles 177,820,130 627,583,972
Seconds Edges/S Seconds Edges/S
Tri Count LL 2.80s 12,386,138 32.03s 3,658,602
Tri Count LU 1.91s 18,157,688 16.38s 7,156,338
Tri Centrality 1.55s 22,374,958 12.22s 9,589,610
Page Rank 8.10s 4,281,628 23.14s 5,064,176
That's on a 2020 era 4 core economy laptop and is in line with what the
C/Python/Julia bindings get on similar hardware.
There are a few cases where I have to force an expansion, I work around
this by calling a `wait()` function, which expands the datum, calls
GrB_wait() on it (a nop in this case) and returns a r/w pointer. You can
see this in the following Triangle Counting function which is a matrix
multiplication of a graph to itself, using itself as a mask. This matrix
reduces to the triangle count (times six):
create or replace function tcount_b(graph matrix) returns bigint language
plpgsql as
$$
begin
graph = wait(graph);
graph = mxm(graph, graph, 'plus_pair_int32', mask=>graph,
descr=>'s');
return reduce_scalar(graph) / 6;
end;
$$;
DEBUG: new_matrix
DEBUG: flatten_matrix
DEBUG: matrix_wait
DEBUG: expand_matrix -- expansion happens here in wait()
DEBUG: new_matrix
DEBUG: matrix_mxm -- mxm does not re-expand the object, good!
DEBUG: expand_semiring
DEBUG: new_semiring
DEBUG: new_matrix
DEBUG: expand_descriptor
DEBUG: new_descriptor
DEBUG: matrix_reduce_scalar -- neither does reduce, good!
DEBUG: new_scalar
DEBUG: scalar_div_int32
DEBUG: new_scalar
DEBUG: cast_scalar_int64
If I take out the call to wait(), then mxm calls expand_matrix 3 times as
it did before your optimizations.
The other task we'd talked about was generalizing the existing
> heuristics in exec_assign_value() and plpgsql_exec_function() that
> say that array-type values should be forced into expanded R/W form
> when being assigned to an array-type PL/pgSQL variable. The argument
> for that is that the PL/pgSQL function might subsequently do a lot of
> subscripted accesses to the array (which'd benefit from working with
> an expanded array) while never doing another assignment and thus not
> having any opportunity to revisit the decision. The counter-argument
> is that it might *not* do such accesses, so that the expansion was
> just a waste of cycles. So this is squishy enough that I'd prefer to
> have some solid use-cases to look at before trying to generalize it.
>
> It's sounding to me like you're going to end up in a place where all
> your values are passed around in expanded form already and so you have
> little need for that optimization.
If so, I'd prefer not to go any
> further than the present patch-set for now. Adding "type support"
> hooks as discussed would be a substantial amount of work, so I'd
> like to have a more compelling case for it before doing that.
>
I agree it makes sense to have more use cases before making deeper
changes. I only work with expanded forms, but need to call wait() to
pre-expand the object to avoid multiple expansions in functions that can
take the same object in multiple parameters. This is a pretty common
pattern in GraphBLAS (and linear algebra in general) where (many) matrices
are commutable to themselves in several ways like multiplication,
element-wise operations, and element masking.
I'm not sure if eliminating wait() is a good enough use case, it would
definitely be nice to get rid of but I can document it pretty thoroughly
and it's relatively easy to catch.
-Michel
view thread (34+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected]
Subject: Re: Using Expanded Objects other than Arrays from plpgsql
In-Reply-To: <CACxu=vKjvJSftCgBPa78tbdpBnJOq9DCrtV8x=o0_cuCOoxLbA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox