From: Michel Pelletier <[email protected]>
To: Tom Lane <[email protected]>
Cc: Pavel Stehule <[email protected]>
Cc: [email protected]
Subject: Re: Using Expanded Objects other than Arrays from plpgsql
Date: Sun, 22 Dec 2024 19:52:08 -0800
Message-ID: <CACxu=vKjvJSftCgBPa78tbdpBnJOq9DCrtV8x=o0_cuCOoxLbA@mail.gmail.com>

On Wed, Dec 18, 2024 at 12:22 PM Tom Lane <[email protected]> wrote:

> Michel Pelletier <[email protected]> writes:
> > My bad, sorry for the long confusing email, I figured out that I was
> > calling the wrong macro when getting my matrix datum and inadvertently
> > expanding RO pointers as well, I've fixed that issue, and everything is
> > working great!  No extra expansions and my support functions are working
> > well, I need to go through a few more places in the API to add more
> > support but otherwise the fixes Tom has put into plpgsql have worked
> > perfectly and the library now appears to be behaving optimally!  I can
> > get down to doing some benchmarks and head-to-head with the C and Python
> > bindings to compare against.
>
> So, just to clarify where we're at: you are satisfied that the current
> patch-set does what you need?
>

I have some updates on this thread based on some graph algorithms I've
ported from the Python/C graphblas libraries.

All of the plpgsql expanded object optimizations so far are working well.  I
can minimize object expansion in most cases; there are a couple I haven't
been able to work around, but I'm still getting excellent benchmarking
numbers on some large test graphs:

                LiveJournal         Orkut
Nodes           3,997,962           3,072,441
Edges           34,681,185          117,185,037
Triangles       177,820,130         627,583,972

                Seconds Edges/S     Seconds Edges/S
Tri Count LL    2.80s   12,386,138  32.03s  3,658,602
Tri Count LU    1.91s   18,157,688  16.38s  7,156,338
Tri Centrality  1.55s   22,374,958  12.22s  9,589,610
Page Rank       8.10s   4,281,628   23.14s  5,064,176

That's on a 2020-era 4-core economy laptop and is in line with what the
C/Python/Julia bindings get on similar hardware.

There are a few cases where I have to force an expansion.  I work around
this by calling a `wait()` function, which expands the datum, calls
GrB_wait() on it (a no-op in this case), and returns a r/w pointer.  You can
see this in the following Triangle Counting function, which multiplies the
graph's matrix by itself, using itself as a mask.  The resulting matrix
reduces to the triangle count (times six):

create or replace function tcount_b(graph matrix) returns bigint
    language plpgsql as
    $$
    begin
        graph = wait(graph);
        graph = mxm(graph, graph, 'plus_pair_int32',
                    mask=>graph, descr=>'s');
        return reduce_scalar(graph) / 6;
    end;
    $$;

DEBUG:  new_matrix
DEBUG:  flatten_matrix
DEBUG:  matrix_wait
DEBUG:  expand_matrix  -- expansion happens here in wait()
DEBUG:  new_matrix
DEBUG:  matrix_mxm      -- mxm does not re-expand the object, good!
DEBUG:  expand_semiring
DEBUG:  new_semiring
DEBUG:  new_matrix
DEBUG:  expand_descriptor
DEBUG:  new_descriptor
DEBUG:  matrix_reduce_scalar  -- neither does reduce, good!
DEBUG:  new_scalar
DEBUG:  scalar_div_int32
DEBUG:  new_scalar
DEBUG:  cast_scalar_int64

If I take out the call to wait(), then mxm calls expand_matrix 3 times, as
it did before your optimizations.
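
For anyone less familiar with the masked-multiply formulation in tcount_b,
here is a minimal plain-Python sketch of the same idea.  It is only an
illustration, not the extension's code: the function name `triangle_count`
and the edge-list input are assumptions made for the example, and it assumes
an undirected graph with no self-loops.  For every edge (i, j) allowed by
the mask it counts the common neighbors of i and j, which is what a
plus_pair semiring accumulates, and then divides the total by six because
each triangle is counted once per ordered pair of its vertices.

```python
def triangle_count(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    # Build a symmetric adjacency structure (the matrix A).
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops, they cannot form triangles
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0
    for i, neighbors in adj.items():
        # The mask restricts output entries to positions where A[i][j] exists.
        for j in neighbors:
            # "plus_pair": add 1 for every k with A[i][k] and A[k][j] present.
            total += len(neighbors & adj[j])

    # Each triangle contributes to 6 ordered (i, j) pairs.
    return total // 6
```

Calling `triangle_count([(0, 1), (1, 2), (0, 2)])` on a single triangle
returns 1.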

> The other task we'd talked about was generalizing the existing
> heuristics in exec_assign_value() and plpgsql_exec_function() that
> say that array-type values should be forced into expanded R/W form
> when being assigned to an array-type PL/pgSQL variable.  The argument
> for that is that the PL/pgSQL function might subsequently do a lot of
> subscripted accesses to the array (which'd benefit from working with
> an expanded array) while never doing another assignment and thus not
> having any opportunity to revisit the decision.  The counter-argument
> is that it might *not* do such accesses, so that the expansion was
> just a waste of cycles.  So this is squishy enough that I'd prefer to
> have some solid use-cases to look at before trying to generalize it.
>
> It's sounding to me like you're going to end up in a place where all
> your values are passed around in expanded form already and so you have
> little need for that optimization.  If so, I'd prefer not to go any
> further than the present patch-set for now.  Adding "type support"
> hooks as discussed would be a substantial amount of work, so I'd
> like to have a more compelling case for it before doing that.
>

I agree it makes sense to have more use cases before making deeper
changes.  I only work with expanded forms, but I need to call wait() to
pre-expand the object to avoid multiple expansions in functions that can
take the same object in multiple parameters.  This is a pretty common
pattern in GraphBLAS (and linear algebra in general), where many matrices
are combined with themselves in several ways, such as multiplication,
element-wise operations, and element masking.
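
To make the multiple-expansion pattern concrete, here is a toy Python model.
It is not how PostgreSQL or the extension is implemented; `Flat`,
`Expanded`, `expand`, `mxm`, and `count_expansions` are hypothetical
stand-ins that only mimic the behavior described above, where each flat
argument is expanded independently unless the value was expanded once up
front.

```python
class Flat:
    """Stand-in for a flattened value (hypothetical, for illustration)."""
    def __init__(self, data):
        self.data = data


class Expanded:
    """Stand-in for an expanded in-memory object (hypothetical)."""
    def __init__(self, data):
        self.data = data


def count_expansions(pre_expand):
    """Return how many expansions one mxm(graph, graph, mask=graph) costs."""
    expansions = 0

    def expand(value):
        nonlocal expansions
        if isinstance(value, Expanded):
            return value          # already expanded, nothing to do
        expansions += 1
        return Expanded(value.data)

    def mxm(a, b, mask):
        # Each argument is expanded on its own; nothing is shared between
        # arguments, even when all three refer to the same flat value.
        a, b, mask = expand(a), expand(b), expand(mask)
        return Expanded(None)     # the result is irrelevant to the count

    graph = Flat([[0, 1], [1, 0]])
    if pre_expand:
        graph = expand(graph)     # plays the role of graph = wait(graph)
    mxm(graph, graph, mask=graph)
    return expansions
```

In this model `count_expansions(False)` returns 3 and
`count_expansions(True)` returns 1, matching the three calls versus one call
to expand_matrix that I see with and without wait().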

I'm not sure if eliminating wait() is a good enough use case.  It would
definitely be nice to get rid of, but I can document it pretty thoroughly
and it's relatively easy to catch.


-Michel

