public inbox for [email protected]  
help / color / mirror / Atom feed
About the stability of COPY BINARY data
7+ messages / 3 participants
[nested] [flat]

* About the stability of COPY BINARY data
@ 2024-11-06 16:20  Dominique Devienne <[email protected]>
  0 siblings, 2 replies; 7+ messages in thread

From: Dominique Devienne @ 2024-11-06 16:20 UTC (permalink / raw)
  To: pgsql-general

From https://www.postgresql.org/docs/current/sql-copy.html:
|> binary-format file is less portable across machine architectures
and PostgreSQL versions

In my experience, the binary encoding of binding/resultset/copy is
endian neutral (network byte order), so what is the less portable
across machine architectures that warning about?

Also, does the code for per-type _send() and _recv() functions really change
across versions of PostgreSQL? How common are instances of such
changes across versions? Any examples of such backward-incompatible
changes, in the past?

The binary data contains OIDs, but if sticking to built-in types,
which OIDs are unlikely to change across versions?

I'm obviously storing COPY BINARY data (we have lots of bytea
columns), and I wonder how bad it is long term, and across PostgreSQL
versions.

Thanks for any insights, --DD






^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: About the stability of COPY BINARY data
@ 2024-11-07 16:37  Adrian Klaver <[email protected]>
  parent: Dominique Devienne <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Adrian Klaver @ 2024-11-07 16:37 UTC (permalink / raw)
  To: Dominique Devienne <[email protected]>; pgsql-general

On 11/6/24 08:20, Dominique Devienne wrote:
>>>From https://www.postgresql.org/docs/current/sql-copy.html:
> |> binary-format file is less portable across machine architectures
> and PostgreSQL versions
> 
> In my experience, the binary encoding of binding/resultset/copy is
> endian neutral (network byte order), so what is the less portable
> across machine architectures that warning about?
> 
> Also, does the code for per-type _send() and _recv() functions really change
> across versions of PostgreSQL? How common are instances of such
> changes across versions? Any examples of such backward-incompatible
> changes, in the past?
> 
> The binary data contains OIDs, but if sticking to built-in types,
> which OIDs are unlikely to change across versions?
> 
> I'm obviously storing COPY BINARY data (we have lots of bytea
> columns), and I wonder how bad it is long term, and across PostgreSQL
> versions.

If I where to hazard a guess this plays a part:

https://www.postgresql.org/docs/current/sql-copy.html

"To determine the appropriate binary format for the actual tuple data 
you should consult the PostgreSQL source, in particular the *send and 
*recv functions for each column's data type (typically these functions 
are found in the src/backend/utils/adt/ directory of the source 
distribution)."

> 
> Thanks for any insights, --DD
> 
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: About the stability of COPY BINARY data
@ 2024-11-07 17:14  Dominique Devienne <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Dominique Devienne @ 2024-11-07 17:14 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general

On Thu, Nov 7, 2024 at 5:37 PM Adrian Klaver <[email protected]> wrote:
>
> On 11/6/24 08:20, Dominique Devienne wrote:
> >>From https://www.postgresql.org/docs/current/sql-copy.html:
> > |> binary-format file is less portable across machine architectures
> > and PostgreSQL versions
> >
> > In my experience, the binary encoding of binding/resultset/copy is
> > endian neutral (network byte order), so what is the less portable
> > across machine architectures that warning about?
> >
> > Also, does the code for per-type _send() and _recv() functions really change
> > across versions of PostgreSQL? How common are instances of such
> > changes across versions? Any examples of such backward-incompatible
> > changes, in the past?
> >
> > The binary data contains OIDs, but if sticking to built-in types,
> > which OIDs are unlikely to change across versions?
> >
> > I'm obviously storing COPY BINARY data (we have lots of bytea
> > columns), and I wonder how bad it is long term, and across PostgreSQL
> > versions.
>
> If I where to hazard a guess this plays a part:
>
> https://www.postgresql.org/docs/current/sql-copy.html
>
> "To determine the appropriate binary format for the actual tuple data
> you should consult the PostgreSQL source, in particular the *send and
> *recv functions for each column's data type (typically these functions
> are found in the src/backend/utils/adt/ directory of the source
> distribution)."

Hi Adrian.

Well, sure. The questions above are whether those type-specific formats are:
1) architecture dependent. (that's not my experience).
2) change across PostgreSQL versions.

Not what the actual formats are. --DD

PS: I'm surprised I didn't get answers. Seems to me to doc is overly "careful"
    about COPY BINARY's stability, thus my asking for confirmation here.






^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: About the stability of COPY BINARY data
@ 2024-11-07 17:39  Daniel Verite <[email protected]>
  parent: Dominique Devienne <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Daniel Verite @ 2024-11-07 17:39 UTC (permalink / raw)
  To: Dominique Devienne <[email protected]>; +Cc: pgsql-general

	Dominique Devienne wrote:

> Also, does the code for per-type _send() and _recv() functions
> really change across versions of PostgreSQL? How common are
> instances of such changes across versions? Any examples of such
> backward-incompatible changes, in the past?

For the timestamp types, I think these functions were
sending/expecting float8 (before version 7.3), and then float8 or
int64 depending on the server configuration up until 9.6, and since
then int64 only.
The same for the "time" field of the interval type.
There is still an "integer_datetimes" GUC reflecting this.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite






^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: About the stability of COPY BINARY data
@ 2024-11-07 17:55  Dominique Devienne <[email protected]>
  parent: Daniel Verite <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Dominique Devienne @ 2024-11-07 17:55 UTC (permalink / raw)
  To: Daniel Verite <[email protected]>; +Cc: pgsql-general

On Thu, Nov 7, 2024 at 6:39 PM Daniel Verite <[email protected]> wrote:
>         Dominique Devienne wrote:
> > Also, does the code for per-type _send() and _recv() functions
> > really change across versions of PostgreSQL? How common are
> > instances of such changes across versions? Any examples of such
> > backward-incompatible changes, in the past?
>
> For the timestamp types, I think these functions were
> sending/expecting float8 (before version 7.3), and then float8 or
> int64 depending on the server configuration up until 9.6, and since
> then int64 only.
> The same for the "time" field of the interval type.
> There is still an "integer_datetimes" GUC reflecting this.

Thanks. So it did happen in a distant past.
Anything below 14 is of no concern to me though.
So again, it does sound like changes are unlikely.

And I haven't seen anything not network-byte-order,
as far architecture is concerned.






^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: About the stability of COPY BINARY data
@ 2024-11-07 18:04  Adrian Klaver <[email protected]>
  parent: Dominique Devienne <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Adrian Klaver @ 2024-11-07 18:04 UTC (permalink / raw)
  To: Dominique Devienne <[email protected]>; Daniel Verite <[email protected]>; +Cc: pgsql-general

On 11/7/24 09:55, Dominique Devienne wrote:
> On Thu, Nov 7, 2024 at 6:39 PM Daniel Verite <[email protected]> wrote:
>>          Dominique Devienne wrote:
>>> Also, does the code for per-type _send() and _recv() functions
>>> really change across versions of PostgreSQL? How common are
>>> instances of such changes across versions? Any examples of such
>>> backward-incompatible changes, in the past?
>>
>> For the timestamp types, I think these functions were
>> sending/expecting float8 (before version 7.3), and then float8 or
>> int64 depending on the server configuration up until 9.6, and since
>> then int64 only.
>> The same for the "time" field of the interval type.
>> There is still an "integer_datetimes" GUC reflecting this.
> 
> Thanks. So it did happen in a distant past.
> Anything below 14 is of no concern to me though.
> So again, it does sound like changes are unlikely.

Yeah that is implied by:

https://www.postgresql.org/docs/current/pgupgrade.html

"Major PostgreSQL releases regularly add new features that often change 
the layout of the system tables, but the internal data storage format 
rarely changes. "

The COPY warning is there as heads up that it is a possibility.

> 
> And I haven't seen anything not network-byte-order,
> as far architecture is concerned.
> 
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: About the stability of COPY BINARY data
@ 2024-11-07 18:10  Dominique Devienne <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Dominique Devienne @ 2024-11-07 18:10 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: Daniel Verite <[email protected]>; pgsql-general

On Thu, Nov 7, 2024 at 7:04 PM Adrian Klaver <[email protected]> wrote:
> On 11/7/24 09:55, Dominique Devienne wrote:
> > On Thu, Nov 7, 2024 at 6:39 PM Daniel Verite <[email protected]> wrote:
> >>          Dominique Devienne wrote:
> >>> Also, does the code for per-type _send() and _recv() functions
> >>> really change across versions of PostgreSQL? How common are
> >>> instances of such changes across versions? Any examples of such
> >>> backward-incompatible changes, in the past?
> >>
> >> For the timestamp types, I think these functions were
> >> sending/expecting float8 (before version 7.3), and then float8 or
> >> int64 depending on the server configuration up until 9.6, and since
> >> then int64 only.
> >> The same for the "time" field of the interval type.
> >> There is still an "integer_datetimes" GUC reflecting this.
> >
> > Thanks. So it did happen in a distant past.
> > Anything below 14 is of no concern to me though.
> > So again, it does sound like changes are unlikely.
>
> Yeah that is implied by:
>
> https://www.postgresql.org/docs/current/pgupgrade.html
>
> "Major PostgreSQL releases regularly add new features that often change
> the layout of the system tables, but the internal data storage format
> rarely changes. "
>
> The COPY warning is there as heads up that it is a possibility.
> >
> > And I haven't seen anything not network-byte-order,
> > as far architecture is concerned.

But the COPY BINARY format and "the internal data storage format"
are two separate things Adrian, AFAIK.

Using binds and result sets in binary mode is part of the protocol in a way,
and not an internal implementation detail, like internal format for
tables on disk.

I'm sure I'm not the only one that using binary mode for PostgreSQL,
and any change in _send and _recv functions across versions, or have their
results be architecture dependent, would break many client codes, if
it happened.
And COPY BINARY's outer format is also public and documented, so can't
change either.

That's why I'm insisting on that phrase in the documentation, which
gives the wrong impressions IMHO.
Shouldn't it be removed or amended? Thanks, --DD






^ permalink  raw  reply  [nested|flat] 7+ messages in thread


end of thread, other threads:[~2024-11-07 18:10 UTC | newest]

Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-11-06 16:20 About the stability of COPY BINARY data Dominique Devienne <[email protected]>
2024-11-07 16:37 ` Adrian Klaver <[email protected]>
2024-11-07 17:14   ` Dominique Devienne <[email protected]>
2024-11-07 17:39 ` Daniel Verite <[email protected]>
2024-11-07 17:55   ` Dominique Devienne <[email protected]>
2024-11-07 18:04     ` Adrian Klaver <[email protected]>
2024-11-07 18:10       ` Dominique Devienne <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox