public inbox for [email protected]  
help / color / mirror / Atom feed
Unlogged tables
11+ messages / 6 participants
[nested] [flat]

* Unlogged tables
@ 2017-08-09 03:20  [email protected] <[email protected]>
  0 siblings, 2 replies; 11+ messages in thread

From: [email protected] @ 2017-08-09 03:20 UTC (permalink / raw)
  To: pgsql-performance

Hello,


We have a fairly large static dataset that we load into Postgres. We made the tables UNLOGGED and saw a pretty significant performance improvement for the loading. This was all fantastic until the server crashed and we were surprised to see during a follow up demo that the data had disappeared... Of course, it's all our fault for not understanding the implications of UNLOGGED proprely.


However, our scenario is truly a set of tables with 100's of millions of rows that are effectively WORMs: we write them once only, and then only read from them afterwards. As such, they could not be possibly corrupted post-load (i think) during a server crash (short of physical disk defects...).


I'd like to have the performance improvement during a initial batch insert, and then make sure the table remains after "unclean" shutdowns, which, as you might have it, includes a regular Windows server shut down during patching for example. So unlogged tables in practice are pretty flimsy. I tried to ALTER ... SET LOGGED, but that takes a VERY long time and pretty much negates the initial performance boost of loading into an unlogged table.


Is there a way to get my cake and eat it too?


Thank you,

Laurent Hasson





^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-09 10:39  Michael Paquier <[email protected]>
  parent: [email protected] <[email protected]>
  1 sibling, 1 reply; 11+ messages in thread

From: Michael Paquier @ 2017-08-09 10:39 UTC (permalink / raw)
  To: [email protected] <[email protected]>; +Cc: pgsql-performance

On Wed, Aug 9, 2017 at 5:20 AM, [email protected]
<[email protected]> wrote:
> We have a fairly large static dataset that we load into Postgres. We made
> the tables UNLOGGED and saw a pretty significant performance improvement for
> the loading. This was all fantastic until the server crashed and we were
> surprised to see during a follow up demo that the data had disappeared... Of
> course, it's all our fault for not understanding the implications of
> UNLOGGED proprely.

This is documented.

> However, our scenario is truly a set of tables with 100's of millions of
> rows that are effectively WORMs: we write them once only, and then only read
> from them afterwards. As such, they could not be possibly corrupted
> post-load (i think) during a server crash (short of physical disk
> defects...).
>
> I'd like to have the performance improvement during a initial batch insert,
> and then make sure the table remains after "unclean" shutdowns, which, as
> you might have it, includes a regular Windows server shut down during
> patching for example. So unlogged tables in practice are pretty flimsy.

All the data that you want to keep needs to be durable anyway, so you
will need to WAL-log it, and full page writes of those relation pages
will need to be created at least once. After you get past the
checkpoint the data will still be around. If you want to improve the
performance once, there are a couple of tricks, like switching
wal_level to minimal, preferring COPY over multi-value INSERT, batch a
lot of them in the same transaction. Of course you can as well
increase wal_max_size to trigger less checkpoints, or use
synchronous_commit = off to reduce fsync costs.

> I tried to ALTER ... SET LOGGED, but that takes a VERY long time and pretty
> much negates the initial performance boost of loading into an unlogged
> table.

This triggers a table rewrite and makes sure that all the data gets
WAL-logged. The cost to pay for durability.

> Is there a way to get my cake and eat it too?

Not completely. Making data durable will have a cost at the end, but
you can leverage it.
-- 
Michael


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-09 14:37  David G. Johnston <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: David G. Johnston @ 2017-08-09 14:37 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: [email protected] <[email protected]>; pgsql-performance

On Wed, Aug 9, 2017 at 3:39 AM, Michael Paquier <[email protected]>
wrote:

> This triggers a table rewrite and makes sure that all the data gets
> WAL-logged. The cost to pay for durability.
>
> > Is there a way to get my cake and eat it too?
>
> Not completely. Making data durable will have a cost at the end, but
> you can leverage it.
>
>
​Aren't you over-playing the role of the WAL in providing durability.  An
unlogged table remains intact after a clean shutdown and so is "durable" if
one considers the primary "permanence" aspect of the word.

The trade-off the OP wishes for is "lose crash-safety to gain write-once
(to the data files) performance".  Seeming having this on a per-table basis
would be part of the desirability.  It sounds like OP would be willing to
place the table into "read only" mode in order to ensure this - which is
something that is not presently possible.  I could envision that putting an
unlogged table into read-only mode would cause the system to ensure that
the data files are fully populated and then set a flag in the catalog that
informs the crash recovery process to go ahead and omit truncating that
particular unlogged table since the data files are known to be accurate.

David J.


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-09 15:12  Stephen Frost <[email protected]>
  parent: David G. Johnston <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: Stephen Frost @ 2017-08-09 15:12 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: Michael Paquier <[email protected]>; [email protected] <[email protected]>; pgsql-performance

David, all,

* David G. Johnston ([email protected]) wrote:
> On Wed, Aug 9, 2017 at 3:39 AM, Michael Paquier <[email protected]>
> wrote:
> 
> > This triggers a table rewrite and makes sure that all the data gets
> > WAL-logged. The cost to pay for durability.

That's not entirely accurate- there are certain cases where we don't
have to WAL-log the data, in fact we've got a specific optimization to
avoid WAL logging when it isn't necessary (see
src/backend/commands/copy.c:2392 or so), and the data will still be
durable once the transaction commits.  There are limitations there
though, of course, but it sounds like those are ones the OP may be happy
to live with in this case.

> > > Is there a way to get my cake and eat it too?
> >
> > Not completely. Making data durable will have a cost at the end, but
> > you can leverage it.
>
> Aren't you over-playing the role of the WAL in providing durability.  An
> unlogged table remains intact after a clean shutdown and so is "durable" if
> one considers the primary "permanence" aspect of the word.

In database terms, however, durable is intended to be in the face of a
crash and not just a clean shutdown, otherwise we wouldn't need to bother
with this whole WAL thing at all.

> The trade-off the OP wishes for is "lose crash-safety to gain write-once
> (to the data files) performance".  Seeming having this on a per-table basis
> would be part of the desirability.  It sounds like OP would be willing to
> place the table into "read only" mode in order to ensure this - which is
> something that is not presently possible.  I could envision that putting an
> unlogged table into read-only mode would cause the system to ensure that
> the data files are fully populated and then set a flag in the catalog that
> informs the crash recovery process to go ahead and omit truncating that
> particular unlogged table since the data files are known to be accurate.

This does sound like a pretty interesting idea, though not really
necessary unless OP has a mix of data that needs to be WAL-log'd and
data that doesn't.

What I believe OP is really looking for here, specifically, is using
wal_level = minimal while creating the table (or truncating it) within
the same transaction as the data load is done.  That will avoid having
the table's contents written into the WAL, and PG will treat it as a
regular table post-commit, meaning that it won't be truncated on a
database crash.

Thanks!

Stephen


Attachments:

  [application/pgp-signature] signature.asc (819B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-09 16:14  Jeff Janes <[email protected]>
  parent: [email protected] <[email protected]>
  1 sibling, 0 replies; 11+ messages in thread

From: Jeff Janes @ 2017-08-09 16:14 UTC (permalink / raw)
  To: [email protected] <[email protected]>; +Cc: pgsql-performance

On Tue, Aug 8, 2017 at 8:20 PM, [email protected] <
[email protected]> wrote:

> Hello,
>
>
> We have a fairly large static dataset that we load into Postgres. We made
> the tables UNLOGGED and saw a pretty significant performance improvement
> for the loading. This was all fantastic until the server crashed and we
> were surprised to see during a follow up demo that the data had
> disappeared... Of course, it's all our fault for not understanding the
> implications of UNLOGGED proprely.
>
>
> However, our scenario is truly a set of tables with 100's of millions of
> rows that are effectively WORMs: we write them once only, and then only
> read from them afterwards. As such, they could not be possibly corrupted
> post-load (i think) during a server crash (short of physical disk
> defects...).
>

Yes, this is a feature many people have wanted.   You'd have to somehow
mark the unlogged table as immutable and then do a checkpoint, after which
it would no longer need to be truncated after a crash.  Alternatively, it
could be done automatically where the system would somehow know which
unlogged tables were possibly touched since the last successful checkpoint,
and truncate only those one.  But, no one has implemented such a thing.

>
> I'd like to have the performance improvement during a initial batch
> insert, and then make sure the table remains after "unclean" shutdowns,
> which, as you might have it, includes a regular Windows server shut down
> during patching for example.
>

Why doesn't the Windows scheduled shutdown signal postgres to shutdown
cleanly and wait for it to do so?  That is what is supposed to happen.


> So unlogged tables in practice are pretty flimsy. I tried to ALTER ... SET
> LOGGED, but that takes a VERY long time and pretty much negates the initial
> performance boost of loading into an unlogged table.
>

Are you using streaming or wal logging?

Cheers,

Jeff


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-09 18:15  George Neuner <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: George Neuner @ 2017-08-09 18:15 UTC (permalink / raw)
  To: pgsql-performance

On Wed, 9 Aug 2017 09:14:48 -0700, Jeff Janes <[email protected]> wrote:

 >Why doesn't the Windows scheduled shutdown signal postgres to shutdown
 >cleanly and wait for it to do so?  That is what is supposed to happen.

Windows *does* signal shutdown (and sleep and hibernate and wakeup).  
pg_ctl can catch these signals only when running as a service ... it 
will not catch any system signals when run as an application.

George


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-09 18:30  [email protected] <[email protected]>
  parent: George Neuner <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: [email protected] @ 2017-08-09 18:30 UTC (permalink / raw)
  To: George Neuner <[email protected]>; pgsql-performance

Ok, I am not sure. I run Postgres as a service, and when my Windows rebooted after a patch, UNLOGGED tables were cleaned... maybe the patch process in Windows messed something up, I don't know.

From: [email protected]
Sent: August 9, 2017 13:17
To: [email protected]
Subject: Re: [PERFORM] Unlogged tables


On Wed, 9 Aug 2017 09:14:48 -0700, Jeff Janes <[email protected]> wrote:

 >Why doesn't the Windows scheduled shutdown signal postgres to shutdown
 >cleanly and wait for it to do so?  That is what is supposed to happen.

Windows *does* signal shutdown (and sleep and hibernate and wakeup).
pg_ctl can catch these signals only when running as a service ... it
will not catch any system signals when run as an application.

George


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-09 19:52  George Neuner <[email protected]>
  parent: [email protected] <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: George Neuner @ 2017-08-09 19:52 UTC (permalink / raw)
  To: [email protected] <[email protected]>; +Cc: pgsql-performance

Please don't top post.

On 8/9/2017 2:30 PM, [email protected] wrote:
> > On 8/9/2017 2:17 PM, [email protected] wrote:
>
> >> On Wed, 9 Aug 2017 09:14:48 -0700, Jeff Janes <[email protected]> wrote:
>
> >> Why doesn't the Windows scheduled shutdown signal postgres to shutdown
> >> cleanly and wait for it to do so?  That is what is supposed to happen.
>
> > Windows *does* signal shutdown (and sleep and hibernate and wakeup).
> > pg_ctl can catch these signals only when running as a service ... it
> > will not catch any system signals when run as an application.
>
> Ok, I am not sure. I run Postgres as a service, and when my Windows 
> rebooted after a patch, UNLOGGED tables were cleaned... maybe the 
> patch process in Windows messed something up, I don't know.

Hmm.  Do you have checkpoint intervals set very long?  Or do you have 
the Windows shutdown delay(s) set short?

Data in unlogged tables persists only AFTER a checkpoint ... if the 
tables had been written to and were "dirty", and the system went down 
before the shutdown checkpoint (or before the shutdown checkpoint 
completed), then the tables would be truncated at the next startup.


Service control in Windows is very different from Unix/Linux, and 
Windows is not completely POSIX compatible.  I develop software for 
Windows and Linux, but I only use Postgresql.  Postgresql was written 
originally for Unix and it is possible that the Windows version is not 
doing something quite right.

I took a quick glance at the source for pg_ctl: SERVICE_CONTROL_SHUTDOWN 
and SERVICE_CONTROL_STOP both just set an shared event to notify the 
writer processes to terminate.  Offhand I don't see where pg_ctl - 
running as a service - is waiting for the writer processes to actually 
terminate ( it does wait if run from the command line ).   It's possible 
that your system shut down too quickly and the WAL writer was killed 
instead of terminating cleanly.


Just FYI, re: Postgresql as a user application.

Windows doesn't send *signals* (ala Unix) at all ... it is message 
based.  The control messages are different for applications and services 
- e.g., WM_SHUTDOWN is sent to applications, SERVICE_CONTROL_SHUTDOWN is 
sent to services.  In order for an application to catch a message, it 
must create a window.

pg_ctl is a command line program which does not create any windows (in 
any mode).  It was designed to enable it to run as a service, but when 
run as a user application it will can't receive any system messages.  
The user *must* manually stop a running database cluster before shutting 
down or sleeping.

George



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-10 05:29  [email protected] <[email protected]>
  parent: George Neuner <[email protected]>
  0 siblings, 2 replies; 11+ messages in thread

From: [email protected] @ 2017-08-10 05:29 UTC (permalink / raw)
  To: George Neuner <[email protected]>; +Cc: pgsql-performance



Sent from my BlackBerry - the most secure mobile device
From: [email protected]
Sent: August 9, 2017 14:52
To: [email protected]
Cc: [email protected]
Subject: Re: [PERFORM] Unlogged tables


Please don't top post.

On 8/9/2017 2:30 PM, [email protected]<mailto:[email protected]> wrote:
> On 8/9/2017 2:17 PM, [email protected]<mailto:[email protected]> wrote:

>> On Wed, 9 Aug 2017 09:14:48 -0700, Jeff Janes <[email protected]><mailto:[email protected]> wrote:

>> Why doesn't the Windows scheduled shutdown signal postgres to shutdown
>> cleanly and wait for it to do so?  That is what is supposed to happen.

> Windows *does* signal shutdown (and sleep and hibernate and wakeup).
> pg_ctl can catch these signals only when running as a service ... it
> will not catch any system signals when run as an application.

Ok, I am not sure. I run Postgres as a service, and when my Windows rebooted after a patch, UNLOGGED tables were cleaned... maybe the patch process in Windows messed something up, I don't know.

Hmm.  Do you have checkpoint intervals set very long?  Or do you have the Windows shutdown delay(s) set short?

Data in unlogged tables persists only AFTER a checkpoint ... if the tables had been written to and were "dirty", and the system went down before the shutdown checkpoint (or before the shutdown checkpoint completed), then the tables would be truncated at the next startup.


Service control in Windows is very different from Unix/Linux, and Windows is not completely POSIX compatible.  I develop software for Windows and Linux, but I only use Postgresql.  Postgresql was written originally for Unix and it is possible that the Windows version is not doing something quite right.

I took a quick glance at the source for pg_ctl:  SERVICE_CONTROL_SHUTDOWN and SERVICE_CONTROL_STOP both just set an shared event to notify the writer processes to terminate.  Offhand I don't see where pg_ctl - running as a service - is waiting for the writer processes to actually terminate ( it does wait if run from the command line ).   It's possible that your system shut down too quickly and the WAL writer was killed instead of terminating cleanly.


Just FYI, re: Postgresql as a user application.

Windows doesn't send *signals* (ala Unix) at all ... it is message based.  The control messages are different for applications and services - e.g., WM_SHUTDOWN is sent to applications, SERVICE_CONTROL_SHUTDOWN is sent to services.  In order for an application to catch a message, it must create a window.

pg_ctl is a command line program which does not create any windows (in any mode).  It was designed to enable it to run as a service, but when run as a user application it will can't receive any system messages.  The user *must* manually stop a running database cluster before shutting down or sleeping.

George


Hello George... I know about not doing top posting but was emailing from my phone, and just recently moved to Android. I think I am still not configured right.

Somewhat orthogonal, but any particular reason why top posts == bad, or just convention?

I will try a few scenarios and report back. I do not believe I have long cp intervals and I do not believe the windows machine shuts down faster than 'normal'

Finally, my true question was whether Postgres would support something like worm with the performance benefits of UNLOGGED, but not the inconveniences of auto truncates.

Thanks.


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-10 22:28  George Neuner <[email protected]>
  parent: [email protected] <[email protected]>
  1 sibling, 0 replies; 11+ messages in thread

From: George Neuner @ 2017-08-10 22:28 UTC (permalink / raw)
  To: [email protected] <[email protected]>; +Cc: pgsql-performance



On 8/10/2017 1:29 AM, [email protected] wrote:
> Hello George... I know about not doing top posting but was emailing 
> from my phone, and just recently moved to Android. I think I am still 
> not configured right.
>
> Somewhat orthogonal, but any particular reason why top posts == bad, 
> or just convention?

The standard joke reply is:

    Because it messes up the order in which people normally read.
 > Why is top-posting such a bad thing?
    >> Top-posting.
    >>> What is the most annoying thing in e-mail?

<grin>

It is just convention, but with a good reason:  most posts in groups are 
part of a discussion, and it's hard to follow a discussion when replies 
are far from the comment or question that provoked them. The convention 
for discussion is "interleaved" style.

The email top posting convention serves a different purpose: to preserve 
a record of the communication.  Polite people often use a mix of 
styles:  copying the [latest portion of the ] quoted message to the top 
and replying to it inline (as with a discussion).

see https://en.wikipedia.org/wiki/Posting_style


Then too, there is the issue of editing.  With an email, typically only 
a handful of people will receive it.  With a public group or mailing 
list, all of the participants  - perhaps thousands - will receive the 
post.  When lots of people in a popular discussion quote the entire 
message, it quickly grows to an unwieldy size and eventually will be 
rejected by the servers.

The polite thing when replying is to edit the original message to 
include just information relevant to your reply, and then reply inline.  
Leave archiving of the discussion to the servers.


> I will try a few scenarios and report back. I do not believe I have 
> long cp intervals and I do not believe the windows machine shuts down 
> faster than 'normal'

Your problem still may be related to the shutdown delay.

The way it works is: Windows sends a shutdown message to the service, 
and the service replies with an estimate of how long it will take to 
stop.  Until the service terminates, Windows waits and periodically 
polls the service asking for its progress.  Windows continues to wait 
until the service process either terminates, or until the system 
configured "drop-dead" timeout occurs, at which time Windows forcibly 
kills the service and continues with the shutdown.

The problem is that Postgresql is not a single process: pg_ctl spawns a 
bunch of children.  Looking further at the source, I believe pg_ctl is 
waiting for the children to terminate before stopping itself - but it is 
NOT responding to Windows progress messages, so Windows has no idea 
whether it is making headway or needs more time to complete.

Windows has no idea that those other processes are connected to the 
Postgresql service, so if it times out and kills pg_ctl, it assumes it 
is done with Postgresql.  The other processes then may be killed whether 
or not they are finished.


> Finally, my true question was whether Postgres would support something 
> like worm with the performance benefits of UNLOGGED, but not the 
> inconveniences of auto truncates.

I saw some of the other responses re: that issue.

As I mentioned previously, an unlogged table will be truncated on 
startup if it is dirty - i.e. there were any updates that haven't 
survived at least one checkpoint.  The only thing you could try to do is 
force a checkpoint immediately following an unlogged table write.  But 
that is expensive performance wise and is not encouraged.

https://www.postgresql.org/docs/current/static/sql-checkpoint.html

George



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Unlogged tables
@ 2017-08-10 22:52  George Neuner <[email protected]>
  parent: [email protected] <[email protected]>
  1 sibling, 0 replies; 11+ messages in thread

From: George Neuner @ 2017-08-10 22:52 UTC (permalink / raw)
  To: [email protected] <[email protected]>; +Cc: pgsql-performance



On 8/10/2017 1:29 AM, [email protected] wrote:
>
> Finally, my true question was whether Postgres would support something 
> like worm with the performance benefits of UNLOGGED, but not the 
> inconveniences of auto truncates.
>

If you can live with the limitations, one other thing you might try is 
storing WORM data in the filesystem and accessing it via file_fdw.
https://www.postgresql.org/docs/current/static/file-fdw.html

There are a lot of downsides to this:  file_fdw tables are read-only, so 
you have to update the external file through some other means.  Also, 
I've never used file_fdw, so I'm not sure whether you can create indexes 
on the tables - and even if you can, you would need to manually recreate 
the indexes periodically because Postgresql won't see your updates.

George



^ permalink  raw  reply  [nested|flat] 11+ messages in thread


end of thread, other threads:[~2017-08-10 22:52 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2017-08-09 03:20 Unlogged tables [email protected] <[email protected]>
2017-08-09 10:39 ` Michael Paquier <[email protected]>
2017-08-09 14:37   ` David G. Johnston <[email protected]>
2017-08-09 15:12     ` Stephen Frost <[email protected]>
2017-08-09 16:14 ` Jeff Janes <[email protected]>
2017-08-09 18:15 Re: Unlogged tables George Neuner <[email protected]>
2017-08-09 18:30 ` [email protected] <[email protected]>
2017-08-09 19:52   ` George Neuner <[email protected]>
2017-08-10 05:29     ` [email protected] <[email protected]>
2017-08-10 22:28       ` George Neuner <[email protected]>
2017-08-10 22:52       ` George Neuner <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox