Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1keN2N-0005HN-1z for pgsql-hackers@arkaria.postgresql.org; Sun, 15 Nov 2020 18:46:07 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1keN2L-0006im-OR for pgsql-hackers@arkaria.postgresql.org; Sun, 15 Nov 2020 18:46:05 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1keN2L-0006id-8w for pgsql-hackers@lists.postgresql.org; Sun, 15 Nov 2020 18:46:05 +0000 Received: from mout.kundenserver.de ([217.72.192.73]) by magus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1keN27-0006Bl-I4 for pgsql-hackers@lists.postgresql.org; Sun, 15 Nov 2020 18:46:03 +0000 Received: from [192.168.178.34] ([77.12.243.193]) by mrelayeu.kundenserver.de (mreue106 [212.227.15.145]) with ESMTPSA (Nemesis) id 1MfZ9C-1k36Ou3OQ4-00g4Rm; Sun, 15 Nov 2020 19:45:37 +0100 From: =?UTF-8?Q?J=c3=bcrgen_Purtz?= Subject: Re: Additional Chapter for Tutorial To: "David G. Johnston" Cc: Erik Rijkers , PostgreSQL Hackers , Justin Pryzby References: <13c65997-9502-7671-1a7b-50e5d5093514@purtz.de> <93fd8e15-8639-23c8-c7c4-d4cfca323189@purtz.de> <32c7ebc0-f69f-8a77-c397-8fcb9139d8d3@purtz.de> <779bb812-5238-f78b-2782-b1d990f952e3@purtz.de> <7a081d6c-e288-615c-f32c-839fb27366ed@purtz.de> <759f7a615736d8066b3441571d371c68@xs4all.nl> <99f58c4752e9984ffe38a4343fed3837@xs4all.nl> <1bfb1ae4-6f5b-bff4-15a8-a768e0ddd450@purtz.de> Message-ID: Date: Sun, 15 Nov 2020 19:45:35 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------1BB5DBAD17FB692E25728BB2" Content-Language: en-US X-Provags-ID: V03:K1:6Nx90r+XgJCS/NZ/mtjoiOPYmJty8iMz1/6YFS+Kf/bbqFhDkmO gZkzJFAC4drOAIgVJ++IXpsGFfuezsqWoH5Q0UwbVc+idbCMS7fMQvjuECHW2Swowd+GubO ybfNJUcZS8pGwVwQga8lWjUeIPKgbCWIaCHZw0sPKZ5Cq22zCaRqRXUkL0s4AT6I1sT/BUX jsio3KP48nGHkbu5WOlSQ== X-Spam-Flag: NO X-UI-Out-Filterresults: notjunk:1;V03:K0:sD1GD/ergao=:csEFzf38coy0yHBTJXcQ1T 04o4qBLPepT290WlddqBY9lvOLCFcLMypd+2eybDv7JWFKSDYRd3RuKMTH0ziQYqucOYg0nSs pB4tcaUV8rqa1Zcqv1NL41TSexYA4Vk/a1PMz/etNn8DAHoR+FNpY1BPlcBskmuXzywEDQqz1 ja/NnkcJyxwYllsoX239+VWPn9VT2PxHMSP6MzE4T/a6o0QpS2HWvF1IvCWnggFR2P3i9sPlk pT8bnaFINillFrMlne3YLyDxViDJEnc5CpiC0e0RTqcwOMsyb8b70/lbzQM4hKJxsktRpmgtY yv4kpwJj0bGaQE4LjJNfY4WflZ87XpvEwoaY0wgb6sdfTJOX+cWwRnlH1NjK6JLgUejijC4FB FKdeM9K5v7YVy0jF9Z8vAkd2te+5iWGgS76zORvJQLGrwRv/8v+S1OJ1NWINN List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk This is a multi-part message in MIME format. --------------1BB5DBAD17FB692E25728BB2 Content-Type: multipart/alternative; boundary="------------66F2611A4B7BB4A55DC73A90" --------------66F2611A4B7BB4A55DC73A90 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit On 10.11.20 00:14, David G. Johnston wrote: > Reviewed the first three sections. > > template0 - I would remove the schema portions of this and simply note > this as being a pristine recovery database in the diagram. ok > > I would drop the word "more" and just say "system schemas".  I would > drop pg_toast from the list of system schema and focus on the three > user-facing ones. ok > > Instead of "my_schema" (optional) I would do "my_schema" (example) The terms 'optional' and 'default' are used at various places with their literal meaning. We shall not change them. > > Server Graphic > #3 Global SQL Objects: Objects which are shared among all databases > within a cluster. > #6 Client applications are prohibited from connecting to template0 ok > #1 If by you we mean "the client" saying that you work "in the cluster > data" doesn't really help.  I would emphasize the point that the > client sees an endpoint the Postmaster publishes as a port or socket > file and that plus the database name defines the endpoint the client > connects to (meld with #5) ok, with some changes. > > In lieu of some of the existing detail provided about structure I > would add information about configuration and search_path at this level. Search path appended. But IMO configuration questions are out of scope of this sub-chapter. > > I like the object type enumeration - I would suggest grouping them by > type in a manner consistent with the documentation and making each one > a link to its "primary" section - the SQL Command reference if all > else fails. ok. But don't how to group them in a better way. > > The "i" in internal in 51.3 (the image) needs capitalization). ok > > You correctly add both Extension and Collation as database-level > objects but they are not mentioned anywhere else.  They do belong here > and need to be tied in properly in the text. Have some courage to the gap, it's an introductory chapter. > > The whole thing needs a good pass focused on capitalization.  Both for > typos and to decide when various primary concepts like Instance should > be capitalized and when not. 'Instance' and 'Cluster' are now uppercase because of their importance, everything else lowercase for better reading. > > 51.4 - When you look at the diagram seeing /pg/data/base looks really > cool, but when reading the prose where both the "pg" and the "base" > are omitted and all you get are repeated references to "data", the > directory name choice becomes an issue IMO.  I suggest (and changed > the attached) to name the actual root directory "pgdata".  You should > change the /pg/ directory name to something like ".../tutorial_project/". The graphic shall reflect the default behavior of PG. Without the parameter -D, initdb creates the new cluster in the directory where PGDATA points to. This is in many cases |/var/lib/pgsql/data|. Therefore 'data' and its subdirectory 'base' are not my invention but reflects the default situation. (Diving a little deeper into this issue I noticed that there is a parameter 'cluster_name' in the config file. But it does not change the name of the cluster's root directory, it only changes the names of the running processes. Choosing 'instance_name' instead of 'cluster_name' as the parameter's name would be a better choice imo - but that is not what we are speaking about in the context of the new chapter). I changed the very first directory in the graphic to visualize the standard behavior; I reverted your recommendation to use 'pgdata' instead of 'data' in the text part. > Since you aren't following alphabetical order anyway I would place > pg_tblspc after globals since tablespaces are globals and thus > proximity links them here - and pointing out that pg_tblspc holds the > data makes stating that global doesn't contain tablespace data > unnecessary. ok > > Maybe point out somewhere the the "base/databaseOID" directory > represents the default tablespace for each database, which isn't > "global", only the non-default tablespaces are considered globals (or > just get rid of the mentioned on "non-default tablespace" for now). ok more: 1) some changes concerning the nature of connections (52.2: logical perspective). IMO accessing multiple databases within one connection is not a question of configuring, you have to take more actions. But I'm not sure we should mention this at all. 2) you propose to cancel or trim down the paragraphs behind figure 51.2. (cluster, database, schema). I believe that a textual description of this hierarchy is essential for the understanding of the system. Because it isn't described explicitly at a different place, it should remain. --- snipp -------- from other e-mail ---- > MVCC Section > > The first paragraph and example in the MVCC section is a good example > but seems misplaced - its relationship to MVCC generally is tenuous, > rather I would expect a discussion of the serializable isolation mode > to follow. > > I'm not sure how much detail this section wants to get into given the > coverage of concurrency elsewhere in the documentation.  "Not much" > would be my baseline. The paragraph focus on the fact that new row versions are generated instead of locking something. Explaining serialization isolation modes is imo very complicate and out of the scope of this subchapter. If we want to give an overview - in addition to the exiting documentation - it should be a separate subchapter. > > I would suggest spelling out what "OLTP" stands for and ideally > pointing the user to the glossary for the term. ok, but not added to glossary. The given explanation "... with a massive number of concurrent write actions" should be sufficient. > > Tending more toward a style gripe but the amount of leader phrases and > redundancy are at a level that I am noticing them when I read this but > do not have the same impression having read large portions of > documentation. In particular: Because I'm not a native English speaker, orthographic and style hits are always welcome. > > "When we speak about transaction IDs, you need to know that xids are > like sequences." > > "But keep in mind that xids are independent of any time measurement — > in milliseconds or otherwise. If you dive deeper into PostgreSQL, you > will recognize parameters with names such as 'xxx_age'. Despite their > names, these '_age' parameters do not specify a period of time but > represent a certain number of transactions, e.g., 100 million." > > Could just be:  xids are sequences and age computations involving them > measure a transaction count as opposed to a time interval. ok > > Then I would consider adding a bit more detail/context here. > > xids are 32bit sequences, with a reserved value to handle > wrap-around.  There are 4 billion values in the sequence but > wrap-around handling must occur every 2 billion transactions. Age > computations involving xids measure a transaction count as opposed to > a time interval. > > I would move the mentioning of "vacuum" to the main paragraph about > delete and not solely as a "keep in mind" note. The mentioning here at the food of the page is a crossover to the next subchapter. > > The part before the diagram seems like it should be much shorter, > concise, and provide links to the excellent documentation.  The part > after the image, and the image itself, are good material, though > possibly should be in a main administration chapter instead of an > internals chapter. vacuum: The problem - and one reason for the existence of this subchapter - is that vacuum's documentation is scattered across may pages: 19.4: parameters to configure the server, especially five parameters 'vacuum_cost_xxx'. 19.10: parameters to configure autovacuum. 19.11: parameters to configure client connections, especially five parameters 'vacuum_xxx' concerning their freeze-behavior. 24.1: explains the general necessity of (auto)vacuum and their strategies. The page about the SQL command VACUUM explains the different options (FULL, FREEZE, ..) and their meaning. Because of the structure of our documentation as well as the complexity of the issue that's ok. The existing documentation describes every parameter very well, but I'm missing a page where the 'big picture' of vacuum is explained (not necessarily here). It shall show the relationship between the huge number of parameters and an explanation *why* they exists. As far as we don't have such a page within the vacuum documentation the proposed subchapter fills the gap. (The provided graphics can be included multiple times without generating redundancies - here and at arbitrary other places.) > > The first bullet of "keep in mind" is both wordy and wrong - in > particular "as xids grow old row versions get out of scope over time" > doesn't make sense (or rather it only does in the context of > wrap-around, not normal visibility).  Having the only mention of bloat > be here is also not ideal, it too should be weaved into the main > narrative.  The "keep in mind" section here should be a recap of > already covered material in a succinct form, nothing should be new to > someone who just read the entire section. ok. > > I don't think that usage of exclamation marks (!) is warranted here, > though emphasis on the key phrase wouldn't hurt. ok > > Vacuum Section > > avoid -> prevent (continued growth) ok > > Autovacuum is enabled by default.  The whole note needs commas. ok > > I'd try to get rid of "at arbitrary point in time" ok > > "Instance." we've already described where instances are previously > ("on the server") ok > > The other sections - these seem misplaced for the tutorial, update the > main documentation if this information is wholly missing or lacking.  > The MVCC chapter can incorporate overview information as it is a > strict consequence of that implementation. > > Statistics belong elsewhere - the tutorial should not use poor command > implementation choices as a guide for user education. > > In short, this whole section should not exist and its content moved to > more appropriate areas (mainly MVCC).  Vacuum is a tool that one must > use but the narrative should be about the system generally. > > concerning vacuum section: see my comments above concerning 'the other sections' (transactions, reliability, backup (plus: someone should add 'replication', I'm not familiar with this issue)): The intention of the chapter is to give a *summary* about PG's essential architecture and about central implementation aspects. This implies that the chapters does not present any new information. They shall only show (or repeat) essential things in their context and explain *why* they are used. In this sense the three chapters may be reasonable. Concerning this, I like to hear some comments from other people. Attachments: 0013-architecture.patch: complete patch vs. master 0013-architecture.sgml.diff: changes in file architecture.sgml since 0012 0013-images.diff: changes in files *-raw.svg since 0012 -- J. Purtz --------------66F2611A4B7BB4A55DC73A90 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit
On 10.11.20 00:14, David G. Johnston wrote:
Reviewed the first three sections.

template0 - I would remove the schema portions of this and simply note this as being a pristine recovery database in the diagram.
ok

I would drop the word "more" and just say "system schemas".  I would drop pg_toast from the list of system schema and focus on the three user-facing ones.
ok

Instead of "my_schema" (optional) I would do "my_schema" (example)
The terms 'optional' and 'default' are used at various places with their literal meaning. We shall not change them.

Server Graphic
#3 Global SQL Objects: Objects which are shared among all databases within a cluster.
#6 Client applications are prohibited from connecting to template0
ok
#1 If by you we mean "the client" saying that you work "in the cluster data" doesn't really help.  I would emphasize the point that the client sees an endpoint the Postmaster publishes as a port or socket file and that plus the database name defines the endpoint the client connects to (meld with #5)
ok, with some changes.

In lieu of some of the existing detail provided about structure I would add information about configuration and search_path at this level.
Search path appended. But IMO configuration questions are out of scope of this sub-chapter.

I like the object type enumeration - I would suggest grouping them by type in a manner consistent with the documentation and making each one a link to its "primary" section - the SQL Command reference if all else fails.
ok. But don't how to group them in a better way.

The "i" in internal in 51.3 (the image) needs capitalization).
ok

You correctly add both Extension and Collation as database-level objects but they are not mentioned anywhere else.  They do belong here and need to be tied in properly in the text.
Have some courage to the gap, it's an introductory chapter.

The whole thing needs a good pass focused on capitalization.  Both for typos and to decide when various primary concepts like Instance should be capitalized and when not.
'Instance' and 'Cluster' are now uppercase because of their importance, everything else lowercase for better reading.

51.4 - When you look at the diagram seeing /pg/data/base looks really cool, but when reading the prose where both the "pg" and the "base" are omitted and all you get are repeated references to "data", the directory name choice becomes an issue IMO.  I suggest (and changed the attached) to name the actual root directory "pgdata".  You should change the /pg/ directory name to something like ".../tutorial_project/".

The graphic shall reflect the default behavior of PG. Without the parameter -D, initdb creates the new cluster in the directory where PGDATA points to. This is in many cases /var/lib/pgsql/data. Therefore 'data' and its subdirectory 'base' are not my invention but reflects the default situation.

(Diving a little deeper into this issue I noticed that there is a parameter 'cluster_name' in the config file. But it does not change the name of the cluster's root directory, it only changes the names of the running processes. Choosing 'instance_name' instead of 'cluster_name' as the parameter's name would be a better choice imo - but that is not what we are speaking about in the context of the new chapter).

I changed the very first directory in the graphic to visualize the standard behavior; I reverted your recommendation to use 'pgdata' instead of 'data' in the text part.

Since you aren't following alphabetical order anyway I would place pg_tblspc after globals since tablespaces are globals and thus proximity links them here - and pointing out that pg_tblspc holds the data makes stating that global doesn't contain tablespace data unnecessary.
ok

Maybe point out somewhere the the "base/databaseOID" directory represents the default tablespace for each database, which isn't "global", only the non-default tablespaces are considered globals (or just get rid of the mentioned on "non-default tablespace" for now).

ok

more:

1) some changes concerning the nature of connections (52.2: logical perspective). IMO accessing multiple databases within one connection is not a question of configuring, you have to take more actions. But I'm not sure we should mention this at all.

2) you propose to cancel or trim down the paragraphs behind figure 51.2. (cluster, database, schema). I believe that a textual description of this hierarchy is essential for the understanding of the system. Because it isn't described explicitly at a different place, it should remain.

--- snipp -------- from other e-mail ----

MVCC Section

The first paragraph and example in the MVCC section is a good example but seems misplaced - its relationship to MVCC generally is tenuous, rather I would expect a discussion of the serializable isolation mode to follow.

I'm not sure how much detail this section wants to get into given the coverage of concurrency elsewhere in the documentation.  "Not much" would be my baseline.
The paragraph focus on the fact that new row versions are generated instead of locking something. Explaining serialization isolation modes is imo very complicate and out of the scope of this subchapter. If we want to give an overview - in addition to the exiting documentation - it should be a separate subchapter.

I would suggest spelling out what "OLTP" stands for and ideally pointing the user to the glossary for the term.
ok, but not added to glossary. The given explanation "... with a massive number of concurrent write actions" should be sufficient.

Tending more toward a style gripe but the amount of leader phrases and redundancy are at a level that I am noticing them when I read this but do not have the same impression having read large portions of documentation. In particular:
Because I'm not a native English speaker, orthographic and style hits are always welcome.

"When we speak about transaction IDs, you need to know that xids are like sequences."

"But keep in mind that xids are independent of any time measurement — in milliseconds or otherwise. If you dive deeper into PostgreSQL, you will recognize parameters with names such as 'xxx_age'. Despite their names, these '_age' parameters do not specify a period of time but represent a certain number of transactions, e.g., 100 million."

Could just be:  xids are sequences and age computations involving them measure a transaction count as opposed to a time interval.
ok

Then I would consider adding a bit more detail/context here.

xids are 32bit sequences, with a reserved value to handle wrap-around.  There are 4 billion values in the sequence but wrap-around handling must occur every 2 billion transactions. Age computations involving xids measure a transaction count as opposed to a time interval.

I would move the mentioning of "vacuum" to the main paragraph about delete and not solely as a "keep in mind" note.
The mentioning here at the food of the page is a crossover to the next subchapter.

The part before the diagram seems like it should be much shorter, concise, and provide links to the excellent documentation.  The part after the image, and the image itself, are good material, though possibly should be in a main administration chapter instead of an internals chapter.

vacuum: The problem - and one reason for the existence of this subchapter - is that vacuum's documentation is scattered across may pages:

19.4: parameters to configure the server, especially five parameters 'vacuum_cost_xxx'.

19.10: parameters to configure autovacuum.

19.11: parameters to configure client connections, especially five parameters 'vacuum_xxx' concerning their freeze-behavior.

24.1: explains the general necessity of (auto)vacuum and their strategies.

The page about the SQL command VACUUM explains the different options (FULL, FREEZE, ..) and their meaning.

Because of the structure of our documentation as well as the complexity of the issue that's ok. The existing documentation describes every parameter very well, but I'm missing a page where the 'big picture' of vacuum is explained (not necessarily here). It shall show the relationship between the huge number of parameters and an explanation *why* they exists. As far as we don't have such a page within the vacuum documentation the proposed subchapter fills the gap. (The provided graphics can be included multiple times without generating redundancies - here and at arbitrary other places.)


The first bullet of "keep in mind" is both wordy and wrong - in particular "as xids grow old row versions get out of scope over time" doesn't make sense (or rather it only does in the context of wrap-around, not normal visibility).  Having the only mention of bloat be here is also not ideal, it too should be weaved into the main narrative.  The "keep in mind" section here should be a recap of already covered material in a succinct form, nothing should be new to someone who just read the entire section.
ok.

I don't think that usage of exclamation marks (!) is warranted here, though emphasis on the key phrase wouldn't hurt.
ok

Vacuum Section

avoid -> prevent (continued growth)
ok

Autovacuum is enabled by default.  The whole note needs commas.
ok

I'd try to get rid of "at arbitrary point in time"
ok

"Instance." we've already described where instances are previously ("on the server")
ok

The other sections - these seem misplaced for the tutorial, update the main documentation if this information is wholly missing or lacking.  The MVCC chapter can incorporate overview information as it is a strict consequence of that implementation.

Statistics belong elsewhere - the tutorial should not use poor command implementation choices as a guide for user education.

In short, this whole section should not exist and its content moved to more appropriate areas (mainly MVCC).  Vacuum is a tool that one must use but the narrative should be about the system generally.


concerning vacuum section: see my comments above

concerning 'the other sections' (transactions, reliability, backup (plus: someone should add 'replication', I'm not familiar with this issue)): The intention of the chapter is to give a *summary* about PG's essential architecture and about central implementation aspects. This implies that the chapters does not present any new information. They shall only show (or repeat) essential things in their context and explain *why* they are used. In this sense the three chapters may be reasonable. Concerning this, I like to hear some comments from other people.


Attachments:

0013-architecture.patch: complete patch vs. master

0013-architecture.sgml.diff: changes in file architecture.sgml since 0012

0013-images.diff: changes in files *-raw.svg since 0012

--

J. Purtz


--------------66F2611A4B7BB4A55DC73A90-- --------------1BB5DBAD17FB692E25728BB2 Content-Type: text/x-patch; charset=UTF-8; name="0013-architecture.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0013-architecture.patch" diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml index 2d4ab85d45..5c3245c0ec 100644 --- a/doc/src/sgml/advanced.sgml +++ b/doc/src/sgml/advanced.sgml @@ -1,7 +1,7 @@ - Advanced Features + Advanced SQL Features Introduction diff --git a/doc/src/sgml/arch-dev.sgml b/doc/src/sgml/arch-dev.sgml index 7883c3cd82..9db0ae2c78 100644 --- a/doc/src/sgml/arch-dev.sgml +++ b/doc/src/sgml/arch-dev.sgml @@ -1,7 +1,7 @@ - Overview of PostgreSQL Internals + Overview of Query Handling Author diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml new file mode 100644 index 0000000000..4bbd6abb8a --- /dev/null +++ b/doc/src/sgml/architecture.sgml @@ -0,0 +1,1524 @@ + + + + Overview of Architecture and Implementation + + + Every DBMS implements basic strategies to ensure a fast + and robust system. This chapter provides an overview of the + basic techniques PostgreSQL uses to + achieve this aim. It does not offer anything which exceeds + the information content of other pages. Instead, it tries to + explain why certain implementation + decisions have been taken. + + + + Collaboration of Processes, RAM, and Files + + In a client/server architecture clients do not have direct access + to database files and the data stored in them. Instead, they send + requests to the server and receive the requested data in the response. + In the case of PostgreSQL, the server + launches a single process for each client connection, referred to as a + Backend process. + Such a Backend process handles the client's requests by acting on the + Shared Memory. + This leads to other activities (file access, WAL, vacuum, ...) of the + Instance. The + Instance is a group of server-side processes acting on a common + Shared Memory. PostgreSQL does not use threading. + + + + All aspects of an Instance are launched and managed using a single primary + process termed the + Postmaster. + It loads configuration files, allocates Shared Memory, and + starts the other collaborating processes of the Instance: + Background Writer, + Checkpointer, + WAL Writer, + WAL Archiver, + Autovacuum, + Statistics Collector, + Logger, and more. + Later, the Postmaster listens on its configured system port and in response + to client connection attempts launches + Backend processes + to which it delegates authentication, communication, and the handling of their requests. + visualizes the processes + of an Instance and the main aspects of their collaboration. + + +
+ Architecture + + + + + + + + + + +
+ + + Client requests like SELECT or + UPDATE usually lead to the + necessity to read or write data. This is carried out + by the client's backend process. Reads involve a page-level + cache, located in Shared Memory (for details see: + ) for the benefit of all processes + in the Instance. Writes also use this cache, in addition + to a journal, called the write-ahead-log or WAL. + + + + Shared Memory is limited in size and it can become necessary + to evict pages. As long as the content of such pages hasn't + changed, this is not a problem. But writes directly modify + the pages in Shared Memory. Modified pages are called dirty + pages (or dirty buffers) and before they can be evicted they + must be written to disk. This happens regularly by the + Checkpointer and the Background Writer processes to ensure + that the disk version of the pages are up-to-date. + The synchronization from RAM to disk consists of two steps. + + + + First, whenever the content of a page changes, a + WAL record + is created from the delta-information (difference between the + old and the new content) and stored in another area of + Shared Memory. The concurrently running WAL Writer process + reads them and appends them to the end of the current + WAL file. + Such sequential writes are faster than writes to random + positions of heap and index files. All WAL records created + from one dirty page must be transferred to disk before the + dirty page itself can be transferred to disk in the second step. + + + + Second, the Instance transfers dirty buffers from Shared Memory to + files. This is the primary task of the + Background Writer process. Because I/O activities can block + other processes, it starts periodically and + acts only for a short period. Doing so, its extensive (and + expensive) I/O activities are spread over time, avoiding + debilitating I/O peaks. The Checkpointer process + also transfers dirty buffers to file. + + + + The Checkpointer process creates + Checkpoints. + A Checkpoint is a point in time when all older dirty buffers, + all older WAL records, and finally a special Checkpoint record + are written and flushed to disk. + Older WAL files are no longer required to recover the system from a crash. + + + + While the Checkpointer ensures that the database system can, + after a crash, restart itself in a valid state, the administrator needs + to handle the case where the heap or other files become + corrupted (and possibly the locally written WAL, though that is + less common). Options and details are covered + in the backup and restore section (). + For our purposes here, just note that the WAL Archiver process + can be enabled and configured to run a script on completed WAL + files — usually to copy them to a remote location. Note + that when a Checkpoint record is written to the WAL the current + file is immediately completed. + + + + The Statistics Collector collects counters about access to + SQL objects like tables, rows, indexes, pages, and more. It + stores the obtained information in system tables. + + + + The Logger writes text lines about serious and less serious + events that may happen during database access, e.g., wrong + password, no permission, long-running queries, etc. + + +
+ + + The logical Perspective: Cluster, Database, Schema + + + A Server contains one or more + Database Clusters + (Clusters + for short). By default each newly initialized Cluster contains three + databases + (one interactive and two templates, see ). + Each database can contain many user-writable + schemas + (public, by name and permissiveness, by default), the system + generated user-facing schemas pg_catalog, + pg_temp, and information_schema, + and some more system schemas. + Tables, + views, and a lot + of other objects uniquely reside in a single schema. + visualizes + this hierarchy. + + + + + Client connections act at the database level and can access + its schemas simultaneously. Special techniques like + foreing data wrapper + or dblink are required + to access multiple databases, even within the same Cluster, + from a single client connection. + + +
+ Cluster, Database, Schema + + + + + + + + + + +
+ + + A Cluster is the outer container for a + collection of databases. Clusters are created by the command + . + + + + template0 is the very first + database of any Cluster. It + is created during the initialization phase of the Cluster. + In a second step, database template1 is generated + as a copy of template0, and finally database + postgres is generated as a copy of + template1. Any + new databases + of the Cluster that a user might need, + such as my_db, will be copied from the + template1 database. Due to the unique + role of template0 as the pristine original + of all other databases, no client is allowed to connect to it. + + + + SQL Objects + are contained in a schema. + Schemas are namespaces for SQL objects and ensure + (with one exception) that the SQL object names are used only once within + their scope across all types of SQL objects. E.g., it is not possible + to have a table employee and a view + employee within the same schema. But it is + possible to have two tables employee in two + different schemas. In this case, the two tables + are separate objects and independent of each + other. The only exception to this cross-type uniqueness is that + unique constraints + and the according unique index + () use the same name. + + + + Some schemas are predefined. public + acts as the default schema and contains all SQL objects + which are created within public or + without using an explicit schema name. public + should not contain user-defined SQL objects. Instead, it is + recommended to create a separate schema that holds individual + objects like application-specific tables or views. To access + objects in such a schema they can be fully qualified, e.g. + my_schema.my_table, or by changing the + schema search path. + + + + pg_catalog is a schema for all tables and views of the + System Catalog. + information_schema is a similar schema. It + contains several tables and views of the System Catalog in a + way that conforms to the SQL standard. + + + + There are many different SQL object types: + database, + schema, + table, + view, + materialized view, + index, + constraint, + sequence, + function, + procedure, + trigger, + role, + data type, + operator, + tablespace, + extension, + foreign data wrapper, + and more. A few of them, the + Global SQL Objects, are outside of the + strict hierarchy: All database names, + all tablespace names, and all + role names are automatically + available throughout the Cluster, not just the database in which + the SQL command was executed. + + shows the relation between the object types. + + +
+ Hierarchy of Internal Objects + + + + + + + + + + +
+ +
+ + + The Physical Perspective: Directories and Files + + + PostgreSQL organizes long-lasting (persistent) + data as well as volatile state information about transactions + or replication actions in the file system. Every + has its root directory + somewhere in the file system. In many cases, the environment + variable PGDATA points to this directory. + The example shown in + uses + data as the name of the cluster's root directory. + + +
+ Directory Structure + + + + + + + + + + +
+ + + The cluster's root directory contains many subdirectories and + some files, all of which are necessary to store long-lasting + as well as temporary data. The root's name can be selected + as desired, but the names of its subdirectories and files + are more or less fix and detertermined by + PostgreSQL. The following + paragraphs describe the most important subdirectories + and files. + + + + base contains one + subdirectory per database. The names of those + subdirectories consist of numbers. These are the internal + Object Identifiers (OID), which are numbers to identify + their definition in the + System Catalog. + + + + Within the database-specific subdirectories of base + there are many files: one or more for every table + and every index. Those files are accompanied by files for the + Free Space Maps + (suffixed _fsm) and + Visibility Maps + (suffixed _vm), which contain optimization information. + + + + Another subdirectory is global. It + contains files with information about + Global SQL Objects. + + + + In pg_tblspc, there are symbolic links + that point to directories that are outside of the root + directory tree, e.g. at a different disc. Files for tables + and indexes of non-default tablespaces reside there. As + previously mentioned, those defined within the default + tablespace reside in the database-specific subdirectories. + + + + The subdirectory pg_wal contains the + WAL files. + They arise and grow in parallel with data changes in the + Cluster and remain as long as + they are required for recovery, archiving, or replication. + + + + The subdirectory pg_xact contains + information about the status of each transaction: + in_progress, committed, + aborted, or sub_committed. + + + + In the root directory + there are also some files. In many cases, the configuration + files of the Cluster are stored here. If the + Instance is up and running, the file + postmaster.pid exists here + (by default) + and contains the process ID (pid) of the + Postmaster which started the Instance. + + + + For more details about the physical implementation + of database objects, see . + + +
+ + + MVCC — Multiversion Concurrency Control + + + In most cases, PostgreSQL databases + support many clients at the same time which makes it necessary to + protect concurrently running requests from unwanted overwriting + of other's data as well as from reading inconsistent data. Imagine an + online shop offering the last copy of an article. Two clients have the + article displayed at their user interface. After a while, but at the same time, + both users decide to put it to their shopping cart or even to buy it. + Both have seen the article, but only one can be allowed to get it. + The database must bring the two requests in a row, permit the access + to one of them, block the other, and inform the blocked client + that the data was changed by a different process. + + + + PostgreSQL implements a + sophisticated technique which avoids locking: + Multiversion Concurrency Control (MVCC). + The advantage of MVCC over technologies that use row locks + becomes evident in multiuser Online Transaction Processing (OLTP) + environments with a massive number of concurrent write + actions. There, MVCC generally performs better than solutions + using locks. In a PostgreSQL + database, reading never blocks writing and writing never + blocks reading, even in the strictest level of transaction + isolation. + + + + Instead of locking rows, the MVCC technique creates + a new version of the row when a data-change takes place. To + distinguish between these two versions, and to track the timeline + of the row, each of the versions contains, in addition to their user-defined + columns, two special system columns, which are not visible + for the usual SELECT * FROM ... command. + The column xmin contains the transaction ID (xid) + of the transaction which created this version of the row. + xmax contains the xid of the transaction which has + deleted this version, or zero if the version is not + deleted. You can read both with the command + SELECT xmin, xmax, * FROM ... . + + + + Xids are sequences (with a reserved value to handle wrap-around + in pre-9.4 PostgreSQL versions). + Age computations involving them measure a transaction + count as opposed to a time interval (in milliseconds or otherwise). + If you dive deeper into PostgreSQL, + you will recognize parameters with names such as 'xxx_age'. + Despite their names, these '_age' parameters do not specify + a period of time but represent a certain number of transactions, + e.g., 100 million. + + + + The description in this chapter simplifies by omitting details. + When many transactions are running simultaneously, things can + get complicated. Sometimes transactions are aborted via + ROLLBACK immediately or after a lot of other activities, sometimes + a single row is involved in more than one transaction, sometimes + a client crashes, sometimes the sequence of xids restarts + from zero, ... . Therefore, every version of a row contains more + system columns and flags, not only xmin + and xmax. + + + + So, what's going on in detail when write access takes place? + shows details concerning + xmin, xmax, and user data. + + +
+ Multiversion Concurrency Control + + + + + + + + + +
+ + + An INSERT command creates the first + version of a row. Besides its user data 'x', + this version contains the ID of the creating transaction + 123 in xmin and + 0 in xmax. + xmin indicates that the version + exists since transaction 123 and + xmax that it is currently not deleted. + + + + Somewhat later, transaction 135 + executes an UPDATE of this row by + changing the user data from 'x' to + 'y'. According to the MVCC principles, + the data in the old version of the row is not changed. + The value 'x' remains as it was before. + Only xmax changes to 135. + Now, this version is treated as valid exclusively for + transactions with xids from 123 to + 134. In addition to the non-occurring + data change in the old version, the UPDATE + creates a new version of the row with its xid in + xmin, 0 in + xmax, and 'y' in the + user data (plus all other user data from the old version). + This new row version is visible to all future transactions. + (Internally, an UPDATE command acts + as a DELETE command followed by + an INSERT command.) + + + + All subsequent UPDATE commands behave + in the same way as the first one: they put their xid in + xmax of the current version, create + a new version with their xid in xmin and + 0 in xmax. + + + + Finally, a row may be deleted by a DELETE + command. Even in this case, all versions of the row remain as + before; nothing is thrown away. Only xmax + of the last version is set to the xid of the DELETE + transaction, which indicates that (if committed) it is only visible to + transactions with xids older than that (from + 142 to 820 in this + example). + + + + In summary, the MVCC technology creates more and more versions + of the same row in the table's heap file and leaves them there, + even after a DELETE command. Only the youngest + version is relevant for all future transactions. But the + system must also preserve some of the older ones for + awhile, because they could still be needed by + transactions which started before the deleting transaction commits. + Over time, also the older ones get out of scope + for ALL transactions and therefore become unnecessary. + Nevertheless, they do exist physically on the disk and occupy + space. They are called dead rows and are part + of the bloat. + + + + Keep in mind: + + + + + + xmin and xmax + indicate the range in which + row versions are valid (visible) for transactions. + This range doesn't imply any direct temporal meaning; + the sequence of xids reflects only the sequence of + transaction begin events. + + + + + + Internally, an UPDATE command acts in the + same way as a DELETE command followed by + an INSERT command. + + + + + + Nothing is removed — with the consequence that the database + occupies more and more disk space. It is obvious that + this behavior has to be corrected in some way. The next + chapter explains how vacuum and + autovacuum fulfill this task. + + + + + +
+ + + Vacuum + + + As we have seen in the previous chapter, the database + tends to occupy more and more disk space, caused by + bloat. + This chapter explains how the SQL command + VACUUM and the automatically running + Autovacuum processes clean up + and prevent continued growth. + + + + + Autovacuum runs automatically, by + default. Its default parameters as well as those for + VACUUM are appropriate for most standard + situations. Therefore a novice database manager can + skip the rest of this chapter which explains + a lot of details. + + + + + Client processes can issue the SQL command VACUUM + at any time. DBAs do this when they recognize + special situations, or they start it in batch jobs which run + periodically. Additionally, there is a constantly running + Autovacuum daemon which is part of the + Instance. It continuously + monitors the state of all databases based on values that are collected by the + Statistics Collector + and starts Autovacuum processes whenever it detects + certain situations. Thus, it's a dynamic behavior of + PostgreSQL with the intention to tidy + up whenever it is appropriate. + + + + VACUUM, as well as Autovacuum, don't just eliminate + bloat. They perform additional tasks for minimizing future + I/O activities of themselves as well as of other processes. + This extra work can be done in a very efficient way since in most + cases the expensive physical access to pages has taken place anyway + to eliminate bloat. The additional operations are: + + + + + + + Freeze: Mark certain row versions + as frozen. This means that they + are always treated as valid (visible) independent from + the wraparound problem (see below). + + + + + + Visibility Map and + Free Space Map: Log information about + the state of the handled pages in two additional files, the + Visibility Map and the Free Space Map. + + + + + + Statistics: Collect statistics about the + number of rows per table, the distribution of values, and so on, + as the basis for decisions of the query planner. + + + + + + + The eagerness — or 'aggressiveness' — of the + operations for eliminating bloat and + freeze is controlled by configuration + parameters, runtime flags, and in extreme situations by + the processes themselves. Because vacuum operations typically are I/O + intensive, which can hinder other activities, Autovacuum + avoids performing many vacuum operations in bulk. Instead, + it carries out many small actions with delay points in between. + The SQL command VACUUM runs immediately + and without any time delay. + + + Eliminate Bloat + + + To determine which of the row versions are no longer needed, the + elimination operation must evaluate xmax + against several criteria which must all apply: + + + + + + xmax must be different from zero because a + value of zero indicates that the row version is still valid. + + + + + + xmax must contain an xid which is older + than the oldest xid of all currently running transactions + (min(pg_stat_activity.backend_xmin)). + This criterion guarantees that no existing or upcoming transaction + will have read or write access to this row version. + + + + + + The transaction of xmax must be committed. If it was rollback-ed, + this row version is treated as valid. + + + + + + If there is the situation that the row version is part of + multiple transactions, special care and some more actions + must be taken, see: . + + + + + + + After the vacuum operation detects an unused row version, it + marks its space as free for future use of writing actions. Only + in rare situations (or in the case of VACUUM FULL), + is this space released to the operating system. In most cases, + it remains occupied by PostgreSQL + and will be used by future INSERT or + UPDATE commands to this table. + + + + Which actions start the elimination of bloat? + + + + + + When a client issues the SQL command VACUUM + in its default format, i.e., without any option. To boost performance, + in this and the next case VACUUM does not + read and act on all pages of the heap file. + The Visibility Map, which is very compact and therefore fast to read, + contains information about which pages have no deleted row versions, and + can be skipped by VACUUM. + + + + + + When a client issues the SQL command VACUUM + with the option FREEZE. (In this case, + it undertakes many more actions, see + Freeze Row Versions.) + + + + + + When a client issues the SQL command VACUUM + with the option FULL. + In this mode, an exclusive lock is taken, and + the whole table is copied to a different file, skipping all outdated row + versions. All bloat is thereby eliminated, which + may lead to a significant reduction of used disk space. + The old file is deleted. + + + + + + When an Autovacuum process acts. For optimization + purposes, it considers the Visibility Map in the same way as + VACUUM. Additionally, it ignores tables with few modifications; + see , + which defaults to 50 rows and + , + which defaults to 20%. + + + + + + + + This logic only applies to row versions of the heap. Index entries + don't use xmin/xmax. Nevertheless, such index + entries, which would lead to outdated row versions, are cleaned up + accordingly. + + + + The above descriptions omit the fact that xids on a real computer + have a limited size, and after + a certain number of transactions they are forced to restart + from the beginning, which is called wraparound. + Therefore the terms 'old transaction' / 'young transaction' does + not always correlate with low / high values of xids. Near to + wraparound point, there are cases where xmin has + a higher value than xmax, although their meaning + is said to be older than xmax. + + +
+ Cyclic usage of XIDs + + + + + + + + + + +
+ + Freeze Row Versions + + + The use of a limited range of IDs for transactions leads + to the necessity to restart the sequence sooner or later. + This not only has the rare consequence, previously + described, that sometimes xmin is + higher than xmax. A far + more critical problem is that whenever the system has + to evaluate a WHERE condition, it must decide which row + version is valid (visible) from the perspective of the + transaction of the query. If a wraparound couldn't happen, + this decision would be relatively easy: the xid + must be between xmin and xmax, + and the corresponding transactions of xmin + and xmax must be committed. However, + PostgreSQL has to consider the + possibility of wraparound. + Therefore the decision becomes more complex. The general + idea of the solution is to use the 'between + xmin and xmax' + comparison only during the youngest period of the row + versions lifetime and afterward replace it with a + 'valid forever' flag in its header. + + + + + + + As a first step, PostgreSQL + divides the complete range of + possible xids into two halves with the two split-points + 'txid_current' and 'txid_current + 2^31'. The half behind + 'txid_current' is considered to represent xids of the + 'past' and the half ahead of 'txid_current' those of the + 'future'. Those of the 'past' are valid (visible) and those + of the 'future' not. + + + + + + With each newly created transaction the two split-points + move forward. If 'txid_current + 2^31' reached a + row version with xmin equal to that value, it would + immediately jump from 'past' to 'future' and would be + no longer visible! + + + + + + If not handled in some way, data inserted many transactions ago + would become invisibile. The vacuum operation freeze + avoids this long before the split-point is reached by setting + a flag in the header of the row version which avoids + future comparison of its xmin/xmax and indicates + that the version is valid not only in the 'past'-half + but also in the 'future'-half, as well as in all coming + epochs. + + + + + + Which row versions can be frozen by the vacuum operation? + Again, several criteria must be checked, and all must be met. + + + + + + xmax must be zero because only + non-deleted rows can be visible 'forever'. + + + + + + xmin must be older than all currently + existing transactions. This guarantees that no existing + transaction can modify or delete the version. + + + + + + The transactions of xmin and + xmax must be committed. + + + + + + + At what point in time does the freeze operation take place? + + + + + When a client issues the SQL command VACUUM + with its FREEZE option. In this case, all + pages are processed that are marked in the Visibility Map + as potentially having unfrozen rows. + + + + + When a client issues the SQL command VACUUM without + any options but there are xids older than + + (default: 150 million) minus + + (default: 50 million). + As before, all pages are processed that are + marked in the Visibility Map to potentially having unfrozen + rows. + + + + + When an Autovacuum process runs. Such a process acts in one + of two modes: + + + + + + In the normal mode, it skips + pages with row versions that are younger than + + (default: 50 million) and works only on pages where + all xids are older. The skipping of young xids prevents + work on such pages, which are likely to be changed + by one of the future SQL commands. + + + + + The process switches + to an aggressive mode if it recognizes + that for the processed table the oldest xid exceeds + + (default: 200 million). The value of the oldest unfrozen + xid is stored per table in pg_class.relfrozenxid. + In this aggressive mode Autovacuum + processes all such pages of the selected table that are marked + in the Visibility Map to potentially have bloat or unfrozen rows. + + + + + + + + + + In the first two cases and with Autovacuum in + aggressive mode, the system knows + to which value the oldest unfrozen xid has moved forward and + logs the value in pg_class.relfrozenxid. + The distance between this value and the 'txid_current' split + point becomes smaller, and the distance to 'txid_current + 2^31' + becomes larger than before. + + +
+ Freeze + + + + + + + + + + +
+ + Protection against Wraparound Failure + + + The Autovacuum processes are initiated by the constantly running + Autovacuum daemon. If the daemon detects that for a table + autovacuum_freeze_max_age is exceeded, it + starts an Autovacuum process in aggressive mode + (see above) — even if Autovacuum is disabled. + + + Visibility Map and Free Space Map + + + The Visibility Map + (VM) contains two flags — stored as + two bits — for each page of the heap file. The + first bit indicates that the associated page does + not contain any bloat. The second bit indicates + that the page contains only frozen row versions. + + + + Please consider two details. First, in most cases a page + contains many rows or row versions. + However, the flags are associated with the page, + not with an individual row version. The flags are set + only under the condition that they are valid for ALL + row versions of the page. Second, since there + are only two bits per page, the VM is considerably + smaller than the heap. + + + + The setting of the flags is silently done by VACUUM + and Autovacuum during their bloat and freeze operations. + This is done to speed up future vacuum actions, + regular access to heap pages, and some access to + the index. Every data-modifying operation on any row + version of the page clears the flags. + + + + The Free Space Map + (FSM) tracks the amount of free space per page. It is + organized as a highly condensed b-tree of (rounded) sizes. + Whenever VACUUM or Autovacuum changes + the free space on any processed page, they log the new + values in the FSM in the same way as all other writing + processes. + + + Statistics + + + Statistical information helps the Query Planner to make optimal + decisions for the generation of execution plans. This + information can be gathered with the SQL commands + ANALYZE or VACUUM ANALYZE. + But Autovacuum processes also gather + such information. Depending on the percentage of changed rows + , + and minimum number of changed rows , + the Autovacuum daemon starts Autovacuum processes to collect + statistics per table. The automatic analysis + allows PostgreSQL to + adapt query execution to changing circumstances. + + + + For more details about vacuum operations, especially for its + numerous parameters, see . + + +
+ + + Transactions + + Transactions + are a fundamental concept of relational database systems. + Their essential point is that they bundle multiple + read- or write-operations into a single all-or-nothing + operation. Furthermore, they separate and protect concurrent + actions of different connections from each other. Thereby + they implement the ACID paradigm. + + + + In PostgreSQL there are two ways + to establish a transaction. The explicit way uses the keywords + BEGIN and + COMMIT (respectively + ROLLBACK) before + and after a sequence of SQL statements. The keywords mark + the transaction's start- and end-point. On the other hand, you + can omit the keywords. This is the implicit way, where + every single SQL command automatically establishes a new + transaction. + + +BEGIN; -- establish a new transaction +UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; +UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; +COMMIT; -- finish the transaction + +-- this UPDATE runs as the only command of a separate transaction ... +UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; + +-- ... and this one runs in another transaction +UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; + + + + + As mentioned, the primary property of a transaction is its + atomicity: either all or none of its operations succeed, + regardless of the fact that it may consist of a lot of + different write-operations, and each such operation may + affect many rows. As soon as one of the + operations fails, all previous operations fail also, which + means that all modified rows retain their values as of the + beginning of the transaction. + + + + The atomicity also affects the visibility of changes. No + connection running simultaneously with a data modifying + transaction will ever see any change before the + transaction successfully executes a COMMIT + — even in the lowest + isolation level + of transactions. PostgreSQL + never shows uncommitted changes to other connections. + + + + The situation regarding visibility is somewhat different + from the point of view of the modifying transaction. + A SELECT command issued inside a + transaction shows all changes done so far by this + transaction. + + + How does it work? + + + Every INSERT, UPDATE, + and DELETE command creates new row + versions — according to the MVCC rules. This + creates the risk that other transactions may see the + new row versions, and after a while and some more + activities of the modifying transaction they may see the + next row versions. Results would be a kind of 'moving + target' which would be contrary to the all-or-nothing + principle. + + + + PostgreSQL overcomes this + problem by showing only such row versions to other + transactions whose originating transaction has + successfully committed. It skips all row versions of + uncommitted transactions. And + PostgreSQL solves one more + problem. Even the single COMMIT + command needs a short time interval for its execution. + Therefore its critical 'dead-or-survival' phase + runs in a privileged mode that cannot be + interrupted by other processes. + + + What are the benefits? + + + Transactions relieve applications from many standard + actions that would otherwise need to be implemented for nearly every use case. + + + + Business logic often contains strong, but for a computer, + relative abstract requirements. The above example shows + the transfers of some money from one account to another. + It is obvious + that the decrease of the one account and the increase of the + other must be indivisible. Nevertheless, there is no particular + need for an application to do something to ensure the + atomicity + of this behavior. It's enough to surround them with + BEGIN and COMMIT. + + + + Applications often demand the feature of 'undoing' + previously taken actions under some application-specific + conditions. In such cases, the application simply issues a + ROLLBACK command instead of a + COMMIT. The ROLLBACK + cancels the transaction, and all changes made so far remain + invisible forever; it is as if they had never happened. There + is no need for the application to log its activities and + undo every step of the transaction separately. + + + + Transactions ensure that the + database always remains + consistent. + Declarative rules like + primary- or + foreign keys, + checks, + other constraints, or + triggers + are part of the all-or-nothing nature of transactions. + + + + There is the additional feature + 'isolation level', + which separates transactions from each other in certain ways. + It automatically prevents applications from some strange + situations. + + + + Lastly, it is worth noticing that changes done by a + committed transaction will survive all failures in the application or + the Database Cluster. The next chapter explains the + durability + guarantees. + + + + + Reliability + + + Nothing is perfect and failures inevitably happen. + However, the most common types of failure are + well known and PostgreSQL + implements strategies to overcome them. + Such strategies use parts of the previously presented + techniques MVCC and transaction-rollback, plus additional + features. + + + Failures at the client side + + A client + can fail in different ways. Its hardware can get damaged, + the power supply can fail, the network connection to the + server can break, or the client application may run into + a severe software error like a null pointer exception. + Because PostgreSQL uses a + client/server architecture, no direct problem for the + database will occur. In all of these cases, the + Backend process, + which is the client's counterpart at the server side, + may recognize that the network connection is no longer + working, or it may run into a timeout after a while. It + terminates, and there is no harm to the database. As + usual, uncommitted data changes initiated by this client + are not visible to any other client. + + + Failures at the server-side + + Instance failure + + The Instance may suddenly fail because of power off + or other problems. This will affect all running processes, the RAM, + and possibly the consistency of disk files. + + + + After a restart, PostgreSQL + automatically recognizes that the last shutdown of the + Instance did not happen as expected: files might not be + closed properly and the postmaster.pid + file unexpectedly exists. PostgreSQL + tries to clean up the situation. This is possible because + all changes in the database are stored twice. First, + the WAL files contain them as a chronology of + WAL records, + which include the new data values and information about commit + actions. The WAL records are written first. Only then is + the data itself written to the heap and index files. + In contrast to the WAL records, this part may or may + not have been transferred entirely from Shared Memory + to the files. + + + + The automatic recovery searches within the WAL files for + the latest + checkpoint. + This checkpoint signals that the database files are in + a consistent state, especially that all WAL records up to + this point were successfully stored in heap and index files. Starting + here, the recovery process copies any remaining WAL records + to heap and index. The result is that the heap files contain all + changes recorded to the WAL and reach a consistent state. Changes of committed + transactions are visible; those of uncommitted transactions + are also in the files, but - as usual - they are never seen + by any of the following transactions because uncommitted + changes are never shown. These recovery actions run + automatically; it is not necessary that a + database administrator configure or start anything by + himself. + + + Disk crash + + If a disk crashes, the course of action described previously + cannot work: it is likely that the WAL files and/or the + data and index files are no longer available. The + database administrator must take special actions to + prepare for such a situation. + + + + They obviously need a backup. How to take such a backup + and use it as a starting point for a recovery of the + Cluster is explained in more detail in the next + chapter. + + + Disk full + + It is conceivable that over time the disk gets full, + and there is no room for additional data. In this case, + PostgreSQL stops accepting + data-modifying commands or even terminates completely. + Committed data is neither lost nor corrupted. + + + + To recover from such a situation, the administrator should + remove unused files from this disk. But they should never + delete files from the + data directory. + Nearly all of them are necessary for the consistency + of the database. + + + High availability + + Database servers can work together to allow a second + server to quickly take over the workload if the + primary server fails for whatever reason + (high availability), + or to allow several computers to serve the same data + for the purpose of load balancing. + + + + + + Backup + + + Taking backups is a basic task of database maintenance. + PostgreSQL supports + three different strategies; each has its own + strengths and weaknesses. + + + + File system level backup + + + + + Logical backup via pg_dump + + + + + Continuous archiving based on pg_basebackup + and WAL files + + + + + + File system level backup + + You can use any appropriate OS tool to create a + copy + of the Cluster's directory structure and files. In + case of severe problems such a copy can serve as + the source of recovery. But in order to get a + USABLE backup by this method, + the database server MUST be + shut down during the complete runtime of the copy + command! + + + + The obvious disadvantage of this method is that there + is a downtime. + The other two strategies run during regular operating + times. + + + Logical backup via pg_dump + + The tool pg_dump is able to take a + copy + of the complete Cluster or certain parts of it. It stores + the copy in the form of SQL commands like CREATE + and COPY. It runs in + parallel with other processes, in its own transaction. + + + + The output of pg_dump may be used as + input of psql to restore the data + (or to copy it to another database). + + + + The main advantage over the other two methods is that it + can pick parts of the Cluster, e.g., a single table or one + database. The other two methods work only at the level of + the complete Cluster. + + + Continuous archiving based on pg_basebackup and WAL files + + This method + is the most sophisticated and most complex one. It + consists of two phases. + + + + First, you need to create a so-called + basebackup with the tool + pg_basebackup. The result is a + directory structure plus files which contain a + consistent copy of the original Cluster. + pg_basebackup runs in + parallel with other processes in its own transaction. + + + + The second step is recommended but not necessary. All + changes to the data are stored in WAL files. If you + continuously save such WAL files, you have the history + of the Cluster. This history can be applied to a + basebackup in order to recreate + any state of the Cluster between the time of + pg_basebackup's start time and + any later point in time. This technique + is called 'Point-in-Time Recovery (PITR)'. + + + + If configured, the + Archiver process + will automatically copy every single WAL file to a save location. + Its configuration + consists mainly of a string that contains a copy command + in the operating system's syntax. In order to protect your + data against a disk crash, the destination location + of a basebackup as well as of the + archived WAL files should be on a + disk which is different from the data disk. + + + + If it becomes necessary to restore the Cluster, you have to + copy the basebackup and the archived WAL files to + their original directories. The configuration of this + recovery procedure + contains a string with the reversed copy command: from + archive location to database location. + + + + + + +
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 38e8aa0bbf..7490d3c9c2 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -80,6 +80,7 @@ %allfiles; + diff --git a/doc/src/sgml/images/cluster-db-schema-ink-svgo.svg b/doc/src/sgml/images/cluster-db-schema-ink-svgo.svg new file mode 100644 index 0000000000..487eec47e1 --- /dev/null +++ b/doc/src/sgml/images/cluster-db-schema-ink-svgo.svg @@ -0,0 +1,184 @@ + + + Server (Hardware, Container, or VM) + + + + + + + + + + + schema 'public' + + + tables, views, ... + + + + System schemas + + + + + + + schema 'public' + + + tables, views, ... + + + + System schemas + + + + 'my_schema' (optional) + + + tables, views, ... + + + + + UML Note + + + + + + UML Note + + + + + + + + + + Server (Hardware, Container, or VM) + + + + cluster 'data' (default, managed by one instance) + + + + cluster 'cluster_2' (optional, managed by a different instance) + + + + + database 'template0' + + + + 'template0' is a pristine + + + and unchangeable template. + + + It is used to generate + + + database 'template1'. + + + + + + database 'template1' + + + + + + database 'postgres' + + + + + + database 'my_db' (optional) + + + + + + Global SQL objects + + + + + + + + + 1) + + + By default, clients connect via port '5432' to Cluster 'data', + + + database 'postgres', schema 'public'. Of course, they can + + + specify other connection properties. + + + 2) + + + System schemas: pg_catalog, information_schema, + + + pg_temp, ... + + + 3) + + + Global SQL objects: Objects which are shared among all + + + databases within a Cluster. + + + 4) + + + The command 'initdb' creates a new cluster with the three + + + databases 'template0', 'template1', and 'postgres'. The command + + + 'createdb' creates a new database. + + + 5) + + + If multiple clusters are active on one server at the same time, + + + each one is managed by an individual instance. Each such instance + + + uses a different port. + + + 6) + + + Client application are prohibited from connecting to 'template0'. + + + diff --git a/doc/src/sgml/images/cluster-db-schema-ink.svg b/doc/src/sgml/images/cluster-db-schema-ink.svg new file mode 100644 index 0000000000..05623a4092 --- /dev/null +++ b/doc/src/sgml/images/cluster-db-schema-ink.svg @@ -0,0 +1,528 @@ + + + + + + image/svg+xml + + Server (Hardware, Container, or VM) + + + + + Server (Hardware, Container, or VM) + + + + + + + + + + + + + + + schema 'public' + tables, views, ... + + System schemas + + + + + + + + schema 'public' + tables, views, ... + + System schemas + + 'my_schema' (optional) + tables, views, ... + + + UML Note + + + + UML Note + + + + + + + + + + + + + + Server (Hardware, Container, or VM) + + + + cluster 'data' (default, managed by one instance) + + + + cluster 'cluster_2' (optional, managed by a different instance) + + + + + database 'template0' + + 'template0' is a pristine + and unchangeable template. + It is used to generate + database 'template1'. + + + + + database 'template1' + + + + + database 'postgres' + + + + + database 'my_db' (optional) + + + + + Global SQL objects + + + + + + + + 1) + By default, clients connect via port '5432' to Cluster 'data', + database 'postgres', schema 'public'. Of course, they can + specify other connection properties. + 2) + System schemas: pg_catalog, information_schema, + pg_temp, ... + 3) + Global SQL objects: Objects which are shared among all + databases within a Cluster. + 4) + The command 'initdb' creates a new cluster with the three + databases 'template0', 'template1', and 'postgres'. The command + 'createdb' creates a new database. + 5) + If multiple clusters are active on one server at the same time, + each one is managed by an individual instance. Each such instance + uses a different port. + 6) + Client application are prohibited from connecting to 'template0'. + + diff --git a/doc/src/sgml/images/cluster-db-schema-raw.svg b/doc/src/sgml/images/cluster-db-schema-raw.svg new file mode 100644 index 0000000000..8534b18ec2 --- /dev/null +++ b/doc/src/sgml/images/cluster-db-schema-raw.svg @@ -0,0 +1,191 @@ + + + + Server (Hardware, Container, or VM) + + + + + + + + + + + + + + + + + + + + + + schema 'public' + tables, views, ... + + + System schemas + + + + + + + + + + schema 'public' + tables, views, ... + + + System schemas + + + 'my_schema' (optional) + tables, views, ... + + + + UML Note + + + + + UML Note + + + + + + + + + + + + + + + + + + Server (Hardware, Container, or VM) + + + + + cluster 'data' (default, managed by one instance) + + + + cluster 'cluster_2' (optional, managed by a different instance) + + + + + + + database 'template0' + + 'template0' is a pristine + and unchangeable template. + It is used to generate + database 'template1'. + + + + + + database 'template1' + + + + + + database 'postgres' + + + + + + database 'my_db' (optional) + + + + + + Global SQL objects + + + + + + + + + + 1) + By default, clients connect via port '5432' to Cluster 'data', + database 'postgres', schema 'public'. Of course, they can + specify other connection properties. + + 2) + System schemas: pg_catalog, information_schema, + pg_temp, ... + + 3) + Global SQL objects: Objects which are shared among all + databases within a Cluster. + + 4) + The command 'initdb' creates a new cluster with the three + databases 'template0', 'template1', and 'postgres'. The command + 'createdb' creates a new database. + + 5) + If multiple clusters are active on one server at the same time, + each one is managed by an individual instance. Each such instance + uses a different port. + + 6) + Client application are prohibited from connecting to 'template0'. + + + + diff --git a/doc/src/sgml/images/directories-ink-svgo.svg b/doc/src/sgml/images/directories-ink-svgo.svg new file mode 100644 index 0000000000..a59198c66f --- /dev/null +++ b/doc/src/sgml/images/directories-ink-svgo.svg @@ -0,0 +1,164 @@ + + + Directory structure of a cluster + + + + + + Directory + + + + + + + File + + + + + + + + + + + Directory Structure + + + + + /var/lib/pgsql/ + + + An arbitrary directory + + + + + + data/ + + + Cluster's root directory (with an arbitrary name; see: PGDATA) + + + + + + base/ + + + Subdirectory containing per-database subdirectories + + + + + + 1/ + + + Subdirectory for data in default-tablespace of first database 'template0' + + + + + + 12992/ + + + Subdirectory for data in default-tablespace of second database 'template1' + + + + + + 12999/ + + + Subdirectory for data in default-tablespace of third database 'postgres' + + + + + + nnnnn/ + + + Optional: one subdirectory for each new database, e.g. '23456' for 'my_db' + + + + + + global/ + + + Subdirectory for information about Global SQL Objects + + + + + + pg_tblspc/ + + + Subdirectory for symbolic links to non-default tablespaces + + + + + + pg_wal/ + + + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + + + + + + pg_xact/ + + + Subdirectory for transaction commit status ('pg_clog' before version 10) + + + + + + pg_... / + + + Some more subdirectories + + + + + + + 'postmaster.pid' and other files with cluster-wide relevance + + + + + + ... + + + Same or another arbitrary directory + + + + + + cluster_2/ + + + Root of another cluster + + + diff --git a/doc/src/sgml/images/directories-ink.svg b/doc/src/sgml/images/directories-ink.svg new file mode 100644 index 0000000000..615a6d214a --- /dev/null +++ b/doc/src/sgml/images/directories-ink.svg @@ -0,0 +1,403 @@ + + + + + + image/svg+xml + + Directory structure of a cluster + + + + + Directory structure of a cluster + + + + + Directory + + + + + + File + + + + + + + + + + + + Directory Structure + + + + /var/lib/pgsql/ + An arbitrary directory + + + + data/ + Cluster's root directory (with an arbitrary name; see: PGDATA) + + + + base/ + Subdirectory containing per-database subdirectories + + + + + 1/ + Subdirectory for data in default-tablespace of first database 'template0' + + + + 12992/ + Subdirectory for data in default-tablespace of second database 'template1' + + + + 12999/ + Subdirectory for data in default-tablespace of third database 'postgres' + + + + nnnnn/ + Optional: one subdirectory for each new database, e.g. '23456' for 'my_db' + + + + global/ + Subdirectory for information about Global SQL Objects + + + + pg_tblspc/ + Subdirectory for symbolic links to non-default tablespaces + + + + pg_wal/ + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + + + + pg_xact/ + Subdirectory for transaction commit status ('pg_clog' before version 10) + + + + pg_... / + Some more subdirectories + + + + + 'postmaster.pid' and other files with cluster-wide relevance + + + + + ... + Same or another arbitrary directory + + + + cluster_2/ + Root of another cluster + + diff --git a/doc/src/sgml/images/directories-raw.svg b/doc/src/sgml/images/directories-raw.svg new file mode 100644 index 0000000000..5cce551b0c --- /dev/null +++ b/doc/src/sgml/images/directories-raw.svg @@ -0,0 +1,150 @@ + + + + Directory structure of a cluster + + + + + + + + Directory + + + + + + + File + + + + + + + + + + + + + + + + Directory Structure + + + + + /var/lib/pgsql/ + An arbitrary directory + + + + + data/ + Cluster's root directory (with an arbitrary name; see: PGDATA) + + + + + base/ + Subdirectory containing per-database subdirectories + + + + + + 1/ + Subdirectory for data in default-tablespace of first database 'template0' + + + + 12992/ + Subdirectory for data in default-tablespace of second database 'template1' + + + + 12999/ + Subdirectory for data in default-tablespace of third database 'postgres' + + + + nnnnn/ + Optional: one subdirectory for each new database, e.g. '23456' for 'my_db' + + + + + global/ + Subdirectory for information about Global SQL Objects + + + + + pg_tblspc/ + Subdirectory for symbolic links to non-default tablespaces + + + + + pg_wal/ + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + + + + + pg_xact/ + Subdirectory for transaction commit status ('pg_clog' before version 10) + + + + + pg_... / + Some more subdirectories + + + + + + 'postmaster.pid' and other files with cluster-wide relevance + + + + + + ... + Same or another arbitrary directory + + + + + cluster_2/ + Root of another cluster + + + diff --git a/doc/src/sgml/images/freeze-ink-svgo.svg b/doc/src/sgml/images/freeze-ink-svgo.svg new file mode 100644 index 0000000000..b1543ffae5 --- /dev/null +++ b/doc/src/sgml/images/freeze-ink-svgo.svg @@ -0,0 +1,84 @@ + + + Freeze + + + + + + + + + + + + Freeze to keep visible + + + + + | (0) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > (1) (5) | (2) | (3) | (4) + + + + PAST + + + FUTURE + + + + + + + + + + + + + + + + + + + + 0: 0 .. 2 ^ 32 - 1 + + + 1: txid_current + 2 ^ 31 (split-point) + + + 2: autovacuum_freeze_max_age (200 mio.) + + + 3: vacuum_freeze_table_age (150 mio.) + + + 4: vacuum_freeze_min_age (50 mio.) + + + 5: txid_current (split-point, youngest xid) + + + per table: pg_class.relfrozenxid must be between (1) and (5); + + + normally it is between (3) and (4) + + + + Unfrozen xid + + + + Frozen xid + + + (figure is out of scale) + + + diff --git a/doc/src/sgml/images/freeze-ink.svg b/doc/src/sgml/images/freeze-ink.svg new file mode 100644 index 0000000000..316703d069 --- /dev/null +++ b/doc/src/sgml/images/freeze-ink.svg @@ -0,0 +1,365 @@ + + + + + + image/svg+xml + + Freeze + + + + + Freeze + + + + + + + + + + + + + + + + Freeze + to keep visible + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + (1) + (5) + | + (2) + | + (3) + | + (4) + + + + PAST + FUTURE + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 0: 0 .. 2 ^ +32 + - 1 + 1: txid_current + 2 ^ 31 (split-point) + 2: autovacuum_freeze_max_age (200 mio.) + 3: vacuum_freeze_table_age (150 mio.) + 4: vacuum_freeze_min_age (50 mio.) + 5: txid_current (split-point, youngest xid) + per table: pg_class.relfrozenxid + must + be between (1) and (5); + normally it is between (3) and (4) + + Unfrozen xid + + Frozen xid + (figure is out of scale) + + diff --git a/doc/src/sgml/images/freeze-raw.svg b/doc/src/sgml/images/freeze-raw.svg new file mode 100644 index 0000000000..9482167c3f --- /dev/null +++ b/doc/src/sgml/images/freeze-raw.svg @@ -0,0 +1,123 @@ + + + + Freeze + + + + + + + + + + + + + + + + + + + + + + Freeze to keep visible + + + + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + (1) + (5) + + | + (2) + + | + (3) + + | + (4) + + + + + + PAST + FUTURE + + + + + + + + + + + + + + + + + + + + + + 0: 0 .. 2 ^ 32 - 1 + 1: txid_current + 2 ^ 31 (split-point) + 2: autovacuum_freeze_max_age (200 mio.) + 3: vacuum_freeze_table_age (150 mio.) + 4: vacuum_freeze_min_age (50 mio.) + 5: txid_current (split-point, youngest xid) + per table: pg_class.relfrozenxid + must be between (1) and (5); + normally it is between (3) and (4) + + + Unfrozen xid + + + Frozen xid + + (figure is out of scale) + + + diff --git a/doc/src/sgml/images/internal-objects-hierarchy-ink-svgo.svg b/doc/src/sgml/images/internal-objects-hierarchy-ink-svgo.svg new file mode 100644 index 0000000000..e353ce4d87 --- /dev/null +++ b/doc/src/sgml/images/internal-objects-hierarchy-ink-svgo.svg @@ -0,0 +1,83 @@ + + + Hierarchy of Internal Objects + + + + + Hierarchy of Internal Objects + + + + + + Cluster + + + + + Database Names + + + + + + Tablespace + + + + + + Replication Origins + + + + + + Subscription for + + + Logical Replication + + + + + + Role + + + + + + + Database + + + + + Extension + + + + + + Collation + + + + + + Schema + + + + + Table, View, ... + + + + + + diff --git a/doc/src/sgml/images/internal-objects-hierarchy-ink.svg b/doc/src/sgml/images/internal-objects-hierarchy-ink.svg new file mode 100644 index 0000000000..3429266782 --- /dev/null +++ b/doc/src/sgml/images/internal-objects-hierarchy-ink.svg @@ -0,0 +1,255 @@ + + + + + + image/svg+xml + + Hierarchy of Internal Objects + + + + + + Hierarchy of Internal Objects + + + + + + Hierarchy of Internal Objects + + + + + Cluster + + + Database Names + + + + Tablespace + + + + Replication Origins + + + + Subscription for + Logical Replication + + + + Role + + + + + Database + + + + Extension + + + + Collation + + + + Schema + + + Table, View, ... + + + + + + diff --git a/doc/src/sgml/images/internal-objects-hierarchy-raw.svg b/doc/src/sgml/images/internal-objects-hierarchy-raw.svg new file mode 100644 index 0000000000..78850283d9 --- /dev/null +++ b/doc/src/sgml/images/internal-objects-hierarchy-raw.svg @@ -0,0 +1,95 @@ + + + + Hierarchy of Internal Objects + + + + + + + + Hierarchy of Internal Objects + + + + + + + + Cluster + + + + Database Names + + + + + Tablespace + + + + + Replication Origins + + + + + Subscription for + Logical Replication + + + + + Role + + + + + + + Database + + + + + Extension + + + + + Collation + + + + + Schema + + + + Table, View, ... + + + + + + + diff --git a/doc/src/sgml/images/mvcc-ink-svgo.svg b/doc/src/sgml/images/mvcc-ink-svgo.svg new file mode 100644 index 0000000000..6ff6a4da79 --- /dev/null +++ b/doc/src/sgml/images/mvcc-ink-svgo.svg @@ -0,0 +1,151 @@ + + + MVCC + + + + + + + + + + + + + + + + + T 123 : INSERT + + + + + 123 + + + 0 + + + 'x' + + + + + T 135 : UPDATE + + + + + 135 + + + 0 + + + 'y' + + + + 123 + + + 135 + + + 'x' + + + + + T 142 : UPDATE + + + + + 142 + + + 0 + + + 'z' + + + + 135 + + + 142 + + + 'y' + + + + 123 + + + 135 + + + 'x' + + + + + T 821 : DELTE + + + + + 142 + + + 821 + + + 'z' + + + + 135 + + + 142 + + + 'y' + + + + 123 + + + 135 + + + 'x' + + + + + + Legend + + + + xmin + + + xmax + + + data + + + + diff --git a/doc/src/sgml/images/mvcc-ink.svg b/doc/src/sgml/images/mvcc-ink.svg new file mode 100644 index 0000000000..80ae8d29e0 --- /dev/null +++ b/doc/src/sgml/images/mvcc-ink.svg @@ -0,0 +1,398 @@ + + + + + + image/svg+xml + + MVCC + + + + + MVCC + + + + + + + + + + + + + + + + + + + + + + + T + 123 +: INSERT + + + + 123 + 0 + 'x' + + + + T + 135 +: UPDATE + + + + 135 + 0 + 'y' + + 123 + 135 + 'x' + + + + T + 142 +: UPDATE + + + + 142 + 0 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + T + 821 +: DELTE + + + + 142 + 821 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + + Legend + + xmin + xmax + data + + + diff --git a/doc/src/sgml/images/mvcc-raw.svg b/doc/src/sgml/images/mvcc-raw.svg new file mode 100644 index 0000000000..ee27e3df7b --- /dev/null +++ b/doc/src/sgml/images/mvcc-raw.svg @@ -0,0 +1,145 @@ + + + + MVCC + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + T + 123 + : INSERT + + + + + 123 + 0 + 'x' + + + + + T + 135 + : UPDATE + + + + + 135 + 0 + 'y' + + 123 + 135 + 'x' + + + + + T + 142 + : UPDATE + + + + + 142 + 0 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + + T + 821 + : DELTE + + + + + 142 + 821 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + + + Legend + + xmin + xmax + data + + + + + diff --git a/doc/src/sgml/images/ram-proc-file-ink-svgo.svg b/doc/src/sgml/images/ram-proc-file-ink-svgo.svg new file mode 100644 index 0000000000..aa0445352d --- /dev/null +++ b/doc/src/sgml/images/ram-proc-file-ink-svgo.svg @@ -0,0 +1,285 @@ + + + PG Overall Server Architecture + + + + + + UML Note (200 x 20 px) + + + + + + UML Note (250 x 20 px) + + + + + + UML Note (100 x 35 px) + + + + + + UML Note (170 x 50 px) + + + + + + UML State (300x120) + + + + + + UML State (350x120) + + + + + + Disc + + + + + + + + + Laptop + + + + + + + + + + + + + + + + + + + Client + + + Server + + + + + + maintenance_work_mem (per connection) + + + work_mem (per query operation) + + + autovacuum_work_mem (per worker process) + + + temp_buffer (per connection) + + + ... + + + + Individual Memory + + + + + + shared_buffers (heap and index) + + + wal_buffers (WAL records) + + + ... + + + + Shared Memory (per Instance) + + + + + Postmaster + + + + + + 1 + + + + + Backend processes (one per connection) + + + + + + + + 3 + + + + + + + Creates backend processes + + + + + + 2 + + + + + + + + WAL Writer + + + + + + Checkpointer + + + + + + + + Checkpoint + + + Record + + + + + Background Writer + + + + + + WAL Archiver + + + + + + Autovacuum + + + + + + Logger + + + + Stats Collector + + + + + Log + + + text lines, + + + sequential + + + + + + + Heap and + + + Index + + + binary blocks, + + + random + + + + + + + Read heap and index + + + pages and transfer + + + them to shared_buffers + + + + + + WAL + + + binary records, + + + sequential + + + + + + Archived + + + WAL + + + + + + + Via TCP/IP or socket + + + + RAM + + + PROCESSES + + + FILES + + diff --git a/doc/src/sgml/images/ram-proc-file-ink.svg b/doc/src/sgml/images/ram-proc-file-ink.svg new file mode 100644 index 0000000000..9ef89639c4 --- /dev/null +++ b/doc/src/sgml/images/ram-proc-file-ink.svg @@ -0,0 +1,841 @@ + + + + + + image/svg+xml + + PG Overall Server Architecture + + + + + PG Overall Server Architecture + + + + + UML Note (200 x 20 px) + + + + UML Note (250 x 20 px) + + + + UML Note (100 x 35 px) + + + + UML Note (170 x 50 px) + + + + + UML State (300x120) + + + + UML State (350x120) + + + + + Disc + + + + + + + + + + + + Laptop + + + + + + + + + + + + + + + + + + + + + + + Client + Server + + + + + maintenance_work_mem (per connection) + work_mem (per query operation) + autovacuum_work_mem (per worker process) + temp_buffer (per connection) + ... + + Individual Memory + + + + + shared_buffers (heap and index) + wal_buffers (WAL records) + ... + + Shared Memory (per Instance) + + + + + Postmaster + + + + + 1 + + + + + Backend processes (one per connection) + + + + + + + 3 + + + + + + Creates backend processes + + + + 2 + + + + + + + + + + WAL Writer + + + + + + + Checkpointer + + + + + + + Checkpoint + Record + + + + + Background Writer + + + + + + + WAL Archiver + + + + + + + Autovacuum + + + + + + + Logger + + + + + Stats Collector + + + + + + + Log + text lines, + sequential + + + + + Heap and + Index + binary blocks, + random + + + + + Read heap and index + pages and transfer + them to shared_buffers + + + + WAL + binary records, + sequential + + + + Archived + WAL + + + + + + Via TCP/IP or socket + + + + RAM + PROCESSES + FILES + + diff --git a/doc/src/sgml/images/ram-proc-file-raw.svg b/doc/src/sgml/images/ram-proc-file-raw.svg new file mode 100644 index 0000000000..775ba91571 --- /dev/null +++ b/doc/src/sgml/images/ram-proc-file-raw.svg @@ -0,0 +1,301 @@ + + + + PG Overall Server Architecture + + + + + + + + UML Note (200 x 20 px) + + + + UML Note (250 x 20 px) + + + + UML Note (100 x 35 px) + + + + UML Note (170 x 50 px) + + + + + + UML State (300x120) + + + + UML State (350x120) + + + + + + Disc + + + + + + + + + Laptop + + + + + + + + + + + + + + + + + + + + + + + + + + + + Client + Server + + + + + + + maintenance_work_mem (per connection) + work_mem (per query operation) + autovacuum_work_mem (per worker process) + temp_buffer (per connection) + ... + + Individual Memory + + + + + + shared_buffers (heap and index) + wal_buffers (WAL records) + ... + + Shared Memory (per Instance) + + + + + + Postmaster + + + + + 1 + + + + + + Backend processes (one per connection) + + + + + + + + 3 + + + + + + + Creates backend processes + + + + 2 + + + + + + + + + + + + WAL Writer + + + + + + + + Checkpointer + + + + + + + Checkpoint + Record + + + + + + Background Writer + + + + + + + + WAL Archiver + + + + + + + + Autovacuum + + + + + + + + Logger + + + + + + Stats Collector + + + + + + + + Log + text lines, + sequential + + + + + + Heap and + Index + binary blocks, + random + + + + + + Read heap and index + pages and transfer + them to shared_buffers + + + + + WAL + binary records, + sequential + + + + + Archived + WAL + + + + + + + Via TCP/IP or socket + + + + + RAM + PROCESSES + FILES + + + diff --git a/doc/src/sgml/images/wraparound-ink-svgo.svg b/doc/src/sgml/images/wraparound-ink-svgo.svg new file mode 100644 index 0000000000..d5af6ce0b8 --- /dev/null +++ b/doc/src/sgml/images/wraparound-ink-svgo.svg @@ -0,0 +1,40 @@ + + + Cyclic usage of XIDs + + + + + + + + + + Cyclic usage of XIDs modulo 2 ^ 32 + + + + + | (0) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > | (1) | (2) | (3) | (4) + + + + + 0: 0 .. 2 ^ 32 - 1 + + + 1: oldest active xid (pg_stat_activity.backend_xmin) + + + 2: xmin of one row version + + + 3: xmax of the same row version + + + 4: youngest xid (txid_current) + + + diff --git a/doc/src/sgml/images/wraparound-ink.svg b/doc/src/sgml/images/wraparound-ink.svg new file mode 100644 index 0000000000..51d35a9f85 --- /dev/null +++ b/doc/src/sgml/images/wraparound-ink.svg @@ -0,0 +1,198 @@ + + + + + + image/svg+xml + + Cyclic usage of XIDs + + + + + Cyclic usage of XIDs + + + + + + + + + + + + Cyclic usage of XIDs modulo 2 + ^ +32 + + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + | + (1) + | + (2) + | + (3) + | + (4) + + + + 0: 0 .. 2 ^ +32 + - 1 + 1: oldest active + xid (pg_stat_activity.backend_xmin) + 2: xmin of one row version + 3: xmax of the same row version + 4: youngest xid (txid_current) + + diff --git a/doc/src/sgml/images/wraparound-raw.svg b/doc/src/sgml/images/wraparound-raw.svg new file mode 100644 index 0000000000..637e68bb82 --- /dev/null +++ b/doc/src/sgml/images/wraparound-raw.svg @@ -0,0 +1,79 @@ + + + + Cyclic usage of XIDs + + + + + + + + + + + + + + + + Cyclic usage of XIDs modulo 2 + ^ 32 + + + + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + | + (1) + | + (2) + | + (3) + | + (4) + + + + + 0: 0 .. 2 ^ 32 - 1 + 1: oldest active xid (pg_stat_activity.backend_xmin) + 2: xmin of one row version + 3: xmax of the same row version + 4: youngest xid (txid_current) + + + diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index 730d5fdc34..e9e9f9495f 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -248,6 +248,7 @@ break is not needed in a wider output rendering. + &architecture; &arch-dev; &catalogs; &protocol; diff --git a/doc/src/sgml/start.sgml b/doc/src/sgml/start.sgml index 9bb5c1a6d5..abb61445f2 100644 --- a/doc/src/sgml/start.sgml +++ b/doc/src/sgml/start.sgml @@ -53,7 +53,7 @@ - Architectural Fundamentals + Client/Server Model Before we proceed, you should understand the basic @@ -68,34 +68,52 @@ client/server model. A PostgreSQL session consists of the following cooperating processes (programs): + - - - - A server process, which manages the database files, accepts - connections to the database from client applications, and - performs database actions on behalf of the clients. The - database server program is called - postgres. - postgres - - + + + + A process at the server site with the name + Postmaster. + postgres + postmaster + It accepts connection requests from client applications, starts + (forks) a new + Backend process for each of them, and passes + the connection to it. From that point on, the client and the new + Backend process communicate directly without intervention by the original + Postmaster process. Thus, the Postmaster process is always running, + waiting for new client connections, whereas clients and associated + Backend processes come and go. (All of this is of course invisible + to the user. We only mention it here for completeness.) + + - - - The user's client (frontend) application that wants to perform - database operations. Client applications can be very diverse - in nature: a client could be a text-oriented tool, a graphical - application, a web server that accesses the database to - display web pages, or a specialized database maintenance tool. - Some client applications are supplied with the - PostgreSQL distribution; most are - developed by users. - - + + + A group of processes at the server site, the Instance, to which also + the Postmaster process belongs. Their duties are handling of + central, common database activities like file access, transaction + handling, vacuum, checkpoints, replication, and more. The mentioned + Backend processes delegate those actions to the instance. + + - - + + + The user's client (frontend) application that wants to perform + database operations. Client applications can be very diverse + in nature: a client could be a text-oriented tool, a graphical + application, a web server that accesses the database to + display web pages, or a specialized database maintenance tool. + Some client applications are supplied with the + PostgreSQL distribution; most are + developed by users. + + + + As is typical of client/server applications, the client and the @@ -106,18 +124,6 @@ file name) on the database server machine. - - The PostgreSQL server can handle - multiple concurrent connections from clients. To achieve this it - starts (forks) a new process for each connection. - From that point on, the client and the new server process - communicate without intervention by the original - postgres process. Thus, the - supervisor server process is always running, waiting for - client connections, whereas client and associated server processes - come and go. (All of this is of course invisible to the user. We - only mention it here for completeness.) - --------------1BB5DBAD17FB692E25728BB2 Content-Type: text/x-patch; charset=UTF-8; name="0013-architecture.sgml.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0013-architecture.sgml.diff" diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml index b7589f9a4f..4bbd6abb8a 100644 --- a/doc/src/sgml/architecture.sgml +++ b/doc/src/sgml/architecture.sgml @@ -6,8 +6,11 @@ Every DBMS implements basic strategies to ensure a fast and robust system. This chapter provides an overview of the - techniques PostgreSQL uses to - achieve this. + basic techniques PostgreSQL uses to + achieve this aim. It does not offer anything which exceeds + the information content of other pages. Instead, it tries to + explain why certain implementation + decisions have been taken. @@ -28,10 +31,11 @@ - The first step when an Instance starts is the start of the + All aspects of an Instance are launched and managed using a single primary + process termed the Postmaster. - It loads the configuration files, allocates Shared Memory, and - starts the other processes of the Instance: + It loads configuration files, allocates Shared Memory, and + starts the other collaborating processes of the Instance: Background Writer, Checkpointer, WAL Writer, @@ -39,9 +43,10 @@ Autovacuum, Statistics Collector, Logger, and more. - Later, the Postmaster starts + Later, the Postmaster listens on its configured system port and in response + to client connection attempts launches Backend processes - which communicate with clients and handle their requests. + to which it delegates authentication, communication, and the handling of their requests. visualizes the processes of an Instance and the main aspects of their collaboration. @@ -62,14 +67,6 @@ - - When a client application tries to connect to a - database, - this request is handled initially by the Postmaster. It - starts a new Backend process, which handles all further - client's requests. - - Client requests like SELECT or UPDATE usually lead to the @@ -77,16 +74,16 @@ by the client's backend process. Reads involve a page-level cache, located in Shared Memory (for details see: ) for the benefit of all processes - in the instance. Writes also use this cache, in addition + in the Instance. Writes also use this cache, in addition to a journal, called the write-ahead-log or WAL. Shared Memory is limited in size and it can become necessary to evict pages. As long as the content of such pages hasn't - changed, this is not a problem. But in Shared Memory also - write actions take place. Modified pages are called dirty - pages or dirty buffers and before they can be evicted they + changed, this is not a problem. But writes directly modify + the pages in Shared Memory. Modified pages are called dirty + pages (or dirty buffers) and before they can be evicted they must be written to disk. This happens regularly by the Checkpointer and the Background Writer processes to ensure that the disk version of the pages are up-to-date. @@ -98,7 +95,7 @@ WAL record is created from the delta-information (difference between the old and the new content) and stored in another area of - Shared Memory. The parallel running WAL Writer process + Shared Memory. The concurrently running WAL Writer process reads them and appends them to the end of the current WAL file. Such sequential writes are faster than writes to random @@ -108,8 +105,8 @@ - Second, the transfer of dirty buffers from Shared Memory to - files must take place. This is the primary task of the + Second, the Instance transfers dirty buffers from Shared Memory to + files. This is the primary task of the Background Writer process. Because I/O activities can block other processes, it starts periodically and acts only for a short period. Doing so, its extensive (and @@ -123,14 +120,8 @@ Checkpoints. A Checkpoint is a point in time when all older dirty buffers, all older WAL records, and finally a special Checkpoint record - are written and flushed to disk. Heap and index files, - and WAL files are now in sync. - Older WAL is no longer required. In other words, - a possibly occurring recovery, which integrates the delta - information of WAL into heap and index files, will happen - by replaying only WAL past the last-recorded checkpoint. - This limits the amount of WAL to be replayed - during recovery in the event of a crash. + are written and flushed to disk. + Older WAL files are no longer required to recover the system from a crash. @@ -141,8 +132,10 @@ less common). Options and details are covered in the backup and restore section (). For our purposes here, just note that the WAL Archiver process - can be enabled and configured to run a script on filled WAL - files — usually to copy them to a remote location. + can be enabled and configured to run a script on completed WAL + files — usually to copy them to a remote location. Note + that when a Checkpoint record is written to the WAL the current + file is immediately completed. @@ -163,23 +156,35 @@ The logical Perspective: Cluster, Database, Schema - A server contains one or more - database clusters - (clusters - for short). Each cluster contains three or more - databases. - Each database can contain many - schemas. - A schema can contain - tables, + A Server contains one or more + Database Clusters + (Clusters + for short). By default each newly initialized Cluster contains three + databases + (one interactive and two templates, see ). + Each database can contain many user-writable + schemas + (public, by name and permissiveness, by default), the system + generated user-facing schemas pg_catalog, + pg_temp, and information_schema, + and some more system schemas. + Tables, views, and a lot - of other objects. Each table or view belongs to a single schema - only; they cannot belong to another schema as well. The same is - true for the schema/database and database/cluster relation. + of other objects uniquely reside in a single schema. visualizes this hierarchy. + + + Client connections act at the database level and can access + its schemas simultaneously. Special techniques like + foreing data wrapper + or dblink are required + to access multiple databases, even within the same Cluster, + from a single client connection. + +
Cluster, Database, Schema @@ -197,31 +202,30 @@
- A cluster is the outer container for a + A Cluster is the outer container for a collection of databases. Clusters are created by the command . template0 is the very first - database of any cluster. Database template0 - is created during the initialization phase of the cluster. + database of any Cluster. It + is created during the initialization phase of the Cluster. In a second step, database template1 is generated as a copy of template0, and finally database postgres is generated as a copy of template1. Any new databases - of the cluster that a user might need, + of the Cluster that a user might need, such as my_db, will be copied from the template1 database. Due to the unique role of template0 as the pristine original - of all other databases, no client can connect to it. + of all other databases, no client is allowed to connect to it. - Every database must contain at least one schema because all SQL Objects - must be contained in a schema. + are contained in a schema. Schemas are namespaces for SQL objects and ensure (with one exception) that the SQL object names are used only once within their scope across all types of SQL objects. E.g., it is not possible @@ -243,26 +247,46 @@ without using an explicit schema name. public should not contain user-defined SQL objects. Instead, it is recommended to create a separate schema that holds individual - objects like application-specific tables or views. + objects like application-specific tables or views. To access + objects in such a schema they can be fully qualified, e.g. + my_schema.my_table, or by changing the + schema search path. + + + pg_catalog is a schema for all tables and views of the System Catalog. - information_schema is a schema for several - tables and views of the System Catalog in a way that conforms - to the SQL standard. + information_schema is a similar schema. It + contains several tables and views of the System Catalog in a + way that conforms to the SQL standard. - There are many different SQL object - types: database, schema, table, view, materialized - view, index, constraint, sequence, function, procedure, - trigger, role, data type, operator, tablespace, extension, - foreign data wrapper, and more. A few of them, the + There are many different SQL object types: + database, + schema, + table, + view, + materialized view, + index, + constraint, + sequence, + function, + procedure, + trigger, + role, + data type, + operator, + tablespace, + extension, + foreign data wrapper, + and more. A few of them, the Global SQL Objects, are outside of the strict hierarchy: All database names, all tablespace names, and all role names are automatically - available throughout the cluster, independent from - the database or schema in which they were defined originally. + available throughout the Cluster, not just the database in which + the SQL command was executed. shows the relation between the object types. @@ -286,7 +310,7 @@
- The physical Perspective: Directories and Files + The Physical Perspective: Directories and Files PostgreSQL organizes long-lasting (persistent) @@ -297,7 +321,7 @@ variable PGDATA points to this directory. The example shown in uses - data as the name of this root directory. + data as the name of the cluster's root directory.
@@ -317,27 +341,29 @@
- data contains many subdirectories and + The cluster's root directory contains many subdirectories and some files, all of which are necessary to store long-lasting - as well as temporary data. The following paragraphs - describe the files and subdirectories in - data. + as well as temporary data. The root's name can be selected + as desired, but the names of its subdirectories and files + are more or less fix and detertermined by + PostgreSQL. The following + paragraphs describe the most important subdirectories + and files. - base is a subdirectory in which one - subdirectory per database exists. The names of those + base contains one + subdirectory per database. The names of those subdirectories consist of numbers. These are the internal Object Identifiers (OID), which are numbers to identify - the database definition in the + their definition in the System Catalog. - Within the database-specific - subdirectories, there are many files: one or more for - every table and every index to store heap and index - data. Those files are accompanied by files for the + Within the database-specific subdirectories of base + there are many files: one or more for every table + and every index. Those files are accompanied by files for the Free Space Maps (suffixed _fsm) and Visibility Maps @@ -345,20 +371,25 @@ - Another subdirectory is global which + Another subdirectory is global. It contains files with information about Global SQL Objects. - One type of such Global SQL Objects are - tablespaces. - In global there is information about - the tablespaces; not the tablespaces themselves. + + + + In pg_tblspc, there are symbolic links + that point to directories that are outside of the root + directory tree, e.g. at a different disc. Files for tables + and indexes of non-default tablespaces reside there. As + previously mentioned, those defined within the default + tablespace reside in the database-specific subdirectories. The subdirectory pg_wal contains the WAL files. They arise and grow in parallel with data changes in the - cluster and remain as long as + Cluster and remain as long as they are required for recovery, archiving, or replication. @@ -370,19 +401,14 @@ - In pg_tblspc, there are symbolic links - that point to directories containing SQL objects - that exist within a non-default tablespace. - - - - In the root directory data + In the root directory there are also some files. In many cases, the configuration - files of the cluster are stored here. If the - instance is up and running, the file + files of the Cluster are stored here. If the + Instance is up and running, the file postmaster.pid exists here + (by default) and contains the process ID (pid) of the - Postmaster which started the instance. + Postmaster which started the Instance. @@ -413,8 +439,8 @@ PostgreSQL implements a sophisticated technique which avoids locking: Multiversion Concurrency Control (MVCC). - The advantage of MVCC - over technologies that use row locks becomes evident in multiuser OLTP + The advantage of MVCC over technologies that use row locks + becomes evident in multiuser Online Transaction Processing (OLTP) environments with a massive number of concurrent write actions. There, MVCC generally performs better than solutions using locks. In a PostgreSQL @@ -439,15 +465,15 @@ - When we speak about transaction IDs, you need to know that xids are like - sequences. Every new transaction receives the next number as its ID. - Therefore, this flow of xids represents the flow of transaction - start events over time. But keep in mind that xids are independent of - any time measurement — in milliseconds or otherwise. If you dive - deeper into PostgreSQL, you will recognize - parameters with names such as 'xxx_age'. Despite their names, - these '_age' parameters do not specify a period of time but represent - a certain number of transactions, e.g., 100 million. + Xids are sequences (with a reserved value to handle wrap-around + in pre-9.4 PostgreSQL versions). + Age computations involving them measure a transaction + count as opposed to a time interval (in milliseconds or otherwise). + If you dive deeper into PostgreSQL, + you will recognize parameters with names such as 'xxx_age'. + Despite their names, these '_age' parameters do not specify + a period of time but represent a certain number of transactions, + e.g., 100 million. @@ -498,7 +524,7 @@ executes an UPDATE of this row by changing the user data from 'x' to 'y'. According to the MVCC principles, - the data in the old version of the row is not changed! + the data in the old version of the row is not changed. The value 'x' remains as it was before. Only xmax changes to 135. Now, this version is treated as valid exclusively for @@ -526,7 +552,7 @@ Finally, a row may be deleted by a DELETE command. Even in this case, all versions of the row remain as - before. Nothing is thrown away! Only xmax + before; nothing is thrown away. Only xmax of the last version is set to the xid of the DELETE transaction, which indicates that (if committed) it is only visible to transactions with xids older than that (from @@ -545,7 +571,8 @@ Over time, also the older ones get out of scope for ALL transactions and therefore become unnecessary. Nevertheless, they do exist physically on the disk and occupy - space. + space. They are called dead rows and are part + of the bloat. @@ -560,12 +587,7 @@ row versions are valid (visible) for transactions. This range doesn't imply any direct temporal meaning; the sequence of xids reflects only the sequence of - transaction begin events. As - xids grow, old row versions get out of scope over time. - If an old row version is no longer relevant for ANY existing - transactions, it can be marked dead. The - space occupied by dead row versions is part of the - bloat. + transaction begin events. @@ -581,9 +603,9 @@ Nothing is removed — with the consequence that the database occupies more and more disk space. It is obvious that - this behavior has to be corrected in some - way. The next chapter explains how autovacuum - fulfills this task. + this behavior has to be corrected in some way. The next + chapter explains how vacuum and + autovacuum fulfill this task. @@ -601,12 +623,12 @@ This chapter explains how the SQL command VACUUM and the automatically running Autovacuum processes clean up - and avoid continued growth. + and prevent continued growth. - Autovacuum runs automatically by + Autovacuum runs automatically, by default. Its default parameters as well as those for VACUUM are appropriate for most standard situations. Therefore a novice database manager can @@ -617,11 +639,11 @@ Client processes can issue the SQL command VACUUM - at arbitrary points in time. DBAs do this when they recognize + at any time. DBAs do this when they recognize special situations, or they start it in batch jobs which run - periodically. Autovacuum processes run as part of the - Instance at the server. - There is a constantly running Autovacuum daemon. It continuously + periodically. Additionally, there is a constantly running + Autovacuum daemon which is part of the + Instance. It continuously monitors the state of all databases based on values that are collected by the Statistics Collector and starts Autovacuum processes whenever it detects @@ -1234,7 +1256,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; Lastly, it is worth noticing that changes done by a committed transaction will survive all failures in the application or - database cluster. The next chapter explains the + the Database Cluster. The next chapter explains the durability guarantees. @@ -1276,7 +1298,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; Instance failure - The instance may suddenly fail because of power off + The Instance may suddenly fail because of power off or other problems. This will affect all running processes, the RAM, and possibly the consistency of disk files. @@ -1284,7 +1306,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; After a restart, PostgreSQL automatically recognizes that the last shutdown of the - instance did not happen as expected: files might not be + Instance did not happen as expected: files might not be closed properly and the postmaster.pid file unexpectedly exists. PostgreSQL tries to clean up the situation. This is possible because @@ -1330,7 +1352,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; They obviously need a backup. How to take such a backup and use it as a starting point for a recovery of the - cluster is explained in more detail in the next + Cluster is explained in more detail in the next chapter. @@ -1396,7 +1418,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; You can use any appropriate OS tool to create a copy - of the cluster's directory structure and files. In + of the Cluster's directory structure and files. In case of severe problems such a copy can serve as the source of recovery. But in order to get a USABLE backup by this method, @@ -1416,7 +1438,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; The tool pg_dump is able to take a copy - of the complete cluster or certain parts of it. It stores + of the complete Cluster or certain parts of it. It stores the copy in the form of SQL commands like CREATE and COPY. It runs in parallel with other processes, in its own transaction. @@ -1430,9 +1452,9 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; The main advantage over the other two methods is that it - can pick parts of the cluster, e.g., a single table or one + can pick parts of the Cluster, e.g., a single table or one database. The other two methods work only at the level of - the complete cluster. + the complete Cluster. Continuous archiving based on pg_basebackup and WAL files @@ -1447,7 +1469,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; basebackup with the tool pg_basebackup. The result is a directory structure plus files which contain a - consistent copy of the original cluster. + consistent copy of the original Cluster. pg_basebackup runs in parallel with other processes in its own transaction. @@ -1456,9 +1478,9 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; The second step is recommended but not necessary. All changes to the data are stored in WAL files. If you continuously save such WAL files, you have the history - of the cluster. This history can be applied to a + of the Cluster. This history can be applied to a basebackup in order to recreate - any state of the cluster between the time of + any state of the Cluster between the time of pg_basebackup's start time and any later point in time. This technique is called 'Point-in-Time Recovery (PITR)'. @@ -1478,7 +1500,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; - If it becomes necessary to restore the cluster, you have to + If it becomes necessary to restore the Cluster, you have to copy the basebackup and the archived WAL files to their original directories. The configuration of this recovery procedure --------------1BB5DBAD17FB692E25728BB2 Content-Type: text/x-patch; charset=UTF-8; name="0013-images.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0013-images.diff" diff --git a/doc/src/sgml/images/cluster-db-schema-raw.svg b/doc/src/sgml/images/cluster-db-schema-raw.svg index 022bc45560..8534b18ec2 100644 --- a/doc/src/sgml/images/cluster-db-schema-raw.svg +++ b/doc/src/sgml/images/cluster-db-schema-raw.svg @@ -36,36 +36,49 @@ + + + + + + + + schema 'public' - tables, views, ... + tables, views, ... - (more system schemas) + System schemas - + schema 'public' - tables, views, ... + tables, views, ... - - 'my_schema' (optional) - tables, views, ... + + System schemas - - (more system schemas) + + 'my_schema' (optional) + tables, views, ... - + UML Note - + + + + + UML Note + @@ -109,23 +122,28 @@ database 'template0' + + 'template0' is a pristine + and unchangeable template. + It is used to generate + database 'template1'. - + database 'template1' - + database 'postgres' - + database 'my_db' (optional) @@ -133,7 +151,6 @@ Global SQL objects - @@ -141,32 +158,33 @@ - + 1) - By default, you work in the cluster 'data', database 'postgres', - schema 'public'. - - 2) - More system schemas: pg_catalog, information_schema, - pg_temp, pg_toast. - - 3) - Global SQL objects: Some SQL objects are automatically active - and known database- or even cluster-wide. - - 4) - The command 'initdb' creates a new cluster with the three - databases 'template0', 'template1', and 'postgres'. The command - 'createdb' creates a new database. - - 5) - If multiple clusters are active on one server at the same time, - each one is managed by an individual instance. Each such instance - uses a different port. - - 6) - No client application is allowed to connect to 'template0'. + By default, clients connect via port '5432' to Cluster 'data', + database 'postgres', schema 'public'. Of course, they can + specify other connection properties. + + 2) + System schemas: pg_catalog, information_schema, + pg_temp, ... + + 3) + Global SQL objects: Objects which are shared among all + databases within a Cluster. + + 4) + The command 'initdb' creates a new cluster with the three + databases 'template0', 'template1', and 'postgres'. The command + 'createdb' creates a new database. + + 5) + If multiple clusters are active on one server at the same time, + each one is managed by an individual instance. Each such instance + uses a different port. + + 6) + Client application are prohibited from connecting to 'template0'. diff --git a/doc/src/sgml/images/directories-raw.svg b/doc/src/sgml/images/directories-raw.svg index 757342a554..5cce551b0c 100644 --- a/doc/src/sgml/images/directories-raw.svg +++ b/doc/src/sgml/images/directories-raw.svg @@ -14,6 +14,12 @@ font-family:"Open Sans", sans-serif; fill:black; } + .text_italic {font-style:italic; + font-weight:normal; + font-size:14px; + font-family:"Open Sans", sans-serif; + fill:silver; + } .text_big {font-style:normal; font-weight:normal; font-size:24px; @@ -54,14 +60,14 @@ - ... /pg/ + /var/lib/pgsql/ An arbitrary directory - data/ - Root of cluster 'data' (see: PGDATA) + data/ + Cluster's root directory (with an arbitrary name; see: PGDATA) @@ -74,46 +80,46 @@ 1/ - Subdirectory for data of first database 'template0' + Subdirectory for data in default-tablespace of first database 'template0' 12992/ - Subdirectory for data of second database 'template1' + Subdirectory for data in default-tablespace of second database 'template1' 12999/ - Subdirectory for data of third database 'postgres' + Subdirectory for data in default-tablespace of third database 'postgres' nnnnn/ - Optional: more subdirectories for databases, e.g. 'my_db' + Optional: one subdirectory for each new database, e.g. '23456' for 'my_db' global/ - Subdirectory with information about Global SQL Objects + Subdirectory for information about Global SQL Objects - pg_wal/ - Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + pg_tblspc/ + Subdirectory for symbolic links to non-default tablespaces - pg_xact/ - Subdirectory for transaction commit status ('pg_clog' before version 10) + pg_wal/ + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) - pg_tblspc/ - Subdirectory containing symbolic links to tablespaces + pg_xact/ + Subdirectory for transaction commit status ('pg_clog' before version 10) @@ -131,14 +137,14 @@ - ... /xyz/ + ... Same or another arbitrary directory - cluster_2/ - Root of another cluster 'cluster_2' + cluster_2/ + Root of another cluster diff --git a/doc/src/sgml/images/internal-objects-hierarchy-raw.svg b/doc/src/sgml/images/internal-objects-hierarchy-raw.svg index a58f334fe3..78850283d9 100644 --- a/doc/src/sgml/images/internal-objects-hierarchy-raw.svg +++ b/doc/src/sgml/images/internal-objects-hierarchy-raw.svg @@ -26,7 +26,7 @@ - Hierarchy of internal Objects + Hierarchy of Internal Objects --------------1BB5DBAD17FB692E25728BB2--