From: Erik Rijkers Date: Fri, 30 Oct 2020 17:45:00 +0100 Subject: [PATCH 1/3] Additional Chapter for Tutorial --- doc/src/sgml/architecture.sgml | 142 +++++++++++++++------------------ 1 file changed, 66 insertions(+), 76 deletions(-) diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml index e547a87d08..ffdac61975 100644 --- a/doc/src/sgml/architecture.sgml +++ b/doc/src/sgml/architecture.sgml @@ -19,19 +19,18 @@ In the case of PostgreSQL, the server launches a single process for each client connection, referred to as a Backend process. - Those Backend processes handle the client's requests by acting on the + Such a Backend process handles the client's requests by acting on the Shared Memory. This leads to other activities (file access, WAL, vacuum, ...) of the Instance. The Instance is a group of server-side processes acting on a common - Shared Memory. Notably, PostgreSQL does not utilize application - threading within its implementation. + Shared Memory. PostgreSQL does not utilize threading. - The first step in an Instance start is the start of the + The first step when an Instance starts is the start of the Postmaster. - He loads the configuration files, allocates Shared Memory, and + It loads the configuration files, allocates Shared Memory, and starts the other processes of the Instance: Background Writer, Checkpointer, @@ -66,32 +65,32 @@ When a client application tries to connect to a database, - this request is handled initially by the Postmaster. He + this request is handled initially by the Postmaster. It starts a new Backend process and instructs the client application to connect to it. All further client requests - go to this process and are handled by it. + are handled by this process. Client requests like SELECT or UPDATE usually lead to the - necessity to read or write some data. This is carried out + necessity to read or write data. This is carried out by the client's backend process. Reads involve a page-level - cache housed in Shared Memory (for details see: + cache, located in Shared Memory (for details see: ) for the benefit of all processes - in the instance. Writes also involve this cache, in additional + in the instance. Writes also use this cache, in addition to a journal, called a write-ahead-log or WAL. - Shared Memory is limited in size. Thus, it becomes necessary + Shared Memory is limited in size and it can become necessary to evict pages. As long as the content of such pages hasn't changed, this is not a problem. But in Shared Memory also write actions take place. Modified pages are called dirty pages or dirty buffers and before they can be evicted they - must be written back to disk. This happens regularly by the + must be written to disk. This happens regularly by the Background Writer and the Checkpointer process to ensure - that the disk version of the pages are kept up-to-date. + that the disk version of the pages are up-to-date. The synchronisation from RAM to disk consists of two steps. @@ -109,7 +108,7 @@ Shared Memory. The parallel running WAL Writer process reads them and appends them to the end of the current WAL file. - Such sequential writes are much faster than writes to random + Such sequential writes are faster than writes to random positions of heap and index files. All WAL records created out of one dirty page must be transferred to disk before the dirty page itself can be transferred to disk in the second step. @@ -119,19 +118,19 @@ Second, the transfer of dirty buffers from Shared Memory to files must take place. This is the primary task of the Background Writer process. Because I/O activities can block - other processes significantly, it starts periodically and + other processes, it starts periodically and acts only for a short period. Doing so, its extensive (and expensive) I/O activities are spread over time, avoiding - debilitating I/O peaks. Also, the Checkpointer process - transfers dirty buffers to file. + debilitating I/O peaks. The Checkpointer process + also transfers dirty buffers to file. - The Checkpointer creates + The Checkpointer process creates Checkpoints. A Checkpoint is a point in time when all older dirty buffers, all older WAL records, and finally a special Checkpoint record - have been written and flushed to disk. Heap and index files + are written and flushed to disk. Heap and index files on the one hand and WAL files on the other hand are in sync. Previous WAL is no longer required. In other words, a possibly occurring recovery, which integrates the delta @@ -141,13 +140,13 @@ - While the Checkpointer ensures that a running system can crash + While the Checkpointer ensures that the database system can crash and restart itself in a valid state, the administrator needs to handle the case where the heap and files themselves become corrupted (and possibly the locally written WAL, though that is less common). The options and details are covered extensively in the backup and restore section (). - For our purposes here, note just that the WAL Archiver process + For our purposes here, just note that the WAL Archiver process can be enabled and configured to run a script on filled WAL files — usually to copy them to a remote location. @@ -234,13 +233,13 @@ Every database must contain at least one schema because all SQL Objects - are contained in a schema. - Schemas are namespaces for their SQL objects and ensure - (with one exception) that within their scope names are used - only once across all types of SQL objects. E.g., it is not possible + must be contained in a schema. + Schemas are namespaces for SQL objects and ensure + (with one exception) that the SQL object names are used only once within + their scope across all types of SQL objects. E.g., it is not possible to have a table employee and a view employee within the same schema. But it is - possible to have two tables employee in + possible to have two tables employee in two different schemas. In this case, the two tables are separate objects and independent of each other. The only exception to this cross-type uniqueness is that @@ -273,7 +272,7 @@ Global SQL Objects, are outside of the strict hierarchy: All database names, all tablespace names, and all - role names are automatically known and + role names are automatically available throughout the cluster, independent from the database or schema in which they where defined originally. @@ -302,7 +301,7 @@ The physical Perspective: Directories and Files - PostgreSQL organizes long-lasting + PostgreSQL organizes long-lasting (persistent) data as well as volatile state information about transactions or replication actions in the file system. Every has its root directory @@ -352,20 +351,19 @@ every table and every index to store heap and index data. Those files are accompanied by files for the Free Space Maps - (extension _fsm) and + (suffixed _fsm) and Visibility Maps - (extension _vm), which contain optimization information. + (suffixed _vm), which contain optimization information. - Another subdirectory is global. - In analogy to the database-specific - subdirectories, there are files containing information about + Another subdirectory is global which + contains files with information about Global SQL Objects. One type of such Global SQL Objects are tablespaces. In global there is information about - the tablespaces, not the tablespaces themselves. + the tablespaces; not the tablespaces themselves. @@ -392,11 +390,11 @@ In the root directory data there are also some files. In many cases, the configuration - files of the cluster are stored here. As long as the + files of the cluster are stored here. If the instance is up and running, the file postmaster.pid exists here and contains the process ID (pid) of the - Postmaster which has started the instance. + Postmaster which started the instance. @@ -411,7 +409,7 @@ In most cases, PostgreSQL databases - support many clients at the same time. Therefore, it is necessary to + support many clients at the same time which makes it necessary to protect concurrently running requests from unwanted overwriting of other's data as well as from reading inconsistent data. Imagine an online shop offering the last copy of an article. Two clients have the @@ -432,11 +430,11 @@ PostgreSQL implements a third, more sophisticated technique: Multiversion Concurrency Control (MVCC). The crucial advantage of MVCC - over other technologies gets evident in multiuser OLTP + over other technologies becomes evident in multiuser OLTP environments with a massive number of concurrent write actions. There, MVCC generally performs better than solutions using locks. In a PostgreSQL - database reading never blocks writing and writing never + database, reading never blocks writing and writing never blocks reading, even in the strictest level of transaction isolation. @@ -444,14 +442,14 @@ Instead of locking rows, the MVCC technique creates a new version of the row when a data-change takes place. To - distinguish between these two versions and to track the timeline + distinguish between these two versions, and to track the timeline of the row, each of the versions contains, in addition to their user-defined columns, two special system columns, which are not visible for the usual SELECT * FROM ... command. The column xmin contains the transaction ID (xid) - of the transaction, which created this version of the row. Accordingly, - xmax contains the xid of the transaction, which has - deleted this version, or zero, if the version is not + of the transaction which created this version of the row. + xmax contains the xid of the transaction which has + deleted this version, or zero if the version is not deleted. You can read both with the command SELECT xmin, xmax, * FROM ... . @@ -469,7 +467,7 @@ - The description in this chapter simplifies by omitting some details. + The description in this chapter simplifies by omitting details. When many transactions are running simultaneously, things can get complicated. Sometimes transactions get aborted via ROLLBACK immediately or after a lot of other activities, sometimes @@ -526,8 +524,8 @@ creates a new version of the row with its xid in xmin, 0 in xmax, and 'y' in the - user data (plus all the other user data from the old version). - This version is now valid for all coming transactions. + user data (plus all other user data from the old version). + This version is now valid for all future transactions. @@ -624,9 +622,9 @@ Autovacuum runs automatically by default. Its default parameters as well as such for - VACUUM fit well for most standard + VACUUM are appropriate for most standard situations. Therefore a novice database manager can - easily skip the rest of this chapter which explains + skip the rest of this chapter which explains a lot of details. @@ -687,7 +685,7 @@ The eagerness — you can call it 'aggression' — of the - operations eliminating bloat and + operations for eliminating bloat and freeze is controlled by configuration parameters, runtime flags, and in extreme situations by the processes themselves. Because vacuum operations typically are I/O @@ -783,7 +781,7 @@ When a client issues the SQL command VACUUM with the option FULL. Also, in this mode, the bloat disappears, but the strategy used - is very different: In this case, the complete table is copied + is very different: in this case, the complete table is copied to a different file skipping all outdated row versions. This leads to a significant reduction of used disk space because the new file contains only the actual data. The old file @@ -1143,7 +1141,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; atomicity: either all or none of its operations succeed, regardless of the fact that it may consist of a lot of different write-operations, and each such operation may - affect thousands or millions of rows. As soon as one of the + affect many rows. As soon as one of the operations fails, all previous operations fail also, which means that all modified rows retain their values as of the beginning of the transaction. @@ -1157,14 +1155,14 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; — even in the lowest isolation level of transactions. PostgreSQL - does never show uncommitted changes to other connections. + never shows uncommitted changes to other connections. The situation regarding visibility is somewhat different from the point of view of the modifying transaction. - SELECT commands issued inside a - transaction delivers all changes done so far by this + A SELECT command issued inside a + transaction shows all changes done so far by this transaction. @@ -1231,7 +1229,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; Transactions ensure that the consistency - of the complete database always keeps valid. Declarative + of the complete database always remains valid. Declarative rules like primary- or foreign keys, @@ -1241,13 +1239,6 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; are part of the all-or-nothing nature of transactions. - - Also, all self-evident — but possibly not obvious - — low-level demands on the database system are - ensured; e.g. index entries for rows must become - visible at the same moment as the rows themselves. - - There is the additional feature 'isolation level', @@ -1287,7 +1278,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; a severe software error like a null pointer exception. Because PostgreSQL uses a client/server architecture, no direct problem for the - database will occur. In all of this cases, the + database will occur. In all of these cases, the Backend process, which is the client's counterpart at the server-side, may recognize that the network connection is no longer @@ -1310,7 +1301,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; automatically recognizes that the last shutdown of the instance did not happen as expected: files might not be closed properly and the postmaster.pid - file exists. PostgreSQL + file unexpectedly exists. PostgreSQL tries to clean up the situation. This is possible because all changes in the database are stored twice. First, the WAL files contain them as a chronology of @@ -1328,8 +1319,8 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; checkpoint. This checkpoint signals that the database files are in a consistent state, especially that all WAL records up to - this point were successfully stored in heap and index. Starting - here, the recovery process copies the following WAL records + this point were successfully stored in heap and index files. Starting + here, the recovery process copies the remaining WAL records to heap and index. As a result, the files contain all changes and reach a consistent state. Changes of committed transactions are visible; those of uncommited transactions @@ -1344,10 +1335,10 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; Disk crash If a disk crashes, the course of action described previously - cannot work. It is likely that the WAL files and/or the + cannot work: it is likely that the WAL files and/or the data and index files are no longer available. The database administrator must take special actions to - overcome such situations. + prepare for such a situation. He obviously needs a backup. How to take such a backup @@ -1427,7 +1418,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; The obvious disadvantage of this method is that there - is a downtime where no user interaction is possible. + is a downtime. The other two strategies run during regular operating times. @@ -1456,14 +1447,14 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; Continuous archiving based on pg_basebackup and WAL files This method - is the most sophisticated and complex one. It + is the most sophisticated and most complex one. It consists of two phases. - First, you need to create a so called + First, you need to create a so-called basebackup with the tool pg_basebackup. The result is a - directory structure plus files which contains a + directory structure plus files which contain a consistent copy of the original cluster. pg_basebackup runs in parallel to other processes in its own transaction. @@ -1484,7 +1475,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; Archiver process will automatically copy every single WAL file to a save location. Its configuration - consists mainly of a string, which contains a copy command + consists mainly of a string that contains a copy command in the operating system's syntax. In order to protect your data against a disk crash, the destination location of a basebackup as well as of the @@ -1492,9 +1483,8 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; disk which is different from the data disk. - If it gets necessary to restore the cluster, you have to - copy the basebackup and the - archived WAL files to + If it becomes necessary to restore the cluster, you have to + copy the basebackup and the archived WAL files to their original directories. The configuration of this recovery procedure contains a string with the reverse copy command: from -- 2.17.0 --aVD9QWMuhilNxW9f Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="0002-Fix-accesses-per-suggestion-from-Erik.patch"