From: Justin Pryzby Date: Sat, 31 Oct 2020 15:44:22 -0500 Subject: [PATCH 3/3] More fixes on top --- doc/src/sgml/architecture.sgml | 239 ++++++++++++++++----------------- 1 file changed, 114 insertions(+), 125 deletions(-) diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml index 6f819220dc..f3acdaa6b3 100644 --- a/doc/src/sgml/architecture.sgml +++ b/doc/src/sgml/architecture.sgml @@ -66,9 +66,7 @@ When a client application tries to connect to a database, this request is handled initially by the Postmaster. It - starts a new Backend process and instructs the client - application to connect to it. All further client requests - are handled by this process. + starts a new Backend process to service the client's requests. @@ -89,7 +87,7 @@ write actions take place. Modified pages are called dirty pages or dirty buffers and before they can be evicted they must be written to disk. This happens regularly by the - Background Writer and the Checkpointer process to ensure + Checkpointer and Background Writer processes to ensure that the disk version of the pages are up-to-date. The synchronisation from RAM to disk consists of two steps. @@ -117,7 +115,7 @@ Second, the transfer of dirty buffers from Shared Memory to files must take place. This is the primary task of the - Background Writer process. Because I/O activities can block + Checkpointer process. Because I/O activities can block other processes, it starts periodically and acts only for a short period. Doing so, its extensive (and expensive) I/O activities are spread over time, avoiding @@ -136,7 +134,9 @@ a possibly occurring recovery, which integrates the delta information of WAL into heap and index files, will happen by replaying only WAL past the last recorded checkpoint - on top of the current heap and files. This speeds up recovery. + on top of the current heap and files. + This limits the amount of WAL which needs to be replayed during recovery in + the event of a crash. @@ -369,8 +369,8 @@ The subdirectory pg_wal contains the WAL files. - They arise and grow parallel to data changes in the - cluster and remain alive as long as + They arise and grow in parallel with data changes in the + cluster and remain as long as they are required for recovery, archiving, or replication. @@ -383,8 +383,8 @@ In pg_tblspc, there are symbolic links - that point to directories containing such SQL objects - that are created within tablespaces. + that point to directories containing SQL objects + that exist within a non-default tablespace. @@ -459,7 +459,7 @@ sequences. Every new transaction receives the next number as its ID. Therefore, this flow of xids represents the flow of transaction start events over time. But keep in mind that xids are independent of - any time measurement — in milliseconds or whatever. If you dive + any time measurement — in milliseconds or otherwise. If you dive deeper into PostgreSQL, you will recognize parameters with names such as 'xxx_age'. Despite their names, these '_age' parameters do not specify a period of time but represent @@ -514,38 +514,36 @@ executes an UPDATE of this row by changing the user data from 'x' to 'y'. According to the MVCC principles, - the data in the old version of the row does not change! - The value 'x' remains as it was before. - Only xmax changes to 135. - Now, this version is treated as valid exclusively for + the old version of the row is not changed! + Internally, an UPDATE command acts + as a DELETE command followed by + an INSERT command. + xmax of the old row version is changed to 135, + and a new row version is added with + xmin=135, + xmax=0, and 'y' in the + user data (plus any other user columns from the old version). + The old row version is visible only to transactions with xids from 123 to - 134. As a substitute for the non-occurring - data change in the old version, the UPDATE - creates a new version of the row with its xid in - xmin, 0 in - xmax, and 'y' in the - user data (plus all other user data from the old version). - This version is now valid for all future transactions. + 134, and the new row version + is visible to all future transactions. All subsequent UPDATE commands behave - in the same way as the first one: they put their xid to + in the same way as the first one: they put their xid in xmax of the current version, create - the next version with their xid in xmin, - 0 in xmax, and the - new user data. + a new version with their xid in xmin and + 0 in xmax. Finally, a row may be deleted by a DELETE command. Even in this case, all versions of the row remain as - before. Nothing is thrown away so far! Only xmax - of the last version changes to the xid of the DELETE - transaction, which indicates that it is only valid for - transactions with xids older than its own (from - 142 to 820 in this - example). + before. Nothing is thrown away! Only xmax + of the last version is set to the xid of the DELETE + transaction, which indicates that (if committed) it is only visible to + transactions with xids older than that. @@ -553,10 +551,10 @@ of the same row in the table's heap file and leaves them there, even after a DELETE command. Only the youngest version is relevant for all future transactions. But the - system must also preserve some of the older ones for a - certain amount of time because the possibility exists that - they are or could become relevant for any pending - transactions. Over time, also the older ones get out of scope + system must also preserve some of the older ones for + awhile, because they could still be needed by + transactions which started before the deleting transaction commits. + Over time, also the older ones get out of scope for ALL transactions and therefore become unnecessary. Nevertheless, they do exist physically on the disk and occupy space. @@ -571,26 +569,18 @@ xmin and xmax indicate the range from where to where - row versions are valid (visible) for transactions. + row versions are valid (visible) for transactions. This range doesn't imply any direct temporal meaning; the sequence of xids reflects only the sequence of transaction begin events. As xids grow, old row versions get out of scope over time. - If an old row version is no longer valid for ALL existing - transactions, it's called dead. The - space occupied by dead row versions is part of the + If an old row version is no longer relevant for ANY existing + transactions, it can be marked dead. The + space occupied by dead row versions is wasted space called bloat. - - - Internally, an UPDATE command acts in the - same way as a DELETE command followed by - an INSERT command. - - - Nothing gets wiped away — with the consequence that the database @@ -610,12 +600,12 @@ As we have seen in the previous chapter, the database - tends to occupy more and more disk space, the + tends to occupy more and more disk space, caused by bloat. This chapter explains how the SQL command VACUUM and the automatically running Autovacuum processes clean up - by eliminating bloat. + and avoid continued growth. @@ -635,8 +625,8 @@ special situations, or they start it in batch jobs which run periodically. Autovacuum processes run as part of the Instance at the server. - There is a constantly running Autovacuum daemon. It permanently - controls the state of all databases based on values that are collected by the + There is a constantly running Autovacuum daemon. It continuously + monitors the state of all databases based on values that are collected by the Statistics Collector and starts Autovacuum processes whenever it detects certain situations. Thus, it's a dynamic behavior of @@ -657,7 +647,7 @@ - Freeze: Mark the youngest row version + Freeze: Mark old row version as frozen. This means that the version is always treated as valid (visible) independent from the wraparound problem (see below). @@ -684,22 +674,22 @@ - The eagerness — you can call it 'aggression' — of the + The eagerness — you can call it 'aggressiveness' — of the operations for eliminating bloat and freeze is controlled by configuration parameters, runtime flags, and in extreme situations by the processes themselves. Because vacuum operations typically are I/O intensive, which can hinder other activities, Autovacuum avoids performing many vacuum operations in bulk. Instead, - it carries out many small actions with time gaps in between. - The SQL command VACUUM runs immediately - and without any time gaps. + it carries out many small actions with delay points in between. + When invoked manually, the SQL command VACUUM + runs immediately and (by default) without any time delay. Eliminate Bloat - To determine which of the row versions are superfluous, the + To determine which of the row versions are no longer needed, the elimination operation must evaluate xmax against several criteria which all must apply: @@ -740,14 +730,13 @@ - After the vacuum operation detects a superfluous row version, it + After the vacuum operation detects an unused row version, it marks its space as free for future use of writing actions. Only in rare situations (or in the case of VACUUM FULL), - this space is released to the operating system. In most cases, + is this space released to the operating system. In most cases, it remains occupied by PostgreSQL and will be used by future INSERT or - UPDATE commands concerning this row or a - completely different one. + UPDATE commands to this table. @@ -761,9 +750,9 @@ in its default format, i.e., without any option. To boost performance, in this and the next case VACUUM does not read and act on all pages of the heap. - The Visibility Map, which is very compact and therefore has a small - size, contains information about pages, where bloat-candidates might - be found. Only such pages are processed. + The Visibility Map, which is very compact and therefore fast to read, + contains information about which pages have no deleted row versions, and + can be skipped by vacuum. @@ -771,7 +760,7 @@ When a client issues the SQL command VACUUM with the option FREEZE. (In this case, - it undertakes much more actions, see + it undertakes many more actions, see Freeze Row Versions.) @@ -780,12 +769,11 @@ When a client issues the SQL command VACUUM with the option FULL. - Also, in this mode, the bloat disappears, but the strategy used - is very different: in this case, the complete table is copied - to a different file skipping all outdated row versions. This - leads to a significant reduction of used disk space because - the new file contains only the actual data. The old file - is deleted. + In this mode, an exclusive lock is taken, and + the whole table is copied to a different file, skipping all outdated row + versions. All bloat is thereby eliminated, which + may lead to a significant reduction of used disk space. + The old file is deleted. @@ -807,17 +795,17 @@ This logic only applies to row versions of the heap. Index entries don't use xmin/xmax. Nevertheless, such index - entries, which would lead to outdated row versions, are released + entries, which would lead to outdated row versions, are cleaned up accordingly. The above descriptions omit the fact that xids on a real computer - have a limited size. They count up in the same way as sequences, and after - a certain number of new transactions they are forced to restart + have a limited size, and after + a certain number of transactions they are forced to restart from the beginning, which is called wraparound. Therefore the terms 'old transaction' / 'young transaction' does - not always correlate with low / high values of xids. Near to the + not always correlate with low / high values of xids. Near the wraparound point, there are cases where xmin has a higher value than xmax, although their meaning is said to be older than xmax. @@ -856,7 +844,7 @@ and the corresponding transactions of xmin and xmax must be committed. However, PostgreSQL has to consider the - possibility of wraparounds. + possibility of wraparound. Therefore the decision becomes more complex. The general idea of the solution is to use the 'between xmin and xmax' @@ -883,7 +871,7 @@ With each newly created transaction the two split-points - move forward. When 'txid_current + 2^31' would reach a + move forward. If 'txid_current + 2^31' reached a row version with xmin equal to that value, it would immediately jump from 'past' to 'future' and would be no longer visible! @@ -892,11 +880,11 @@ - To avoid this unacceptable extinction of data, the vacuum - operation freeze clears the situation - long before the split-point is reached. It sets a flag - in the header of the row version, which completely eliminates - the future use of xmin/xmax and indicates + If not handled in some way, data inserted many transactions ago would become invisibile. + The vacuum operation freeze avoids this + long before the split-point is reached by setting a flag + in the header of the row version which avoids + future comparison of its xmin/xmax and indicates that the version is valid not only in the 'past'-half but also in the 'future'-half as well as in all coming epochs. @@ -943,19 +931,19 @@ When a client issues the SQL command VACUUM with its FREEZE option. In this case, all pages are processed that are marked in the Visibility Map - to potentially have unfrozen rows. + as potentially having unfrozen rows. When a client issues the SQL command VACUUM without - any options but finds that there are xids older than + any options but there are xids older than (default: 150 million) minus (default: 50 million). As before, all pages are processed that are - marked in the Visibility Map to potentially have unfrozen + marked in the Visibility Map as potentially having unfrozen rows. @@ -981,7 +969,7 @@ The process switches to an aggressive mode if it recognizes - that for the processed table their oldest xid exceeds + that for the processed table the oldest xid exceeds (default: 200 million). The value of the oldest unfrozen xid is stored per table in pg_class.relfrozenxid. @@ -1037,22 +1025,21 @@ The Visibility Map (VM) contains two flags — stored as - two bits — for each page of the heap. If the first bit - is set, that indicates that the associated page does not - contain any bloat. If the second one is set, that indicates - that the page contains only frozen rows. + two bits — for each page of the heap. The first bit + indicates that the associated page does not + contain any bloat. The second bit indicates + that the page contains only frozen row versions. Please consider two details. First, in most cases a page - contains many rows, some of them in many versions. + contains many rows or row-versions. However, the flags are associated with the page, - not with a row or a row version. The flags are set + not with an individual row version. The flags are set only under the condition that they are valid for ALL row versions of the page. Second, since there are only two bits per page, the VM is considerably - smaller than the heap. Therefore it is buffered - in RAM in almost all cases. + smaller than the heap. @@ -1068,7 +1055,7 @@ The Free Space Map (FSM) tracks the amount of free space per page. It is organized as a highly condensed b-tree of (rounded) sizes. - As long as VACUUM or Autovacuum change + Whenever VACUUM or Autovacuum changes the free space on any processed page, they log the new values in the FSM in the same way as all other writing processes. @@ -1077,18 +1064,19 @@ Statistics - Statistic information helps the Query Planner to make optimal decisions for the generation of execution plans. This information can be gathered with the SQL commands ANALYZE or VACUUM ANALYZE. - But also Autovacuum processes gather + But Autovacuum processes also gather such information. Depending on the percentage of changed rows - per table , + , + and minimum number of changed rows , the Autovacuum daemon starts Autovacuum processes to collect - statistics per table. This dynamic invocation of analyze - operations allows PostgreSQL to - adopt queries to changing circumstances. + statistics per table. The automatic analysis + allows PostgreSQL to + adapt query execution to changing circumstances. @@ -1149,7 +1137,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; The atomicity also affects the visibility of changes. No - connection running simultaneously to a data modifying + connection running simultaneously with a data modifying transaction will ever see any change before the transaction successfully executes a COMMIT — even in the lowest @@ -1228,9 +1216,9 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; Transactions ensure that the - consistency - of the complete database always remains valid. Declarative - rules like + database always remains + consistent. + Declarative rules like primary- or foreign keys, checks, @@ -1248,11 +1236,12 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; - Lastly, it is worth to notice that changes done by a - committed transaction will survive all future application, - instance, or hardware failures. The next chapter - explains this - durability. + Lastly, it is worth noticing that changes done by a + committed transaction will survive all failures in the application or + database cluster. + The next chapter explains the + durability + guarantees. @@ -1309,7 +1298,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; which include the new data values and information about commit actions. The WAL records are written first. Second, the data itself shall exist in the heap and index files. - In opposite to the WAL records, this part may or may + In constrast with the WAL records, this part may or may not have been transferred entirely from Shared Memory to the files. @@ -1321,15 +1310,15 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; a consistent state, especially that all WAL records up to this point were successfully stored in heap and index files. Starting here, the recovery process copies the remaining WAL records - to heap and index. As a result, the files contain all - changes and reach a consistent state. Changes of committed - transactions are visible; those of uncommited transactions + to heap and index. As a result, the heap files contain all + changes recorded to the WAL and reach a consistent state. Changes of committed + transactions are visible; those of uncommitted transactions are also in the files, but - as usual - they are never seen - by any of the following transactions because uncommited + by any of the following transactions because uncommitted changes are never shown. Such recovery actions run completely automatically, it is not necessary that a database administrator configure or start anything by - himself. + themself. Disk crash @@ -1341,7 +1330,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; prepare for such a situation. - He obviously needs a backup. How to take such a backup + They obviously needs a backup. How to take such a backup and use it as a starting point for a recovery of the cluster is explained in more detail in the next chapter. @@ -1353,11 +1342,11 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; and there is no room for additional data. In this case, PostgreSQL stops accepting data-modifying commands or even terminates completely. - No data loss or data corruption will occur. + Committed data is neither lost nor corrupted. - To come out of such a situation, the administrator should - remove unused files from this disk. But he should never + To recover from such a situation, the administrator should + remove unused files from this disk. But they should never delete files from the data directory. Nearly all of them are necessary for the consistency @@ -1428,9 +1417,9 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; The tool pg_dump is able to take a copy of the complete cluster or certain parts of it. It stores - the copy in the form of SQL CREATE and - INSERT commands. It runs in - parallel to other processes in its own transaction. + the copy in the form of SQL commands like CREATE and + COPY. It runs in + parallel with other processes in its own transaction. The output of pg_dump may be used as @@ -1457,7 +1446,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; directory structure plus files which contain a consistent copy of the original cluster. pg_basebackup runs in - parallel to other processes in its own transaction. + parallel with other processes in its own transaction. The second step is recommended but not necessary. All -- 2.17.0 --aVD9QWMuhilNxW9f--