Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jTnU2-0000PF-C2 for pgsql-docs@arkaria.postgresql.org; Wed, 29 Apr 2020 14:14:43 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1jTnTT-0007nF-VA for pgsql-docs@arkaria.postgresql.org; Wed, 29 Apr 2020 14:14:07 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jTnTT-0007n8-1X for pgsql-docs@lists.postgresql.org; Wed, 29 Apr 2020 14:14:07 +0000 Received: from mout.kundenserver.de ([212.227.126.134]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1jTnTF-0001gq-AH for pgsql-docs@lists.postgresql.org; Wed, 29 Apr 2020 14:14:05 +0000 Received: from [192.168.178.43] ([77.189.248.134]) by mrelayeu.kundenserver.de (mreue009 [212.227.15.129]) with ESMTPSA (Nemesis) id 1MmU9R-1ilQ6s2U0v-00iVKb for ; Wed, 29 Apr 2020 16:13:47 +0200 Subject: Re: Additional Chapter for Tutorial To: pgsql-docs@lists.postgresql.org References: <13c65997-9502-7671-1a7b-50e5d5093514@purtz.de> <40aca6b45b0da614b94c8ef86d89b70b@xs4all.nl> <4b8bdb3b-10a7-e541-18ff-6a556ce89cd1@purtz.de> From: =?UTF-8?Q?J=c3=bcrgen_Purtz?= Message-ID: <9df5b9c8-71fe-47a8-4670-346a0ae7e2cd@purtz.de> Date: Wed, 29 Apr 2020 16:13:46 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: <4b8bdb3b-10a7-e541-18ff-6a556ce89cd1@purtz.de> Content-Type: multipart/mixed; boundary="------------7C8283DFD83D0B16D0FFF65A" Content-Language: en-US X-Provags-ID: V03:K1:1d7giZKc5Y871tLcZTBb8c44z5AO1/tCtY4Chtt4swydz05r/03 wxxgbS/cp7PtH+3n5FMdpnfSpT8adylaJJmPmwm0v2hJmXd4uWfHe//2xQBAuzO3OMJqh2a 55rHvNAhEmZo//8il0qA/I1Nl/NfE2+7ePG5+eWhi0yJPuZ9PH58Cy/w48WdsdSEFEmtTp0 Rnl05vvEXM36wK329Zrlg== X-Spam-Flag: NO X-UI-Out-Filterresults: notjunk:1;V03:K0:tNyKYKxubcY=:0I7ydpnMc8FHbtfwZ9Dq9b Vjg+obNS6h16VtXq2ExsAaTcEh0zTL64baybDSfFKA71eQJgIrQVghOgyeG0wfvJ56mm1lDTb /SQ13VvBa3m3YY0izBmL+N97iFgQr3JqtER7jLm7uwh10JQizXecXU6YjDqWjxru0q5GnfAH3 ecokI2v3HjVKLjIS6uljZW9s333m/M2AQDg1hWqT1asp/OKNFh8ScDYMGIwhkUzok6vDmZky1 X55nWhc8SdjgTuLPxob5NWr+h+3MOpuLUrOkwxPHja9HOei0lJ1QiZEAv6dkcJMZIpTK90gOh AfBChso/qrrPVatkoXaooWIIVfvdSnZa7NrJdUMIP/Jy85ZzPqY+ApVOMlipRCtwnXDsd8XIv oT/LfpsCumFaI2zhzi93KyS8fNbafZYiYcx0RBoqesPwDFL7mPnbNeN+Eu9fkBzfa3zOMJygC XQ4Wsbb6aaHZLL3i4JNINaP33it1BV/aqktUteHyr0Vtsf6NlZhBwG7dI3mXM+4DVnx/NuPtr WIrYOgAkSVsyI/lFNTHaa8I5KF0ODxggJcSmsK9c2PQ01YQHV74KWPgbet4gohtUB6uUKCl3v DTdWvElsnadgi34YHNxQjhYwJlZV5X1nEs3shrzqgx02vXdPAgLa6Y9pEBFmZ4AMHknU/nLKl zmCEliDmsYEokdUNcldpRVEEzi6yhk/vZ1ioyUOWptBKdGCR7CX7gjhYC3YIINZF9T4ja5l9G YZo2jSB6FsE2DWODs6Zz0j2/cFLqiYULcKWiOzNAGXKQfgWJtQd7xmEsazjl/KbzYGCHs4apx +FnN2NS17me/BTpGi0cVeymioR3ntoXNNETIeotUejIWEHZ8RB5u50qQz5kyb31qlWGcduJ List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk This is a multi-part message in MIME format. --------------7C8283DFD83D0B16D0FFF65A Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit On 20.04.20 10:30, Jürgen Purtz wrote: > On 17.04.20 20:40, Erik Rijkers wrote: >> Very good stuff, and useful. I think. >> >> I mean that but nevertheless here is a lot of comment :) >> >> (I didn't fully compile as docs, just read the 'text' from the patch >> file) > > Thanks. Added nearly all of the suggestions. > > What is new? Added two sub-chapters 'mvcc' and 'vacuum' plus graphics. Made some modifications in previous sub-chapters and in existing titles. Added some glossary entries. -- Jürgen Purtz --------------7C8283DFD83D0B16D0FFF65A Content-Type: text/x-patch; charset=UTF-8; name="0003-architecture.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0003-architecture.patch" diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml index f6c4627c3e..be04972bd7 100644 --- a/doc/src/sgml/advanced.sgml +++ b/doc/src/sgml/advanced.sgml @@ -1,7 +1,7 @@ - Advanced Features + Advanced SQL Features Introduction diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml new file mode 100644 index 0000000000..014a1172b6 --- /dev/null +++ b/doc/src/sgml/architecture.sgml @@ -0,0 +1,1159 @@ + + + + Architecture of <productname>PostgreSQL</productname> + + + Every DBMS implements basic strategies to achieve a fast and + robust system. This chapter provides an overview of what + techniques PostgreSQL uses to + reach this aim. + + + + Collaboration of Processes, RAM, and Files + + As is a matter of course, in a client/server architecture + clients do not have direct access to the database. Instead, + they merely send requests to the server side and receives + according information from there. In the case of + PostgreSQL, at the server + side there is one process per client, the so-called + Backend process. + It acts in close cooperation with the + Instance which + is a group of tightly coupled other server side processes plus a + Shared Memory + area. + + + + At start time, an instance is initiated by the + Postgres + (or: Postmaster) process. + It loads the configuration files, allocates the + Shared Memory + and starts the comprehensive network of processes: + Background Writer, + Checkpointer, + WAL Writer, + WAL Archiver, + Autovacuum processes, + Statistics Collector, + Logger, and more. + visualizes + main aspects of their collaboration. + + +
+ Architecture + + + + + + + + + + +
+ + + Whenever a client application tries to connect to a + database, + this request is handled in a first step by the + Postgres process. It checks the authorization, + starts a new Backend process, + and instructs the client application to connect to it. All + further client requests go to this process and are handled + by it. + + + + Client requests (SELECT, UPDATE, ...) usually leads to the + necessity to read or write some data. In a first attempt + the client's Backend process tries + to get the information out of Shared + Memory. This Shared + Memory is a mirror of parts of the + heap and + index files. + Because files are much larger than memory, it's likely that + the desired information is not (completely) available + in the RAM. In this case the Backend process + must transfer additional file pages to + Shared Memory. Files are physically + organized in pages. Every transfer between files and + RAM is performed in units of complete pages, retaining + their size and layout. + + + + Reading file pages is notedly slower than reading + RAM. This is the main motivation for the existence of + Shared Memory. As soon as one + of the Backend processes has done + the job those pages are available for all other + Backend processes for direct + access in RAM. + + + + Shared Memory is limited in size. + Sooner or later it becomes necessary to overwrite old RAM + pages. As long as the content of such pages hasn't + changed this is not a problem. But in + Shared Memory also write + actions take place + - performed by any of the Backend + processes (or an + autovacuum process, + or other processes). Such modified pages are called + dirty pages. + Before dirty pages can be overwritten, + they must be transferred back to disk. This is a two-step process. + + + + First, whenever the content of a page changes, a + WAL record + is created out + of the delta-information (difference between old and + new content) and stored in another area of the + Shared Memory. These + WAL records are read by the + WAL Writer process, + which runs in parallel to the Backend + processes and all other processes of + the Instance. It writes + the continuously arising WAL records to + the end of the current + WAL file. + Because of the sequential nature of this writing, it is much + faster than the more or less random access + to data files with heap + and index information. + As mentioned, this WAL-writing happens + in an independent process. Nevertheless all + WAL records created out of one + dirty page must be transferred + to disk before the dirty page + itself can be transferred to disk. + + + + Second, the transfer of dirty buffers + from Shared Memory to files must + take place. This is the primary duty of the + Background Writer process. Because + huge I/O activities can block other processes significantly, + it starts periodically and acts only for a short period. + Doing so, his expensive I/O activities are spread over + time avoiding huge I/O peaks. Also the + Checkpointer process transfers + dirty buffers to files, see next + paragraph. + + + + The Checkpointer has a special + duty. As its name suggests, he has to create + Checkpoints. Such a + Checkpoint + is a point in time when all older dirty buffers, + all older WAL records, and + lastly a special Checkpoint record + have been written and flushed to disk. + In consequence, after a Checkpoint + data files and WAL files are in sync. + In case of a recovery (after a crash of the instance) + it is known that the information of all + WAL records preceding + the last Checkpoint record + is already integrated into the data files. This + speeds up a possibly occurring recovery. + + + + In correlation with data changes, + WAL records arise and are written + to WAL files. + Those WAL files - in combination with + a previously taken Base Backup - + are necessary to restore a database after a crash of the + disk, where data files have been stored. Therefore it is + recommended to transfer a copy of the + WAL files + to a second, independent place. The purpose of the + WAL Archiver process is to perform + this copy action. + + + + The Stats Collector + collects counters about accesses to SQL + objects like tables, rows, indexes, pages, + and more. It stores the obtained information in system + tables. + + + + The Logger writes + text lines about serious and non-serious events which can happen + during database access, e.g.: wrong password, no permission, + long-running queries, ... . + + +
+ + + The logical Perspective: Cluster, Database, Schema + + + On a Server + exists one or more Cluster, + each of them contains three or more + databases, each + database contains many schema, + a schema contains tables, + views, and a lot of other objects. + Each table or view + belongs to a certain schema, they cannot + belong to another schema. The same is + true for the schema/database and database/cluster relation. + visualizes + this hierarchy. + + +
+ Cluster, Database, Schema + + + + + + + + + + +
+ + + A Cluster is the outer frame for a + collection of databases. Clusters are created by the command + . + + + + template0 is the very first + database of any + cluster. C-routines create + template0 during the initialization phase of + the cluster. + In a second step template1 is generated + as a copy of template0 and finally + postgres as a copy of + template1. All other + new databases + of this cluster, + such as my_db, are also copied from + template1. Due to the special + role of template0 as the origin + of all other databases, no client + can connect to it. + + + + Every database contains + schemas, and + schemas contain the other + SQL Objects. + Schemas are namespaces for + their SQL objects and ensure - with one + exception - that within their scope names are used only once across all + types of SQL objects. E.g., it is not possible + to have a table employee and a view + employee within the same + schema. But it is possible to have + two tables employee in different + schemas. In this case the two tables + are different objects and absolutely independent from each + other. The only exception to this cross-type uniqueness is that + unique constraints + and the according unique index + use the same name. + + + + Some schemas are predefined. + public acts as the default + schema and contains all such + SQL objects, which are created + within public or without using any schema + name. public shall not contain user defined + SQL objects. Instead, it is recommended to + create a separate schema which + holds individual objects like application-specific tables or + views. pg_catalog is a schema for all tables + and views of the + System Catalog. + information_schema is a schema for several + tables and views of the System Catalog + in a way which conforms to the SQL standard. + + + + There are a lot of different SQL object + types: database, schema, table, view, materialized + view, index, constraint, sequence, function, procedure, + trigger, role, data type, operator, tablespace, extension, + foreign data wrapper, and more. A few of them, the + Global SQL Objects, + are outside of the strict hierarchy: + All database names, all tablespace names, and all role names + are automatically known and available throughout the + complete cluster, independent from + the database or schema in which they where originally + defined. + shows the relation between the object types. + + +
+ Hierarchy of Internal Objects + + + + + + + + + + +
+ +
+ + + The physical Perspective: Directories and Files + + + PostgreSQL organizes long lasting + data as well as volatile state information about transactions + or replication actions in the file system. Every + Cluster has its own root directory + anywhere in the file system. In many cases, the environment + variable PGDATA points to this directory. + The example of the following survey, which is shown in + , uses + data as the name of this root directory. + + +
+ Directory Structure + + + + + + + + + + +
+ + + data contains many subdirectories and + some files, all of which are necessary to store long lasting + as well as temporary data. The following paragraphs + describe the files and subdirectories in + data. + + + + base is a subdirectory in which one + subdirectory per database exists. + The names of those subdirectories consist of numbers. + These are the internal + Object Identifiers (OID), which are + numbers to identify the database definition in the + System Catalog. + + + + Within the database-specific + subdirectories there are many files: one or more for + every table and every index to store heap + and index data. Those files are + accompanied by files for the + Free Space Maps + (extension _fsm) and + Visibility Maps + (extension _vm), which contain optimization information. + + + + Another important subdirectory is global. + In analogy to the database-specific + subdirectories, there are files containing information about + Global SQL objects. + One type of such Global Objects are + tablespaces. In + global there is information about + the tablespaces, not the + tablespaces itself. + + + + The subdirectory pg_wal contains the + WAL files. + They arise and grow parallel to data changes in the + cluster and remain alive as long as + they are required for recovery, archiving, or replication. + + + + The subdirectory pg_xact contains + information about the status of each transaction + (IN_PROGRESS, COMMITTED, ABORTED, or SUB_COMMITTED). + + + + In pg_tblspc there are symbolic links + that point to directories containing such + SQL objects that are created within + tablespaces. + + + + In the root directory data + there are also some files. + In many cases, the configuration + files of this cluster + are stored here. As long as the + cluster is up and running, the file + postmaster.pid exists here + and contains the ID (pid) of the + Postgres (respectively + Postmaster) process. + + + + For more details about the physical implementation + of database objects, see . + + +
+ + + MVCC + + + In most cases, PostgreSQL based applications + support many clients at the same time. Therefore, it is necessary to + protect simultaneously running requests from unwanted overwriting + of other's data as well as from reading inconsistent data. Imagine an + online shop offering the last copy of an article. Two clients show the + article at their user interface. After a while, but at the same time, + both users decide to put it to their shopping cart or even to buy it. + The dabase must take a sensible decision to prevent the application + from promising delivery of the single article to both clients. + + + + A first approach may be the locking of critical rows. There are + two main categories of such techniques: Optimistic Concurrency + Control (OCC) and Two Phase Locking (2PL). + PostgreSQL implements the more sophisticated + technique Multiversion Concurrency Control (MVCC). + The crucial advantage of MVCC over other technologies gets evident + in multiuser OLTP environments with a huge number of concurrent write + actions. There, MVCC generally performs better than solutions using locks. + In a PostgreSQL database reading never blocks + writing and writing never blocks reading, even in the strictest level of + transaction isolation. + + + + Instead of locking rows, the MVCC technique creates + a new version of the same row when any data-change takes place. In order + to distinct between these versions as well as to track the timeline + of the row, each of the versions contain, in additon to their user + defined columns, two special system columns, which are not visible + for the usual SELECT * FROM ... command. + The column xmin contains the transaction ID (xid) + of the transaction which creates this version of the row. Accordingly, + xmax contains the xid of the transaction which has + deleted this version respectively a zero, if the version is not + deleted. You can read both with the command + SELECT xmin, xmax, * FROM ... . + + + + When we speak about transaction IDs, you need to know that xids are like + sequences. Every new transaction receives the next number as its ID. + Therefore, this flow of xids represents the flow of transaction + start events over time. But keep in mind that xids are independent from + any real time messurement - in milliseconds or whatever. If you dive + deeper into PostgreSQL, you will recognize + parameters with names such as 'xxx_age'. Despite of their names, + these '_age' parameters do not specify a period of time but represent + a certain number of transactions, e.g. 100 millions. + + + + Please note that the description in this chapter simplifies the facts + by omitting details. When many transactions are running simultaneously, + things can get very complicated. Sometimes they get aborted via + ROLLBACK immediately or after a lot of other activities, sometimes + a single row is involved in more than one transaction, sometimes + a client crashes, sometimes the sequence of xids restarts + from zero, ... . Therefore, every version of a row contains more + system columns and flags, not only xmin and xmax. + + + + So, what's going on in detail when write accesses take place? + shows details concerning + xmin, xmax, and the user data. + + +
+ Multiversion Concurrency Control + + + + + + + + + +
+ + + An INSERT command creates the first + version of a row. Beside its user data 'x', + this version contains the ID of the creating transaction 123 + in xmin and 0 in xmax. xmin indicates that the version + exists since transaction 123 and xmax indicates that + it is currently not deleted. + + + + Somewhat later, transaction 135 executes an + UPDATE of this row, intending to change the user data + from 'x' to 'y'. According to + the MVCC principles, the data in the old version of the row does not + change! The value 'x' keeps as it was before. + Only xmax changes to 135. Now this version + is treated as valid exclusively for transactions with xids from + 123 to 134. As a substitute + for the non-ocurring change of data in the old version, the + UPDATE creates a new version of the row with + its xid in xmin, 0 in xmax, and 'y' in the + user data (plus all the other user data from the old version). + This version is now valid for all comming transactions. + + + + All subsequent UPDATE commands behave in the + same way as the first one: they put their + xid to xmax of the current version, create the next version + with their xid in xmin, 0 in xmax, and the new user data. + + + + Finally, a row may be deleted by a DELETE. + Even in this case all versions of the row keep as + before, nothing is thrown away! Only xmax of the last + version changes to the xid of the DELETE + transaction, which indicates that it is only valid for + transactions with xids older than its own (from + 142 to 820 in this + example). + + + + In summary, the MVCC technology creates more and more versions + of the same row in a table's heap file and leaves them there, + even with a DELETE command. The youngest version is relevant + for all future transactions. But the system must also preserve + some of the older ones for a certain amount of time as there + is the possiblility that they are or could become relevant + for any of the pending transactions. Over time, also the + older ones get out of scope for ALL transactions and therefore + becomes unnecessary. Nevertheless, they exist physically on + the disk and occupy space. + + + + Please keep in mind: + + + + + + xmin and xmax indicate the range from where to where + row versions are valid (visible) for transactions. + This range doesn't imply any direct temporal meaning, + the sequence of xids reflects only the sequence of + transaction begin events. As + xids grow, old row versions get out of scope over time. + If an old row version is no longer valid for ALL existing + transactions, it's called dead. The + space occupied by the sum of all dead row versions is + called bloat. + + + + + + Internally, an UPDATE command acts in the + same way as a DELETE command followed by + an INSERT command. + + + + + + Nothing gets wiped away - with the consequence that the database + occupies more and more disc space. It is obvious that + this behaviour has to be automatically corrected in some + way. The next chapter explains how AUTOVACUUM fulfills + this task. + + + + + +
+ + + Vacuum + + + As we have seen in the previous chapter, the database tends to occupy + more and more disc space, the bloat. + This chapter explains how the SQL command + VACUUM and the automatically running + AUTOVACUUM processes clear the situation. + + + + Client processes can issue the SQL command VACUUM at arbitrary + points in time. DBAs do this when they recognize special situations + or they start it in batch jobs which run on a regular basis. + AUTOVACUUM processes run as part of the + Instance at the server. + There is an constantly running AUTOVACUUM daemon. He permantly + controlls the state of all databases based on values which are + collected by the + Statistics Collector and starts + AUTOVACUUM processes whenever he detects certain situations. + Thus, it's a dynamic behavior of PostgreSQL + with the intention to tidy up - not always, but whenever it + is appropriate. + + + + VACUUM as well as AUTOVACUUM don't just eliminate bloat. + They perform additional tasks to minimizing future + I/O activities of themselves as well as of other processes. + This additional work can be done in a very efficient way, + since in most cases the expensive physical access to pages + has taken place anyway for the purpose of eliminating bloat. + The additional operations are: + + + + + + + Freeze: Mark the jungest row version + as frozen. This means that the version + is always treated as valid (visible) independent from + the wraparound problematic (see below). + + + + + + Visibility Map and + Free Space Map: Log information about + the state of the handled pages in two additional files, the + Visibility Map and the Free Space Map. + + + + + + Statistics: Collect statistics about the + number of rows per table, the distribution of values, and so on, + as the basis for query planners decision making. + + + + + + + The eagerness - you can call it 'agressivity' - of the + operations eliminating bloat and + freeze is controlled by configuration + parameters, runtime flags and in extreme situations by + themselves. Because vacuum operations typically are I/O + intensive, which can hinder other activities, AUTOVACUUM + avoids to perform many vacuum operations in a bulk. Instead, + it carries out many short actions with time gaps in between. + The SQL command VACUUM runs immediately without any + time gaps. + + + + Eliminate Bloat + + + To determine which of the row versions are superfluous, the + elimination operation must evaluate xmax against several criterias + which all must apply: + + + + + + xmax must be different from zero because a value of zero + indicates that the row version is still valid. + + + + + + xmax must contain an xid which is older than the oldest xid of all + currently running transactions (min(pg_stat_activity.backend_xmin)). + This criterion guarantees that no existing or upcomming transaction + will have read or write access this row version. + + + + + + The transaction of xmax must be commited. If it was rollback-ed, + this row version is treated as valid. + + + + + + If there is the situation that the row version is part of + multiple transactions, special care and some more actions + must be taken, see: . + + + + + + + After the vacuum operation detects a superflous row version, it + marks its space as free for future use of writing + actions. Only in rare situations (or in the case of VACUUM FULL) + this space is released to the operating system. In most cases + it keeps occupied by PostgreSQL and will be used by future + INSERT or UPDATE commands concerning this or a completely + different row. + + + + Which actions start the elimination of bloat? + + + + + + When a client issues the SQL command VACUUM in its default format, + i.e. without any option. To boost performance, in this and the + next case VACUUM does not read and act on all pages of the heap. + The Visibility Map, which is very compact and therefore has a small + size, contains information about pages, where bloat-candidates might + be found. Only such pages are processed. + + + + + + When a client issues the SQL command VACUUM with the option FREEZE. + (In this case it undertakes much more actions, see + ). + + + + + + When a client issues the SQL command VACUUM with the option FULL. + Also in this mode the bloat disappears, but the used strategy + is very different: In this case the complete table is copied + to a different file skipping all outdated row versions. This + leads to a significant reduction of used disc space because + the new file contains only the actual data and the old file + is deleted. + + + + + + When an AUTOVACUUM process acts. For optimization + purposes he considers the Visiblility Map in the same way as + VACUUM. Additionally, he ignores tables with few modifications, + see + which defaults to 50 rows and + + which defaults to 20%. + + + + + + + + This logic only applies to row versions of the heap. Index entries + don't use xmin/xmax. Nevertheless such index entries, which would + lead to outdated row versions, are released accordingly. + (??? more explanations ???) + + + + The above descriptions omit the fact that xids on a real computer + have a limited size. They count up in the same way as sequences and after + a certain number of new transactions they are forced to restart + from the beginning, which is called wraparound. + Therefore the terms 'old transaction' / 'young transaction' does + not always correlate with low / hight values of xids. Near to the + wraparound point there are cases where xmin has a highter value + than xmax, although their meaning is said to be older than xmax. + + +
+ Cyclic usage of XIDs + + + + + + + + + + +
+ +
+ + + Freeze Row Versions + + + The use of a limited range of IDs for transactions leads + to the necessity to restart the sequence sooner or later. + This does not only have the rare consequence previously + described that sometimes xmin is huger than xmax. The far + more important problem is, that whenever the system has + to evaluate a WHERE condition, it must decide which row + versions are valid (visible) from the perspective of the + transaction of this query. If a wraparound couldn't happen, + this decision would be relative easy: the xid + must be between xmin and xmax, and the corresponding + transactions of xmin and xmax must be commited. However, + PostgeSQL has to consider the possiblity of wraparounds. + Therefore the decision becomes more complex. The general + idea of the solution is, to use the 'between xmin and xmax' + comparision only during the youngest period of the row + versions livetime and afterwards replace it with a + 'valid forever' flag in its header. + + + + + + + In a first step, PostgreSQL divides the complete range of + possible xids into two halfs with the two splitt-points + 'txid_current' and 'txid_current + 2^31'. The half behind + 'txid_current' is considered to represent xids of the + 'past' and the half ahead of 'txid_current' those of the + 'future'. Those of the 'past' are valid (visible) and those + of the 'future' not. + + + + + + Over time the two splitt-points move forward. When + 'txid_current + 2^31' would reach a row version with + xmin equal to that value, it would immediately jump + from 'past' to 'future' and would be no longer visible! + + + + + + To avoid this unacceptable extinction of data the vacuum + operation freeze clears the situation + long before the splitt-point is reached. It sets a flag + in the header of the row version, which eliminates + completely the future use of xmin/xmax and indicates + that the version is valid not only in the 'past'-half + but also in the 'future'-half as well as in all comming + epochs. + + + + + + Which row versions can be frozen by the vacuum operation? + Again, serveral criterias must be checked and all must be met. + + + + + + xmax must be zero because only non-deleted rows can be + visible 'forever'. + + + + + + xmin must be older than all currently existing transactions. + This guarantees that no existing transaction can modify or + delete the version. + + + + + + The transaction of xmin must be commited. + + + + + + + At what point in time the freeze operation will take place? + + + + + When a client issues the SQL command VACUUM with its + FREEZE option. In this case all such pages are + processed that are marked in the Visibility Map + to potentially have unfrozen rows. + + + + + + When a client issues the SQL command VACUUM without any + option, but finds that there are xids older than + + (default: 150 million) minus + + (default: 50 million). + As before, all such pages are processed that are + marked in the Visibility Map to potentially have unfrozen + rows. + + + + + + When an AUTOVACUUM process runs. Such a process acts + in one of two modes: + + + + + + In the normal mode he skips + pages with row versions that are younger than + + (default: 50 million) and works only on pages where + all xids are older. The skipping of jung xids prevents + work on such pages, which are likely to be changed + by one of the future SQL commands. + + + + + + The process switches + to an aggressive mode, if he recognize + that for the processed table their oldest xid exceeds + + (default: 200 million). The value of the oldest unfrozen + xid is stored per table in pg_class.relfrozenxid. + In this aggressive mode mode AUTOVACUUM + processes all such pages of the selected table that are marked + in the Visibility Map to potentially have bloat or unfrozen rows. + + + + + + + + + + In the first two cases and with autovacuum in + aggressive mode, the system knowns + to which value the oldest unfrozen xid has moved forward and + logs the value in pg_class.relfrozenxid. + The distance between this value and the 'txid_current' split + point becomes smaller and the distance to 'txid_current + 2^31' + larger than before. + + +
+ Freeze + + + + + + + + + + +
+ +
+ + + Protection against Wraparound Failure + + + The AUTOVACUUM processes are initiated by the constantly + running autovacuum daemon. If the + daemon detects that for a table + autovacuum_freeze_max_age is exceeded, it starts + an AUTOVACUUM process in the aggressive mode + (see above) - even if AUTOVACUUM is disabled. + + + + + Visibility Map and Free Space Map + + + The (VM) + contains two flags - stored as + two bits - for each page of the heap. If the first bit + is set, it indicates that the associated page does not + contain any bloat. If the second one is set, it indicates + that the page contains only frozen rows. + + + + Please consider two details. First, usually a page + contains many rows, each of them in many versions. + However, the flags are associated with the page, + not with a row or a row version! The flags are set + only under the condition that they are valid for ALL + row versions of the page. Second, since there + are only two bits per page, the VM is considerably + smaller than the heap. Therefore it is buffered + in RAM in almost all cases. + + + + The setting of the flags is silently done by VACUUM + and AUTOVACUUM during their bloat and freeze operations. + This is done to accelerate future vacuum actions, + regular accesses to heap pages, and some accesses to + the index. The flags are cleared with every data-modifying + operation on any row version of the page. + + + + The (FSM) + tracks the amount of free space per page. It is + organized as a highly condensed b-tree of (rounded) sizes. + As long as VACUUM or AUTOVACUUM change the free space + on any processed page, they log the new values in + the FSM in the same way as all other writing + processes. + + + + + + Statistics + + + Statistic information helps the Query Planner to take optimal + decisions for the generation of execution plans. This + information can be gathered with the SQL commands ANALYZE + or VACUUM ANALYZE. But also autovacuum processes gather + such information. Depending on the percentage of changed rows + per table + the autovacuum daemon starts autovacuum processes to collect + statistics per table. This dynamic invocation of analyze + operations allows PostgreSQL to + adopt queries to changing circumstances. + + + + For more details about vacuum operations, especially for its + numerous parameters, see . + + + +
+ + + +
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 68179f71cd..ca29c8bc4e 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -8,9 +8,10 @@ - - - + + + + diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml index 8c6cb6e942..ff2355a10c 100644 --- a/doc/src/sgml/glossary.sgml +++ b/doc/src/sgml/glossary.sgml @@ -141,7 +141,8 @@ Background Writer (process) - A process that continuously writes dirty pages from + A process that continuously writes + dirty buffers from shared memory to the file system. It wakes up periodically, but works only for a short period in order to distribute its expensive I/O @@ -155,6 +156,16 @@ + + Bloat + + + Bloat is disk or RAM space, which does not contain relevant + data, e.g.: free space within blocks or outdated row versions. + + + + Cast @@ -208,15 +219,6 @@ - - Checkpointer (process) - - - A specialized process responsible for executing checkpoints. - - - - Checkpoint @@ -244,6 +246,15 @@ + + Checkpointer (process) + + + A specialized process responsible for executing checkpoints. + + + + Class (archaic) @@ -446,6 +457,26 @@ + + Dirty Buffer + + + File pages are mirrored in RAM (Shared Memory). + Data changes happens in a first step in RAM and are + transfered to disk at a later point in time. During the period + from the buffer change to its tranfer to disk, he is + called a dirty buffer. Sometimes people use the term + dirty page. + + + + + + Dirty Page + + + Durability @@ -459,6 +490,23 @@ + + Epoch + + + In a time-related context: The number of seconds since + 00:00:00 UTC on 1 January 1970, minus leap seconds. + + + In reference to transaction IDs: The number of available + xids is limited + because they use a 32-bit integer. Sooner or later the + sequence must restart from the beginning. Every such roundtrip + is called an epoch. + + + + Extension @@ -564,6 +612,38 @@ + + Fork + + + Each table and index has three so-called forks to + store data, Free Space Map, and Visibility Map in + separate files. The forks are called + 'main', 'fsm', + and 'vm'. An unlogged table has + the additional fork 'init' to + initialize the table in case of a crash. + + + + + + Free Space Map + + + For each table and index exists an assigned file, the + Free Space Map. It tracks the amount of free space + per page. It is organized as a highly condensed b-tree + of (rounded) size. + + + + For more information, see + . + + + + Function @@ -761,25 +841,6 @@ - - Logger (process) - - - If activated, the - Logger process - writes information about database events into the current - log file. - When reaching certain time- or - volume-dependent criteria, a new log file is created. - Also called syslogger. - - - For more information, see - . - - - - Log Record @@ -803,6 +864,25 @@ + + Logger (process) + + + If activated, the + Logger process + writes information about database events into the current + log file. + When reaching certain time- or + volume-dependent criteria, a new log file is created. + Also called syslogger. + + + For more information, see + . + + + + Master (server) @@ -929,8 +1009,8 @@ - - Postmaster (process) + + Postgres (process) The very first process of an instance. @@ -945,6 +1025,16 @@ + + Postmaster (process) + + + An outdated term for the Postgres process. + + + + Primary (server) @@ -1188,11 +1278,13 @@ A schema is a namespace for SQL objects, which all reside in the same - database. Each + database. Each SQL object must reside in exactly one schema. - The names of SQL objects of the same type in the same schema are enforced unique. + The names of SQL objects in the same schema are enforced unique. + This applies across all object types with a single exception: + Unique constraints and the according unique index use the same name. There is no restriction on reusing a name in multiple schemas. @@ -1651,6 +1743,24 @@ + + Visiblility Map + + + For each table exists an assigned file, the Visibliltiy Map, + which contains two bit per file page. If the first bit + is set, it indicates that the associated page does not + contain any bloat. If the second one is set, it indicates + that the page contains only frozen rows. + + + + + + WAL + + + WAL Archiver (process) @@ -1696,11 +1806,6 @@ - - WAL - - - WAL Record @@ -1728,8 +1833,8 @@ A process that writes WAL records - from shared memory to - WAL files. + from shared memory to + WAL files. For more information, see @@ -1771,5 +1876,45 @@ + + + Transaction ID (xid) + + + Like other objects, transactions need to have an unique + ID to identify them. The transaction manager creates + the sequence of xids. Because xids are stored in a 32-bit + integer, the available range is limited. Sooner or later + the sequence of xids must restart from the beginning. + + + There are three special xids: + + + + + 0 (InvalidTransactionId): is used for + xids of relations which are not a table. + + + + + 1 (BootstrapTransactionId): represents + transactions used during the first phase of cluster + initialization, see: + + + + + 2 (FrozenTransactionId): indicates a + frozen transaction. This value is no longer used. Instead, + the system uses a flag in the header of each row version + to mark them as frozen. + + + + + + diff --git a/doc/src/sgml/images/cluster-db-schema-ink-svgo.svg b/doc/src/sgml/images/cluster-db-schema-ink-svgo.svg new file mode 100644 index 0000000000..7e13753d48 --- /dev/null +++ b/doc/src/sgml/images/cluster-db-schema-ink-svgo.svg @@ -0,0 +1,160 @@ + + + Server (Hardware, Container, or VM) + + + + + + + + schema 'public' + + + tables, views, ... + + + + (more system schemas) + + + + + + + schema 'public' + + + tables, views, ... + + + + 'my_schema' (optional) + + + tables, views, ... + + + + (more system schemas) + + + + + UML Note + + + + + + + + + + Server (Hardware, Container, or VM) + + + + cluster 'data' (default, managed by one instance) + + + + cluster 'cluster_2' (optional, managed by a different instance) + + + + + database 'template0' + + + + + + database 'template1' + + + + + + database 'postgres' + + + + + + database 'my_db' (optional) + + + + + + Global SQL objects + + + + + + + + + + 1) + + + By default, you work in the cluster 'data', database 'postgres', + + + schema 'public'. + + + 2) + + + More system schemas: pg_catalog, information_schema, + + + pg_temp, pg_toast. + + + 3) + + + Global SQL objects: Some SQL objects are automatically active + + + and known database- or even cluster-wide. + + + 4) + + + The command 'initdb' creates a new cluster with the three + + + databases 'template0', 'template1', and 'postgres'. The command + + + 'createdb' creates a new database. + + + 5) + + + If multiple clusters are active on one server at the same time, + + + each one is managed by an individual instance. Each such instance + + + uses a different port. + + + 6) + + + No client application is allowed to connect to 'template0'. + + + diff --git a/doc/src/sgml/images/cluster-db-schema-ink.svg b/doc/src/sgml/images/cluster-db-schema-ink.svg new file mode 100644 index 0000000000..58405ae685 --- /dev/null +++ b/doc/src/sgml/images/cluster-db-schema-ink.svg @@ -0,0 +1,482 @@ + + + + + + image/svg+xml + + Server (Hardware, Container, or VM) + + + + + Server (Hardware, Container, or VM) + + + + + + + + + + schema 'public' + tables, views, ... + + (more system schemas) + + + + + + + + schema 'public' + tables, views, ... + + 'my_schema' (optional) + tables, views, ... + + (more system schemas) + + + UML Note + + + + + + + + + + + + + + Server (Hardware, Container, or VM) + + + + cluster 'data' (default, managed by one instance) + + + + cluster 'cluster_2' (optional, managed by a different instance) + + + + + database 'template0' + + + + + database 'template1' + + + + + database 'postgres' + + + + + database 'my_db' (optional) + + + + + Global SQL objects + + + + + + + + + 1) + By default, you work in the cluster 'data', database 'postgres', + schema 'public'. + 2) + More system schemas: pg_catalog, information_schema, + pg_temp, pg_toast. + 3) + Global SQL objects: Some SQL objects are automatically active + and known database- or even cluster-wide. + 4) + The command 'initdb' creates a new cluster with the three + databases 'template0', 'template1', and 'postgres'. The command + 'createdb' creates a new database. + 5) + If multiple clusters are active on one server at the same time, + each one is managed by an individual instance. Each such instance + uses a different port. + 6) + No client application is allowed to connect to 'template0'. + + diff --git a/doc/src/sgml/images/cluster-db-schema-raw.svg b/doc/src/sgml/images/cluster-db-schema-raw.svg new file mode 100644 index 0000000000..af50c07330 --- /dev/null +++ b/doc/src/sgml/images/cluster-db-schema-raw.svg @@ -0,0 +1,173 @@ + + + + Server (Hardware, Container, or VM) + + + + + + + + + + + + + + schema 'public' + tables, views, ... + + + (more system schemas) + + + + + + + + + + schema 'public' + tables, views, ... + + + 'my_schema' (optional) + tables, views, ... + + + (more system schemas) + + + + UML Note + + + + + + + + + + + + + + + + + + Server (Hardware, Container, or VM) + + + + + cluster 'data' (default, managed by one instance) + + + + cluster 'cluster_2' (optional, managed by a different instance) + + + + + + + database 'template0' + + + + + + database 'template1' + + + + + + database 'postgres' + + + + + + database 'my_db' (optional) + + + + + + Global SQL objects + + + + + + + + + + + 1) + By default, you work in the cluster 'data', database 'postgres', + schema 'public'. + + 2) + More system schemas: pg_catalog, information_schema, + pg_temp, pg_toast. + + 3) + Global SQL objects: Some SQL objects are automatically active + and known database- or even cluster-wide. + + 4) + The command 'initdb' creates a new cluster with the three + databases 'template0', 'template1', and 'postgres'. The command + 'createdb' creates a new database. + + 5) + If multiple clusters are active on one server at the same time, + each one is managed by an individual instance. Each such instance + uses a different port. + + 6) + No client application is allowed to connect to 'template0'. + + + + diff --git a/doc/src/sgml/images/directories-ink-svgo.svg b/doc/src/sgml/images/directories-ink-svgo.svg new file mode 100644 index 0000000000..95fa76b9c6 --- /dev/null +++ b/doc/src/sgml/images/directories-ink-svgo.svg @@ -0,0 +1,164 @@ + + + Directory structure of a cluster + + + + + + Directory + + + + + + + File + + + + + + + + + + + Directory Structure + + + + + ... /pg/ + + + An arbitrary directory + + + + + + data/ + + + Root of cluster 'data' (see: PGDATA) + + + + + + base/ + + + Subdirectory containing per-database subdirectories + + + + + + 1/ + + + Subdirectory for data of first database 'template0' + + + + + + 12992/ + + + Subdirectory for data of second database 'template1' + + + + + + 12999/ + + + Subdirectory for data of third database 'postgres' + + + + + + nnnnn/ + + + Optional: more subdirectories for databases, e.g. 'my_db' + + + + + + global/ + + + Subdirectory with information about Global SQL Objects + + + + + + pg_wal/ + + + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + + + + + + pg_xact/ + + + Subdirectory for transaction commit status ('pg_clog' before version 10) + + + + + + pg_tblspc/ + + + Subdirectory containing symbolic links to tablespaces + + + + + + pg_... / + + + Some more subdirectories + + + + + + + 'postmaster.pid' and other files with cluster-wide relevance + + + + + + ... /xyz/ + + + Same or another arbitrary directory + + + + + + cluster_2/ + + + Root of another cluster 'cluster_2' + + + diff --git a/doc/src/sgml/images/directories-ink.svg b/doc/src/sgml/images/directories-ink.svg new file mode 100644 index 0000000000..5cc9d48e20 --- /dev/null +++ b/doc/src/sgml/images/directories-ink.svg @@ -0,0 +1,397 @@ + + + + + + image/svg+xml + + Directory structure of a cluster + + + + + Directory structure of a cluster + + + + + Directory + + + + + + File + + + + + + + + + + + + Directory Structure + + + + ... /pg/ + An arbitrary directory + + + + data/ + Root of cluster 'data' (see: PGDATA) + + + + base/ + Subdirectory containing per-database subdirectories + + + + + 1/ + Subdirectory for data of first database 'template0' + + + + 12992/ + Subdirectory for data of second database 'template1' + + + + 12999/ + Subdirectory for data of third database 'postgres' + + + + nnnnn/ + Optional: more subdirectories for databases, e.g. 'my_db' + + + + global/ + Subdirectory with information about Global SQL Objects + + + + pg_wal/ + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + + + + pg_xact/ + Subdirectory for transaction commit status ('pg_clog' before version 10) + + + + pg_tblspc/ + Subdirectory containing symbolic links to tablespaces + + + + pg_... / + Some more subdirectories + + + + + 'postmaster.pid' and other files with cluster-wide relevance + + + + + ... /xyz/ + Same or another arbitrary directory + + + + cluster_2/ + Root of another cluster 'cluster_2' + + diff --git a/doc/src/sgml/images/directories-raw.svg b/doc/src/sgml/images/directories-raw.svg new file mode 100644 index 0000000000..6d16a03169 --- /dev/null +++ b/doc/src/sgml/images/directories-raw.svg @@ -0,0 +1,144 @@ + + + + Directory structure of a cluster + + + + + + + + Directory + + + + + + + File + + + + + + + + + + + + + + + + Directory Structure + + + + + ... /pg/ + An arbitrary directory + + + + + data/ + Root of cluster 'data' (see: PGDATA) + + + + + base/ + Subdirectory containing per-database subdirectories + + + + + + 1/ + Subdirectory for data of first database 'template0' + + + + 12992/ + Subdirectory for data of second database 'template1' + + + + 12999/ + Subdirectory for data of third database 'postgres' + + + + nnnnn/ + Optional: more subdirectories for databases, e.g. 'my_db' + + + + + global/ + Subdirectory with information about Global SQL Objects + + + + + pg_wal/ + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + + + + + pg_xact/ + Subdirectory for transaction commit status ('pg_clog' before version 10) + + + + + pg_tblspc/ + Subdirectory containing symbolic links to tablespaces + + + + + pg_... / + Some more subdirectories + + + + + + 'postmaster.pid' and other files with cluster-wide relevance + + + + + + ... /xyz/ + Same or another arbitrary directory + + + + + cluster_2/ + Root of another cluster 'cluster_2' + + + diff --git a/doc/src/sgml/images/freeze-ink-svgo.svg b/doc/src/sgml/images/freeze-ink-svgo.svg new file mode 100644 index 0000000000..158a2e731c --- /dev/null +++ b/doc/src/sgml/images/freeze-ink-svgo.svg @@ -0,0 +1,84 @@ + + + Freeze + + + + + + + + + + + + Freeze to keep alive + + + + + | (0) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > (1) (5) | (2) | (3) | (4) + + + + PAST + + + FUTURE + + + + + + + + + + + + + + + + + + + + 0: 0 .. 2 ^ 32 - 1 + + + 1: txid_current + 2 ^ 31 (splitt-point) + + + 2: autovacuum_freeze_max_age (200 mio.) + + + 3: vacuum_freeze_table_age (150 mio.) + + + 4: vacuum_freeze_min_age (50 mio.) + + + 5: txid_current (splitt-point, jungest xid) + + + per table: pg_class.relfrozenxid must be between (1) and (5); + + + normally it is between (3) and (4) + + + + Unfrozen xid + + + + Frozen xid + + + (figure is out of scale) + + + diff --git a/doc/src/sgml/images/freeze-ink.svg b/doc/src/sgml/images/freeze-ink.svg new file mode 100644 index 0000000000..9c16fc7b5b --- /dev/null +++ b/doc/src/sgml/images/freeze-ink.svg @@ -0,0 +1,365 @@ + + + + + + image/svg+xml + + Freeze + + + + + Freeze + + + + + + + + + + + + + + + + Freeze + to keep alive + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + (1) + (5) + | + (2) + | + (3) + | + (4) + + + + PAST + FUTURE + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 0: 0 .. 2 ^ +32 + - 1 + 1: txid_current + 2 ^ 31 (splitt-point) + 2: autovacuum_freeze_max_age (200 mio.) + 3: vacuum_freeze_table_age (150 mio.) + 4: vacuum_freeze_min_age (50 mio.) + 5: txid_current (splitt-point, jungest xid) + per table: pg_class.relfrozenxid + must + be between (1) and (5); + normally it is between (3) and (4) + + Unfrozen xid + + Frozen xid + (figure is out of scale) + + diff --git a/doc/src/sgml/images/freeze-raw.svg b/doc/src/sgml/images/freeze-raw.svg new file mode 100644 index 0000000000..4816569fbb --- /dev/null +++ b/doc/src/sgml/images/freeze-raw.svg @@ -0,0 +1,123 @@ + + + + Freeze + + + + + + + + + + + + + + + + + + + + + + Freeze to keep alive + + + + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + (1) + (5) + + | + (2) + + | + (3) + + | + (4) + + + + + + PAST + FUTURE + + + + + + + + + + + + + + + + + + + + + + 0: 0 .. 2 ^ 32 - 1 + 1: txid_current + 2 ^ 31 (splitt-point) + 2: autovacuum_freeze_max_age (200 mio.) + 3: vacuum_freeze_table_age (150 mio.) + 4: vacuum_freeze_min_age (50 mio.) + 5: txid_current (splitt-point, jungest xid) + per table: pg_class.relfrozenxid + must be between (1) and (5); + normally it is between (3) and (4) + + + Unfrozen xid + + + Frozen xid + + (figure is out of scale) + + + diff --git a/doc/src/sgml/images/internal-objects-hierarchy-ink-svgo.svg b/doc/src/sgml/images/internal-objects-hierarchy-ink-svgo.svg new file mode 100644 index 0000000000..26bce6176d --- /dev/null +++ b/doc/src/sgml/images/internal-objects-hierarchy-ink-svgo.svg @@ -0,0 +1,83 @@ + + + Hierarchy of Internal Objects + + + + + Hierarchy of internal Objects + + + + + + Cluster + + + + + Database Names + + + + + + Tablespace + + + + + + Replication Origins + + + + + + Subscription for + + + Logical Replication + + + + + + Role + + + + + + + Database + + + + + Extension + + + + + + Collation + + + + + + Schema + + + + + Table, View, ... + + + + + + diff --git a/doc/src/sgml/images/internal-objects-hierarchy-ink.svg b/doc/src/sgml/images/internal-objects-hierarchy-ink.svg new file mode 100644 index 0000000000..638ba83c4f --- /dev/null +++ b/doc/src/sgml/images/internal-objects-hierarchy-ink.svg @@ -0,0 +1,255 @@ + + + + + + image/svg+xml + + Hierarchy of Internal Objects + + + + + + Hierarchy of Internal Objects + + + + + + Hierarchy of internal Objects + + + + + Cluster + + + Database Names + + + + Tablespace + + + + Replication Origins + + + + Subscription for + Logical Replication + + + + Role + + + + + Database + + + + Extension + + + + Collation + + + + Schema + + + Table, View, ... + + + + + + diff --git a/doc/src/sgml/images/internal-objects-hierarchy-raw.svg b/doc/src/sgml/images/internal-objects-hierarchy-raw.svg new file mode 100644 index 0000000000..f0dc890f6b --- /dev/null +++ b/doc/src/sgml/images/internal-objects-hierarchy-raw.svg @@ -0,0 +1,95 @@ + + + + Hierarchy of Internal Objects + + + + + + + + Hierarchy of internal Objects + + + + + + + + Cluster + + + + Database Names + + + + + Tablespace + + + + + Replication Origins + + + + + Subscription for + Logical Replication + + + + + Role + + + + + + + Database + + + + + Extension + + + + + Collation + + + + + Schema + + + + Table, View, ... + + + + + + + diff --git a/doc/src/sgml/images/mvcc-ink-svgo.svg b/doc/src/sgml/images/mvcc-ink-svgo.svg new file mode 100644 index 0000000000..6a1aa9ffdf --- /dev/null +++ b/doc/src/sgml/images/mvcc-ink-svgo.svg @@ -0,0 +1,151 @@ + + + MVCC + + + + + + + + + + + + + + + + + T 123 : INSERT + + + + + 123 + + + 0 + + + 'x' + + + + + T 135 : UPDATE + + + + + 135 + + + 0 + + + 'y' + + + + 123 + + + 135 + + + 'x' + + + + + T 142 : UPDATE + + + + + 142 + + + 0 + + + 'z' + + + + 135 + + + 142 + + + 'y' + + + + 123 + + + 135 + + + 'x' + + + + + T 821 : DELTE + + + + + 142 + + + 821 + + + 'z' + + + + 135 + + + 142 + + + 'y' + + + + 123 + + + 135 + + + 'x' + + + + + + Legend + + + + xmin + + + xmax + + + data + + + + diff --git a/doc/src/sgml/images/mvcc-ink.svg b/doc/src/sgml/images/mvcc-ink.svg new file mode 100644 index 0000000000..aeb499f674 --- /dev/null +++ b/doc/src/sgml/images/mvcc-ink.svg @@ -0,0 +1,398 @@ + + + + + + image/svg+xml + + MVCC + + + + + MVCC + + + + + + + + + + + + + + + + + + + + + + + T + 123 +: INSERT + + + + 123 + 0 + 'x' + + + + T + 135 +: UPDATE + + + + 135 + 0 + 'y' + + 123 + 135 + 'x' + + + + T + 142 +: UPDATE + + + + 142 + 0 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + T + 821 +: DELTE + + + + 142 + 821 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + + Legend + + xmin + xmax + data + + + diff --git a/doc/src/sgml/images/mvcc-raw.svg b/doc/src/sgml/images/mvcc-raw.svg new file mode 100644 index 0000000000..c5005db37d --- /dev/null +++ b/doc/src/sgml/images/mvcc-raw.svg @@ -0,0 +1,145 @@ + + + + MVCC + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + T + 123 + : INSERT + + + + + 123 + 0 + 'x' + + + + + T + 135 + : UPDATE + + + + + 135 + 0 + 'y' + + 123 + 135 + 'x' + + + + + T + 142 + : UPDATE + + + + + 142 + 0 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + + T + 821 + : DELTE + + + + + 142 + 821 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + + + Legend + + xmin + xmax + data + + + + + diff --git a/doc/src/sgml/images/ram-proc-file-ink-svgo.svg b/doc/src/sgml/images/ram-proc-file-ink-svgo.svg new file mode 100644 index 0000000000..15353bf307 --- /dev/null +++ b/doc/src/sgml/images/ram-proc-file-ink-svgo.svg @@ -0,0 +1,285 @@ + + + PG Overall Server Architecture + + + + + + UML Note (200 x 20 px) + + + + + + UML Note (250 x 20 px) + + + + + + UML Note (100 x 35 px) + + + + + + UML Note (170 x 50 px) + + + + + + UML State (300x120) + + + + + + UML State (350x120) + + + + + + Disc + + + + + + + + + Laptop + + + + + + + + + + + + + + + + + + + Client + + + Server + + + + + + maintenance_work_mem (per connection) + + + work_mem (per query operation) + + + autovacuum_work_mem (per worker process) + + + temp_buffer (per connection) + + + ... + + + + Individual Memory + + + + + + shared_buffers (heap and index) + + + wal_buffers (WAL records) + + + ... + + + + Shared Memory (per Cluster) + + + + + Postgres (outdated term: Postmaster) + + + + + + 1 + + + + + Backend processes (one per connection) + + + + + + + + 3 + + + + + + + Creates backend processes + + + + + + 2 + + + + + + + + WAL Writer + + + + + + Checkpointer + + + + + + + + Checkpoint + + + Record + + + + + Background Writer + + + + + + WAL Archiver + + + + + + AutoVacuum + + + + + + Log Writer + + + + Stats Collector + + + + + Log + + + text lines, + + + sequential + + + + + + + Heap and + + + Index + + + binary blocks, + + + random + + + + + + + Read heap and index + + + pages and transfer + + + them to shared_buffers + + + + + + WAL + + + binary records, + + + sequential + + + + + + Archived + + + WAL + + + + + + + Via TCP/IP or socket + + + + RAM + + + PROCESSES + + + FILES + + diff --git a/doc/src/sgml/images/ram-proc-file-ink.svg b/doc/src/sgml/images/ram-proc-file-ink.svg new file mode 100644 index 0000000000..6fe6ea7ef1 --- /dev/null +++ b/doc/src/sgml/images/ram-proc-file-ink.svg @@ -0,0 +1,841 @@ + + + + + + image/svg+xml + + PG Overall Server Architecture + + + + + PG Overall Server Architecture + + + + + UML Note (200 x 20 px) + + + + UML Note (250 x 20 px) + + + + UML Note (100 x 35 px) + + + + UML Note (170 x 50 px) + + + + + UML State (300x120) + + + + UML State (350x120) + + + + + Disc + + + + + + + + + + + + Laptop + + + + + + + + + + + + + + + + + + + + + + + Client + Server + + + + + maintenance_work_mem (per connection) + work_mem (per query operation) + autovacuum_work_mem (per worker process) + temp_buffer (per connection) + ... + + Individual Memory + + + + + shared_buffers (heap and index) + wal_buffers (WAL records) + ... + + Shared Memory (per Cluster) + + + + + Postgres (outdated term: Postmaster) + + + + + 1 + + + + + Backend processes (one per connection) + + + + + + + 3 + + + + + + Creates backend processes + + + + 2 + + + + + + + + + + WAL Writer + + + + + + + Checkpointer + + + + + + + Checkpoint + Record + + + + + Background Writer + + + + + + + WAL Archiver + + + + + + + AutoVacuum + + + + + + + Log Writer + + + + + Stats Collector + + + + + + + Log + text lines, + sequential + + + + + Heap and + Index + binary blocks, + random + + + + + Read heap and index + pages and transfer + them to shared_buffers + + + + WAL + binary records, + sequential + + + + Archived + WAL + + + + + + Via TCP/IP or socket + + + + RAM + PROCESSES + FILES + + diff --git a/doc/src/sgml/images/ram-proc-file-raw.svg b/doc/src/sgml/images/ram-proc-file-raw.svg new file mode 100644 index 0000000000..814ff6dff1 --- /dev/null +++ b/doc/src/sgml/images/ram-proc-file-raw.svg @@ -0,0 +1,301 @@ + + + + PG Overall Server Architecture + + + + + + + + UML Note (200 x 20 px) + + + + UML Note (250 x 20 px) + + + + UML Note (100 x 35 px) + + + + UML Note (170 x 50 px) + + + + + + UML State (300x120) + + + + UML State (350x120) + + + + + + Disc + + + + + + + + + Laptop + + + + + + + + + + + + + + + + + + + + + + + + + + + + Client + Server + + + + + + + maintenance_work_mem (per connection) + work_mem (per query operation) + autovacuum_work_mem (per worker process) + temp_buffer (per connection) + ... + + Individual Memory + + + + + + shared_buffers (heap and index) + wal_buffers (WAL records) + ... + + Shared Memory (per Cluster) + + + + + + Postgres (outdated term: Postmaster) + + + + + 1 + + + + + + Backend processes (one per connection) + + + + + + + + 3 + + + + + + + Creates backend processes + + + + 2 + + + + + + + + + + + + WAL Writer + + + + + + + + Checkpointer + + + + + + + Checkpoint + Record + + + + + + Background Writer + + + + + + + + WAL Archiver + + + + + + + + AutoVacuum + + + + + + + + Log Writer + + + + + + Stats Collector + + + + + + + + Log + text lines, + sequential + + + + + + Heap and + Index + binary blocks, + random + + + + + + Read heap and index + pages and transfer + them to shared_buffers + + + + + WAL + binary records, + sequential + + + + + Archived + WAL + + + + + + + Via TCP/IP or socket + + + + + RAM + PROCESSES + FILES + + + diff --git a/doc/src/sgml/images/wraparound-ink-svgo.svg b/doc/src/sgml/images/wraparound-ink-svgo.svg new file mode 100644 index 0000000000..9882d2be23 --- /dev/null +++ b/doc/src/sgml/images/wraparound-ink-svgo.svg @@ -0,0 +1,40 @@ + + + Cyclic usage of XIDs + + + + + + + + + + Cyclic usage of XIDs modulo 2 ^ 32 + + + + + | (0) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > | (1) | (2) | (3) | (4) + + + + + 0: 0 .. 2 ^ 32 - 1 + + + 1: oldest active xid (pg_stat_activity.backend_xmin) + + + 2: xmin of one row version + + + 3: xmax of the same row version + + + 4: jungest xid (txid_current) + + + diff --git a/doc/src/sgml/images/wraparound-ink.svg b/doc/src/sgml/images/wraparound-ink.svg new file mode 100644 index 0000000000..ca5e45ac2b --- /dev/null +++ b/doc/src/sgml/images/wraparound-ink.svg @@ -0,0 +1,198 @@ + + + + + + image/svg+xml + + Cyclic usage of XIDs + + + + + Cyclic usage of XIDs + + + + + + + + + + + + Cyclic usage of XIDs modulo 2 + ^ +32 + + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + | + (1) + | + (2) + | + (3) + | + (4) + + + + 0: 0 .. 2 ^ +32 + - 1 + 1: oldest active + xid (pg_stat_activity.backend_xmin) + 2: xmin of one row version + 3: xmax of the same row version + 4: jungest xid (txid_current) + + diff --git a/doc/src/sgml/images/wraparound-raw.svg b/doc/src/sgml/images/wraparound-raw.svg new file mode 100644 index 0000000000..9406f52970 --- /dev/null +++ b/doc/src/sgml/images/wraparound-raw.svg @@ -0,0 +1,79 @@ + + + + Cyclic usage of XIDs + + + + + + + + + + + + + + + + Cyclic usage of XIDs modulo 2 + ^ 32 + + + + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + | + (1) + | + (2) + | + (3) + | + (4) + + + + + 0: 0 .. 2 ^ 32 - 1 + 1: oldest active xid (pg_stat_activity.backend_xmin) + 2: xmin of one row version + 3: xmax of the same row version + 4: jungest xid (txid_current) + + + diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index c41ce9499b..6254bf9376 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -60,6 +60,7 @@ break is not needed in a wider output rendering. &start; + &architecture; &query; &advanced; diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml index c0889743c4..870349780f 100644 --- a/doc/src/sgml/query.sgml +++ b/doc/src/sgml/query.sgml @@ -1,7 +1,7 @@ - The <acronym>SQL</acronym> Language + Basics of the <acronym>SQL</acronym> Language Introduction diff --git a/doc/src/sgml/start.sgml b/doc/src/sgml/start.sgml index 5b73557835..b9184feb39 100644 --- a/doc/src/sgml/start.sgml +++ b/doc/src/sgml/start.sgml @@ -53,7 +53,7 @@ - Architectural Fundamentals + Client/Server Model Before we proceed, you should understand the basic @@ -65,37 +65,59 @@ In database jargon, PostgreSQL uses a - client/server model. A PostgreSQL + client/server model. A PostgreSQL session consists of the following cooperating processes (programs): + - - - - A server process, which manages the database files, accepts - connections to the database from client applications, and - performs database actions on behalf of the clients. The - database server program is called - postgres. - postgres - - + + + + A process at the server site with the name + + postgres (outdated term: Postmaster) + postgres. + It accepts connection requests from client applications, starts + (forks) a new + Backend process for each of them, and passes + the connection to it. From that point on, the client and the new + Backend process + communicate directly without intervention by the original + postgres process. Thus, the + postgres process is always running, waiting + for new client connections, whereas clients and associated + Backend processes come and go. + (All of this is of course invisible to the user. We only mention it + here for completeness.) + + - - - The user's client (frontend) application that wants to perform - database operations. Client applications can be very diverse - in nature: a client could be a text-oriented tool, a graphical - application, a web server that accesses the database to - display web pages, or a specialized database maintenance tool. - Some client applications are supplied with the - PostgreSQL distribution; most are - developed by users. - - + + + A group of processes at the server site, the instance, to which also the + postgres process belongs. Their duties are + handling of central, common database activities like file access, + vacuum, checkpoints, + replication, and more. The mentioned Backend processes + delegate those actions to the instance. + + - - + + + The user's client (frontend) application that wants to perform + database operations. Client applications can be very diverse + in nature: a client could be a text-oriented tool, a graphical + application, a web server that accesses the database to + display web pages, or a specialized database maintenance tool. + Some client applications are supplied with the + PostgreSQL distribution; most are + developed by users. + + + + As is typical of client/server applications, the client and the @@ -106,18 +128,6 @@ file name) on the database server machine. - - The PostgreSQL server can handle - multiple concurrent connections from clients. To achieve this it - starts (forks) a new process for each connection. - From that point on, the client and the new server process - communicate without intervention by the original - postgres process. Thus, the - master server process is always running, waiting for - client connections, whereas client and associated server processes - come and go. (All of this is of course invisible to the user. We - only mention it here for completeness.) - --------------7C8283DFD83D0B16D0FFF65A--