Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.80) (envelope-from ) id 1YuuGl-0003P5-UO for pgsql-docs@arkaria.postgresql.org; Wed, 20 May 2015 03:02:08 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.80) (envelope-from ) id 1YuuGl-0004H6-4K for pgsql-docs@arkaria.postgresql.org; Wed, 20 May 2015 03:02:07 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1YuuGj-0004GS-NV for pgsql-docs@postgresql.org; Wed, 20 May 2015 03:02:06 +0000 Received: from lists.commandprompt.com ([69.168.55.30]) by makus.postgresql.org with esmtp (Exim 4.84) (envelope-from ) id 1YuuGe-00086r-Tt for pgsql-docs@postgresql.org; Wed, 20 May 2015 03:02:03 +0000 Received: from [192.168.1.3] (spinlock.commandprompt.com [69.168.55.30]) by lists.commandprompt.com (Postfix) with ESMTPA id 50FCB2A4 for ; Tue, 19 May 2015 20:01:59 -0700 (PDT) Received: from [192.168.1.3] ([67.168.169.165] helo=[192.168.1.3]) by assp.commandprompt.com with ESMTPS(AES128-SHA) (2.1.1); 19 May 2015 20:01:58 -0700 Message-ID: <555BF933.3050200@commandprompt.com> Date: Tue, 19 May 2015 20:02:11 -0700 From: "Joshua D. Drake" Organization: Command Prompt, Inc. User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: pgsql-docs Subject: backup.sgml WIP (patch included) Content-Type: multipart/mixed; boundary="------------080400060803000200050004" X-Assp-Version: 2.1.1(11364) on assp.commandprompt.com X-Assp-Client-SSL: yes X-Assp-ID: assp.commandprompt.com m1-90919-10302 X-Assp-Envelope-From: jd@commandprompt.com X-Assp-Intended-For: pgsql-docs@postgresql.org X-Assp-Original-Subject: backup.sgml WIP (patch included) X-Pg-Spam-Score: -1.9 (-) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-docs Precedence: bulk Sender: pgsql-docs-owner@postgresql.org This is a multi-part message in MIME format. --------------080400060803000200050004 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Hey, I originally sent this to -hackers but Bruce kindly pointed me this way. Attached is my rework of the backup.sgml. I have only worked on the SQL DUMP and related sections. I left archiving etc... alone. I also didn't do much with pg_dumpall. I focused on adding up to date information and removing conversational tone. The rest is pretty obvious. I am interested feedback before I continue. Yes, I added -C to the basic pg_dump command. Without it, a newbie will get an error on restore. I also removed all redirection etc... I know it is the unix/linux way but flags are what the normies are going to understand. Thanks! JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you. --------------080400060803000200050004 Content-Type: text/x-patch; name="backup.sgml.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="backup.sgml.patch" diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index def43a2..cdc288b 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -6,19 +6,17 @@ backup - As with everything that contains valuable data, PostgreSQL - databases should be backed up regularly. While the procedure is - essentially simple, it is important to have a clear understanding of + PostgreSQL databases should be backed up regularly. + The procedure is simple but it is important to have a clear understanding of the underlying techniques and assumptions. - There are three fundamentally different approaches to backing up + There are four different approaches to backing up PostgreSQL data: SQL dump - File system level backup - Continuous archiving + PITR, Point in Time Recovery Each has its own strengths and weaknesses; each is discussed in turn in the following sections. @@ -28,141 +26,176 @@ <acronym>SQL</> Dump - The idea behind this dump method is to generate a file with SQL - commands that, when fed back to the server, will recreate the - database in the same state as it was at the time of the dump. - PostgreSQL provides the utility program - for this purpose. The basic usage of this - command is: + PostgreSQL provides the program for + generating a backup file with SQL commands that, when fed back to the server, + will recreate the database in the same state as it was at the time of the dump. + The basic usage of is: -pg_dump dbname > outfile +pg_dump -C -F -f - As you see, pg_dump writes its result to the - standard output. We will see below how this can be useful. - While the above command creates a text file, pg_dump - can create files in other formats that allow for parallism and more - fine-grained control of object restoration. + + The use of ensures that the dump file will + contain the requisite command within the dump file. The use of + -F ensures that you are using the plain + text format and the use of -f allows you + to specify the name of the file the dump will be written to. It is also possible + for pg_dump to create files in other formats that allow for parallelism + and fine-grained control of object backup or restoration. For more details on all options + available to pg_dump please refer to the reference page. - pg_dump is a regular PostgreSQL - client application (albeit a particularly clever one). This means - that you can perform this backup procedure from any remote host that has - access to the database. But remember that pg_dump - does not operate with special permissions. In particular, it must - have read access to all tables that you want to back up, so in order - to back up the entire database you almost always have to run it as a - database superuser. (If you do not have sufficient privileges to back up - the entire database, you can still back up portions of the database to which - you do have access using options such as - - or .) + The pg_dump application requires read access to all objects within the + database that it will be operating with. This generally requires database + super-user access. It is possible for any database user to use + pg_dump to backup the objects that they own regardless of super-user access. This + can be achieved using options such as -n + or -t . + - To specify which database server pg_dump should - contact, use the command line options + + + The text files created by pg_dump are internally consistent, + meaning, the dump represents a snapshot of the database at the time + pg_dump began running. pg_dump does not + block other operations on the database while it is working. + (Exceptions are those operations that need to operate with an + exclusive lock, such as most forms of ALTER TABLE.) + Like any other PostgreSQL client application, pg_dump will by default connect with the database user name that is equal to the current operating system user name. To override - this, either specify the option or set the - environment variable PGUSER. Remember that - pg_dump connections are subject to the normal - client authentication mechanisms (which are described in ). + this, either specify the -U option or set the + environment variable PGUSER. + + + Advanced pg_dump + - An important advantage of pg_dump over the other backup - methods described later is that pg_dump's output can - generally be re-loaded into newer versions of PostgreSQL, - whereas file-level backups and continuous archiving are both extremely - server-version-specific. pg_dump is also the only method - that will work when transferring a database to a different machine - architecture, such as going from a 32-bit to a 64-bit server. + The pg_dump application provides other formats. The most notable + are the use of -F or + -F . The use of the + custom format (-F ) + is an excellent option for smaller databases when you need fine grained control of the objects + you chose to restore. The use of the directory format (-F ) allows for parallel connection based backups. + If you are performing a backup with many objects and using pg_dump then the + directory format will be the most efficient. This option also allows for fine + grained control of the objects you chose to restore. - Dumps created by pg_dump are internally consistent, - meaning, the dump represents a snapshot of the database at the time - pg_dump began running. pg_dump does not - block other operations on the database while it is working. - (Exceptions are those operations that need to operate with an - exclusive lock, such as most forms of ALTER TABLE.) + If PostgreSQL was built on a system with the + zlib compression library installed, the custom dump + format will compress data as it writes it to the output file. This will + produce dump file sizes similar to using gzip, but it + has the added advantage that tables can be restored selectively. + + + Backup a single table + + + pg_dump -U -h -F -f -t + + + Using wildcards with table list + + + pg_dump -U -h -F -f -t + + + + + Using parallelism and a wildcard table list + + + pg_dump -U -h -F -f -t -j + + + + + + + The use of the custom or directory pg_dump formats + requires the use of and will not work with . There is + more information on using pg_restore in section . + + + + Restoring the Dump - Text files created by pg_dump are intended to - be read in by the psql program. The - general command form to restore a dump is + The psql application is the default client that ships with + PostgreSQL. It is also the default application used when restoring + text based dumps created by the pg_dump application. For + details information on psql please see . + For the purposes of restoring a dump the basic usage is: + + -psql dbname < infile +psql -f -d - where infile is the - file output by the pg_dump command. The database dbname will not be created by this - command, so you must create it yourself from template0 - before executing psql (e.g., with - createdb -T template0 dbname). psql - supports options similar to pg_dump for specifying - the database server to connect to and the user name to use. See - the reference page for more information. - Non-text file dumps are restored using the utility. - - - Before restoring an SQL dump, all the users who own objects or were - granted permissions on objects in the dumped database must already - exist. If they do not, the restore will fail to recreate the - objects with the original ownership and/or permissions. - (Sometimes this is what you want, but usually it is not.) - - - - By default, the psql script will continue to - execute after an SQL error is encountered. You might wish to run - psql with - the ON_ERROR_STOP variable set to alter that - behavior and have psql exit with an - exit status of 3 if an SQL error occurs: - -psql --set ON_ERROR_STOP=on dbname < infile - - Either way, you will only have a partially restored database. - Alternatively, you can specify that the whole dump should be - restored as a single transaction, so the restore is either fully - completed or fully rolled back. This mode can be specified by - passing the - - - The ability of pg_dump and psql to - write to or read from pipes makes it possible to dump a database - directly from one server to another, for example: - -pg_dump -h host1 dbname | psql -h host2 dbname - + + + If you omitted -C when + executing pg_dump the CREATE DATABASE + command will not be in the text file. You will need to create the database + yourself from template0 before the executing the restore + (e.g., with + createdb -T ). + + + + + + pg_dump does not backup users, roles and + other global objects. To properly backup global objects you must use + with the -g + parameter. If you do not restore the globals before the + text based dump, the database will implicitly restore all objects as the + owner passed by -U + . If -U + is not passed then the operating system user executing psql + will be used. + @@ -177,7 +210,7 @@ pg_dump -h host1 dbname | psql -h h - After restoring a backup, it is wise to run on each database so the query optimizer has useful statistics; see @@ -187,7 +220,60 @@ pg_dump -h host1 dbname | psql -h h linkend="populate">. - + + + Advanced restore + + Using pipes to restore to new server + + + pg_dump -h -d | psql -h -d + + + + + + + If one is using the custom, directory or tar formats the restore command is . + The pg_restore program has many benefits over the use psql including fine grained object restore and parallelism. + + Extracting a text dump from a custom format backup + The following will extract the backup to the standard output. The use + of -F is optional as + pg_restore should be able to detect the format. + + + pg_restore -F + + + + Restoring a single table + + pg_restore -U -h -d -t + + + + Using parallelism to restore databases + The use of parallelism will normally allow databases to restore much + faster than a single connection based restore. The restore will only execute + as quickly as it can restore your largest table but for databases with many + objects it is the fastest pg_dump based restore. + + + pg_restore -U -h -d -t -j + + + + Using <application>pg_dumpall</> @@ -207,13 +293,9 @@ pg_dumpall > outfile psql -f infile postgres - (Actually, you can specify any existing database name to start from, - but if you are loading into an empty cluster then postgres - should usually be used.) It is always necessary to have - database superuser access when restoring a pg_dumpall - dump, as that is required to restore the role and tablespace information. - If you use tablespaces, make sure that the tablespace paths in the - dump are appropriate for the new installation. + It is necessary to have database superuser access when using a + pg_dumpall dump. The superuser acess is required + to restore the role and tablespace information. @@ -230,231 +312,47 @@ psql -f infile postgres This is necessary to fully backup the cluster if running the pg_dump command on individual databases. - + + + If you use tablespaces, make sure that the tablespace paths in the + dump are appropriate for the new installation. + + + Handling Large Databases - - - Some operating systems have maximum file size limits that cause - problems when creating large pg_dump output files. - Fortunately, pg_dump can write to the standard - output, so you can use standard Unix tools to work around this - potential problem. There are several possible methods: - - - - Use compressed dumps. - - You can use your favorite compression program, for example - gzip: - - -pg_dump dbname | gzip > filename.gz - - - Reload with: - - -gunzip -c filename.gz | psql dbname - - - or: - - -cat filename.gz | gunzip | psql dbname - - - - - - Use <command>split</>. - - The split command - allows you to split the output into smaller files that are - acceptable in size to the underlying file system. For example, to - make chunks of 1 megabyte: - - -pg_dump dbname | split -b 1m - filename - - - Reload with: - - -cat filename* | psql dbname - - - - - - Use <application>pg_dump</>'s custom dump format. - - If PostgreSQL was built on a system with the - zlib compression library installed, the custom dump - format will compress data as it writes it to the output file. This will - produce dump file sizes similar to using gzip, but it - has the added advantage that tables can be restored selectively. The - following command dumps a database using the custom dump format: - - -pg_dump -Fc dbname > filename - - - A custom-format dump is not a script for psql, but - instead must be restored with pg_restore, for example: - - -pg_restore -d dbname filename - - - See the and reference pages for details. - - - - For very large databases, you might need to combine split - with one of the other two approaches. + The act of backing up a normal sized database is relatively simple. + The act of backing up a large database (>500GB) can be challenging. + Fortunately, PostgreSQL is very flexible + in its ability to provide a reliable backup. Here is a list of things + you might want to consider when backing up a large database. - - - Use <application>pg_dump</>'s parallel dump feature. - - To speed up the dump of a large database, you can use - pg_dump's parallel mode. This will dump - multiple tables at the same time. You can control the degree of - parallelism with the -j parameter. Parallel dumps - are only supported for the "directory" archive format. - - -pg_dump -j num -F d -f out.dir dbname - - - You can use pg_restore -j to restore a dump in parallel. - This will work for any archive of either the "custom" or the "directory" - archive mode, whether or not it has been created with pg_dump -j. - - - - - - - File System Level Backup - - - An alternative backup strategy is to directly copy the files that - PostgreSQL uses to store the data in the database; - explains where these files - are located. You can use whatever method you prefer - for doing file system backups; for example: - - -tar -cf backup.tar /usr/local/pgsql/data - - - - - There are two restrictions, however, which make this method - impractical, or at least inferior to the pg_dump - method: - - - - The database server must be shut down in order to - get a usable backup. Half-way measures such as disallowing all - connections will not work - (in part because tar and similar tools do not take - an atomic snapshot of the state of the file system, - but also because of internal buffering within the server). - Information about stopping the server can be found in - . Needless to say, you - also need to shut down the server before restoring the data. - - - - - - If you have dug into the details of the file system layout of the - database, you might be tempted to try to back up or restore only certain - individual tables or databases from their respective files or - directories. This will not work because the - information contained in these files is not usable without - the commit log files, - pg_clog/*, which contain the commit status of - all transactions. A table file is only usable with this - information. Of course it is also impossible to restore only a - table and the associated pg_clog data - because that would render all other tables in the database - cluster useless. So file system backups only work for complete - backup and restoration of an entire database cluster. - - + Use the directory + format and the -j + option. This will ensure the quickest and most flexible pg_dump style backup. + + Use continuous archiving as described in + . You can then backup the replica without putting + load on the master. + + Use pg_basebackupas described in . - - - - An alternative file-system backup approach is to make a - consistent snapshot of the data directory, if the - file system supports that functionality (and you are willing to - trust that it is implemented correctly). The typical procedure is - to make a frozen snapshot of the volume containing the - database, then copy the whole data directory (not just parts, see - above) from the snapshot to a backup device, then release the frozen - snapshot. This will work even while the database server is running. - However, a backup created in this way saves - the database files in a state as if the database server was not - properly shut down; therefore, when you start the database server - on the backed-up data, it will think the previous server instance - crashed and will replay the WAL log. This is not a problem; just - be aware of it (and be sure to include the WAL files in your backup). - You can perform a CHECKPOINT before taking the - snapshot to reduce recovery time. - - - - If your database is spread across multiple file systems, there might not - be any way to obtain exactly-simultaneous frozen snapshots of all - the volumes. For example, if your data files and WAL log are on different - disks, or if tablespaces are on different file systems, it might - not be possible to use snapshot backup because the snapshots - must be simultaneous. - Read your file system documentation very carefully before trusting - the consistent-snapshot technique in such situations. - - - - If simultaneous snapshots are not possible, one option is to shut down - the database server long enough to establish all the frozen snapshots. - Another option is to perform a continuous archiving base backup () because such backups are immune to file - system changes during the backup. This requires enabling continuous - archiving just during the backup process; restore is done using - continuous archive recovery (). - - - - Another option is to use rsync to perform a file - system backup. This is done by first running rsync - while the database server is running, then shutting down the database - server long enough to do an rsync --checksum. - ( - - - Note that a file system backup will typically be larger - than an SQL dump. (pg_dump does not need to dump - the contents of indexes for example, just the commands to recreate - them.) However, taking a file system backup might be faster. - + + + The pg_dump methods utilize at least one connection if not many connections + (via -j). They also utilize long running + transactions. This can cause problems with maintenance. If you find that your database + contains a lot of growing bloat consider using a backup method on the master that does + not require pg_dump. + + + Continuous Archiving and Point-in-Time Recovery (PITR) --------------080400060803000200050004 Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0 -- Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-docs --------------080400060803000200050004--