From: Chris Browne <cbbrowne@acm.org>
Subject: Re: [HACKERS] Replication documentation addition
Date: Mon, 30 Oct 2006 12:23:18 -0500
Organization: cbbrowne Computing Inc
Message-ID: <60k62h4t15.fsf@dba2.int.libertyrms.com>
References: <200610261553.k9QFr9V23851@momjian.us>
To: pgsql-docs@postgresql.org

bruce@momjian.us (Bruce Momjian) writes:
> With no new additions submitted today, I have moved my text into our
> SGML documentation:
>
> 	http://momjian.us/main/writings/pgsql/sgml/failover.html
>
> Please let me know what additional changes are needed.

It's looking a lot improved to me...

There are still numerous places where it needs s/Slony/Slony-I/g
because there is more than one thing out there called "Slony," only
one of which is the single-master-to-multiple-subscribers asynchronous
replication system...

<http://momjian.us/main/writings/pgsql/sgml/query-broadcast-load-balancing.html>

"This can be complex to set up because functions like random() and
CURRENT_TIMESTAMP will have different values on different servers, and
sequences should be consistent across servers."

It doesn't make sense to call this "complex to set up."  The problem
isn't complexity of setup; it is whether updates are processed
identically on different hosts.

Perhaps better:

"Query broadcasting can break down such that servers fall out of sync
if the queries have nondeterministic behavior.  For instance,
functions like random(), CURRENT_TIMESTAMP, and
nextval('some_sequence') will take on different values on different
servers.  Care must be taken at the application level to make sure
that queries are all fully deterministic and that they either COMMIT
or ABORT on all servers."
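
To make the failure mode concrete, here is a minimal sketch.  It
assumes two in-memory SQLite databases as stand-ins for the servers
(not PostgreSQL), and the helpers make_server(), broadcast() and
contents() are hypothetical names invented for the illustration.  The
first broadcast lets each server evaluate random() itself; the second
has the application choose the value once and send it as a parameter.

```python
import random
import sqlite3

def make_server():
    """Stand-in for one database server (an in-memory SQLite database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
    return conn

def broadcast(servers, sql, params=()):
    """Send the same statement to every server.

    Best effort only: a failure while executing rolls every server back,
    but a failure during the commit loop is not handled here.
    """
    try:
        for conn in servers:
            conn.execute(sql, params)
    except sqlite3.Error:
        for conn in servers:
            conn.rollback()
        raise
    for conn in servers:
        conn.commit()

def contents(conn):
    """Return the table contents in a stable order for comparison."""
    return conn.execute("SELECT id, val FROM t ORDER BY id").fetchall()

servers = [make_server(), make_server()]

# Nondeterministic: each server evaluates random() on its own.
broadcast(servers, "INSERT INTO t (val) VALUES (random())")
diverged = contents(servers[0]) != contents(servers[1])

# Start over with empty tables on both servers.
broadcast(servers, "DELETE FROM t")

# Deterministic: the application picks the value once and ships it.
value = random.getrandbits(32)
broadcast(servers, "INSERT INTO t (val) VALUES (?)", (value,))
in_sync = contents(servers[0]) == contents(servers[1])

print("random() evaluated per server diverged:", diverged)
print("application-supplied value stayed in sync:", in_sync)
```

The same idea applies to timestamps and sequence values: compute them
in one place and broadcast the result rather than the expression.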

<http://momjian.us/main/writings/pgsql/sgml/clustering-for-load-balancing.html>
"24.6. Clustering For Load Balancing

In clustering, each server can accept write requests, and these write
requests are broadcast from the original server to all other servers
before each transaction commits. Under heavy load, this can cause
excessive locking and performance degradation. It is implemented by
Oracle in their RAC product. PostgreSQL does not offer this type of
load balancing, though PostgreSQL two-phase commit can be used to
implement this in application code or middleware."

Something doesn't feel entirely right here...

How about...

"24.6. Multimaster Replication For Load Balancing

In this scenario, each server can accept write requests, which are
broadcast from the original server to all other servers before each
transaction commits in order to ensure consistency.  Unfortunately,
under heavy load, the cost of distributing locks across servers can
lead to substantial performance degradation.  Oracle implements this
approach in their RAC product.  PostgreSQL does not offer this type of
load balancing, though PostgreSQL two-phase commit using <xref
linkend="sql-prepare-transaction-title"> and <xref linkend=
"sql-commit-prepared-title"> may be used to implement this in
application code or middleware.

The communication costs involved in distributing locks and writes make
write operations considerably more expensive than they would be on a
single server.  In general, the cost of distributed locking means that
this approach is only usable across a cluster of servers at a local
site.

There will only be a performance "win" if the cluster mostly processes
read-only traffic that it can distribute across a larger number of
database servers.  Write performance generally degrades a fair bit
compared to using a single database server.  Reliability should be
enhanced, since the cluster can continue working even if some of its
members fail."
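
As a rough illustration of what "implement this in application code or
middleware" involves, here is a toy two-phase commit coordinator.  It
is a sketch only: the Node class and replicate_write() are
hypothetical, the nodes are plain Python objects rather than database
connections, and crash recovery of the coordinator is ignored.  The
three methods stand in for what PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED would do against real servers.

```python
class Node:
    """Hypothetical stand-in for one database server in the cluster."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.committed = []   # writes that are durable and visible
        self.prepared = {}    # gid -> write awaiting the final verdict

    def prepare(self, gid, write):
        # Roughly the role of PREPARE TRANSACTION 'gid'.
        if self.fail_on_prepare:
            raise RuntimeError(f"{self.name} could not prepare {gid}")
        self.prepared[gid] = write

    def commit_prepared(self, gid):
        # Roughly the role of COMMIT PREPARED 'gid'.
        self.committed.append(self.prepared.pop(gid))

    def rollback_prepared(self, gid):
        # Roughly the role of ROLLBACK PREPARED 'gid'.
        self.prepared.pop(gid, None)


def replicate_write(nodes, gid, write):
    """Apply one write on every node, or on none of them."""
    prepared = []
    try:
        for node in nodes:          # phase 1: every node must prepare
            node.prepare(gid, write)
            prepared.append(node)
    except RuntimeError:
        for node in prepared:       # any refusal aborts everywhere
            node.rollback_prepared(gid)
        return False
    for node in nodes:              # phase 2: commit on every node
        node.commit_prepared(gid)
    return True


healthy = [Node("a"), Node("b"), Node("c")]
ok = replicate_write(healthy, "tx1", "INSERT INTO t VALUES (1)")

flaky = [Node("a"), Node("b", fail_on_prepare=True)]
aborted = not replicate_write(flaky, "tx2", "INSERT INTO t VALUES (2)")

print("all healthy nodes committed:", ok)
print("write aborted when one node refused:", aborted)
```

Every write costs two round trips to every node, which is one way to
see why writes get more expensive as the cluster grows.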

<http://momjian.us/main/writings/pgsql/sgml/clustering-for-parallel-query-execution.html>

"24.7. Clustering For Parallel Query Execution

This allows multiple servers to work on a single query. One possible
way this could work is for the data to be split among servers and for
each server to execute its part of the query and results sent to a
central server to be combined and returned to the user. There
currently is no PostgreSQL open source solution for this."

This seems a bit thin.

"24.7. Clustering For Parallel Query Execution

This allows multiple servers to work concurrently on a single query,
analogous to the way RAID permits multiple disk drives to respond
concurrently to disk I/O requests.

One way this could work is for the data to be partitioned across the
servers, where each server executes its part of the query, submitting
results to a central server to be combined and returned to the user.
There currently is no PostgreSQL open source solution for this."
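
The partition-and-combine scheme can be sketched in a few lines.  This
is an illustration under assumptions, not a description of any
existing product: the "servers" are threads in one process, the data
is a Python list, and partition(), partial_aggregate() and
parallel_average() are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_servers):
    """Spread rows across servers round-robin (one of many schemes)."""
    return [rows[i::n_servers] for i in range(n_servers)]

def partial_aggregate(rows):
    """What each server computes locally: state that can be merged later."""
    return (sum(rows), len(rows))

def parallel_average(rows, n_servers=4):
    """Central server: scatter the work, then combine the partial results."""
    shards = partition(rows, n_servers)
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

rows = list(range(1, 101))
print(parallel_average(rows))
```

Note that each server returns a sum and a count rather than its own
average; averaging the per-server averages would be wrong whenever the
partitions differ in size.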
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://cbbrowne.com/info/advocacy.html
Why do we put suits in a garment bag, and put garments in a suitcase?