Replication documentation addition

* Replication documentation addition
@ 2006-10-24 03:39  Bruce Momjian <[email protected]>
  0 siblings, 3 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-24 03:39 UTC
  To: pgsql-docs; +Cc: PostgreSQL-development <[email protected]>

Here is my first draft of a new replication section for our
documentation.  I am looking for any comments.

---------------------------------------------------------------------------

Replication
===========

Database replication allows multiple computers to work together, making
them appear as a single computer to user applications.  This might
involve allowing a backup server to take over if the primary server
fails, or it might involve allowing several computers to work together
at the same time.

It would be ideal if database servers could be combined seamlessly.  Web
servers serving static web pages can be combined quite easily by merely
load-balancing web requests to multiple machines.  In fact, most
read-only servers can be combined relatively easily.

Unfortunately, most database servers have a read/write mix of requests,
and read/write servers are much harder to combine.  This is because,
although read-only data needs to be placed on each server only once, a
write to any server has to be seen by all other servers so that future
read requests to those servers return consistent results.

This "sync problem" is the fundamental difficulty of doing database
replication.  Because there is no single solution that limits the impact
of the sync problem for all workloads, there are multiple replication
solutions.  Each solution addresses the sync problem in a different way,
and minimizes its impact for a specific workload.  

This section first outlines two important replication capabilities, and
then outlines various replication solutions.

Synchronous vs. Asynchronous Replication
----------------------------------------

The term synchronous replication means that a transaction is not
considered committed until all servers have access to the committed
records.  In that case, a failover to a backup server will lose no data
records.  Asynchronous replication allows a small delay between the time
of commit and its propagation to backup servers, opening the possibility
that some transactions might be lost in a switch to a backup server.
Asynchronous replication is used when synchronous replication would be
too slow.
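The trade-off can be made concrete with a toy in-memory simulation.  This
is a minimal sketch, not how PostgreSQL works internally: the `Server` and
`Cluster` classes and the `synchronous` flag are hypothetical, and a real
system ships WAL or row changes over a network rather than appending to
Python lists.

```python
from collections import deque


class Server:
    """A toy server that only remembers committed records."""

    def __init__(self, name):
        self.name = name
        self.records = []


class Cluster:
    """Toy primary/backup pair contrasting synchronous and asynchronous commit."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = Server("primary")
        self.backup = Server("backup")
        self.pending = deque()  # committed on the primary, not yet on the backup

    def commit(self, record):
        self.primary.records.append(record)
        if self.synchronous:
            # Synchronous: the backup must have the record before we report success.
            self.backup.records.append(record)
        else:
            # Asynchronous: report success now, propagate later.
            self.pending.append(record)

    def propagate(self):
        """Ship queued records to the backup (the asynchronous delay ends here)."""
        while self.pending:
            self.backup.records.append(self.pending.popleft())

    def fail_over(self):
        """The primary dies; whatever was still queued never reaches the backup."""
        lost = list(self.pending)
        self.pending.clear()
        return self.backup, lost


for synchronous in (True, False):
    cluster = Cluster(synchronous)
    cluster.commit("t1")
    cluster.propagate()
    cluster.commit("t2")  # the primary crashes before t2 is propagated
    new_primary, lost = cluster.fail_over()
    print("sync " if synchronous else "async", new_primary.records, "lost:", lost)
```

In synchronous mode the backup holds both `t1` and `t2` after failover; in
asynchronous mode `t2` was acknowledged to the client but is lost.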

Full vs. Partial Replication
----------------------------

The term full replication means only a full database cluster can be
replicated, while partial replication means more fine-grained control
over replicated objects is possible.
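As a rough illustration, partial replication can be thought of as a filter
on the stream of changes a replica receives.  The function and the table
names below are hypothetical; they only model the difference in
granularity.

```python
def changes_for_replica(changes, tables=None):
    """Return the changes a replica receives from a stream of (table, row) pairs.

    tables=None models full replication: every change is shipped.
    A set of table names models partial replication: only those tables are shipped.
    """
    if tables is None:
        return list(changes)
    return [change for change in changes if change[0] in tables]


stream = [("orders", 1), ("audit_log", 2), ("customers", 3)]
print(changes_for_replica(stream))                           # full
print(changes_for_replica(stream, {"orders", "customers"}))  # partial
```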

Shared Disk Failover
-------------------- 

This replication solution avoids the sync problem by having only one
copy of the database.  This is possible because a single disk array is
shared by multiple servers.  If the main database server fails, the
backup server is able to mount and start the database as though it were
recovering from a database crash.  This shared hardware functionality is
common in network storage devices.  This allows synchronous, full
replication.
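A toy model shows why no data is lost: there is only one copy, and failover
just changes which host has it mounted.  This is a sketch under stated
assumptions; `SharedDisk`, `Host`, and `fail_over` are hypothetical, and
`fail_over` stands in for the fencing and crash-recovery steps a real
cluster manager performs.

```python
class SharedDisk:
    """One copy of the data, reachable from several hosts."""

    def __init__(self):
        self.data = {}
        self.mounted_by = None


class Host:
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk

    def mount(self):
        # Only one host may run the database on the disk at a time;
        # two servers writing the same files would corrupt them.
        if self.disk.mounted_by not in (None, self.name):
            raise RuntimeError(f"disk already mounted by {self.disk.mounted_by}")
        self.disk.mounted_by = self.name

    def write(self, key, value):
        if self.disk.mounted_by != self.name:
            raise RuntimeError(f"{self.name} has not mounted the disk")
        self.disk.data[key] = value


def fail_over(failed, standby):
    # Fence the failed host (take the disk away from it), then let the
    # standby mount the same disk and start as if after a crash.
    if failed.disk.mounted_by == failed.name:
        failed.disk.mounted_by = None
    standby.mount()
    return standby


disk = SharedDisk()
main, backup = Host("main", disk), Host("backup", disk)
main.mount()
main.write("x", 1)
new_main = fail_over(main, backup)
new_main.write("y", 2)
print(disk.data)  # {'x': 1, 'y': 2}
```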

Warm Standby Using Point-In-Time Recovery
-----------------------------------------

A warm standby server (add doc xref) can be kept current by reading a
stream of WAL records.  If the main server fails, the warm standby
contains almost all of the data of the main server, and can be used as
the new database server.  This allows asynchronous, full replication.
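Why "almost all" of the data?  WAL is shipped to the standby via
`archive_command`, which archives completed WAL segment files, so records
in the segment still being filled are not yet on the standby (a setting
such as `archive_timeout` can force more frequent segment switches).  The
toy below models this with a hypothetical segment size of three records;
the classes are illustrative only, and real segments are 16 MB files.

```python
SEGMENT_SIZE = 3  # hypothetical; real WAL segments are 16 MB files


class Primary:
    def __init__(self):
        self.current_segment = []
        self.archive = []  # completed segments, already copied to the standby

    def write_wal(self, record):
        self.current_segment.append(record)
        if len(self.current_segment) == SEGMENT_SIZE:
            # The segment is full: archive it, which ships it to the standby.
            self.archive.append(self.current_segment)
            self.current_segment = []


class WarmStandby:
    def __init__(self):
        self.replayed = []
        self.next_segment = 0

    def catch_up(self, archive):
        # Replay every archived segment not applied yet.
        while self.next_segment < len(archive):
            self.replayed.extend(archive[self.next_segment])
            self.next_segment += 1


primary, standby = Primary(), WarmStandby()
for i in range(1, 8):
    primary.write_wal(f"r{i}")
    standby.catch_up(primary.archive)

# The primary fails now: r7 sits in an unfinished segment and is lost.
print(standby.replayed)         # ['r1', 'r2', 'r3', 'r4', 'r5', 'r6']
print(primary.current_segment)  # ['r7']
```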

Point-In-Time Recovery
----------------------

Point-In-Time Recovery is similar to a warm standby server, except that
the standby server must go through a full restore and archive recovery
operation, delaying how quickly it can be used as the main database
server.  This allows asynchronous, full replication.
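The delay comes from how much WAL is still unapplied at failover time.  A
minimal sketch, assuming a hypothetical archive of three segments: a warm
standby has been replaying all along, while a Point-In-Time Recovery server
starts from the base backup and must replay everything.

```python
def records_to_replay_at_failover(archive, already_replayed_segments):
    """Count WAL records that must be applied before the server is usable."""
    remaining = archive[already_replayed_segments:]
    return sum(len(segment) for segment in remaining)


archive = [["r1", "r2", "r3"], ["r4", "r5", "r6"], ["r7", "r8", "r9"]]

# Warm standby: already replayed every archived segment.
warm = records_to_replay_at_failover(archive, already_replayed_segments=3)
# PITR: replays every archived segment, after restoring the base backup
# itself (which is not counted here).
pitr = records_to_replay_at_failover(archive, already_replayed_segments=0)
print(warm, pitr)  # 0 9
```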

Continuously Running Failover Server
------------------------------------

A continuously running failover server allows the backup server to
answer read-only queries while the master server is running.  It
receives a continuous stream of write activity from the master server. 
Because the failover server can be used for read-only database requests,
it is well suited to data warehouse queries.  Slony offers this as
asynchronous, partial replication.
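Slony captures changes on the origin with triggers and applies them on
subscribers, whose tables reject direct writes.  The toy below only mimics
that shape; none of its names (`Origin`, `ReadOnlyReplica`, `change_log`)
are Slony's, and the log is an in-memory list rather than a table.

```python
class Origin:
    """Toy master: every write is also appended to a change log."""

    def __init__(self):
        self.tables = {"orders": {}, "sessions": {}}
        self.change_log = []  # stands in for a trigger-populated log table

    def upsert(self, table, key, value):
        self.tables[table][key] = value
        # A row-level trigger would record this change for subscribers.
        self.change_log.append((table, key, value))


class ReadOnlyReplica:
    """Toy subscriber: applies logged changes for chosen tables and serves reads."""

    def __init__(self, subscribed_tables):
        self.tables = {name: {} for name in subscribed_tables}
        self.applied = 0  # how far into the origin's change log we have applied

    def apply_changes(self, change_log):
        for table, key, value in change_log[self.applied:]:
            if table in self.tables:  # partial replication: skip other tables
                self.tables[table][key] = value
        self.applied = len(change_log)

    def read(self, table, key):
        return self.tables[table].get(key)

    def upsert(self, table, key, value):
        raise PermissionError("replica is read-only; send writes to the origin")


origin = Origin()
replica = ReadOnlyReplica({"orders"})
origin.upsert("orders", 1, "pending")
origin.upsert("orders", 1, "shipped")
origin.upsert("sessions", "abc", "temporary")
replica.apply_changes(origin.change_log)
print(replica.read("orders", 1))  # shipped
```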

Data Partitioning
-----------------

Data partitioning splits the database into data sets.  To achieve
replication, each data set can only be modified by one server.  For
example, data can be partitioned by office, e.g., London and Paris.
While the London and Paris servers both have all data records, only
London can modify London records, and only Paris can modify Paris
records.  Such partitioning is usually accomplished in application code,
though rules and triggers can help enforce the partitioning and keep the
read-only data sets current.  Slony can also be used in such a setup.
Because Slony replicates only entire tables, London and Paris records
can be placed in separate tables, and inheritance can be used to query
both tables through a single table name.
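The ownership rule can be sketched in a few lines.  This is a hypothetical
model, not application code from any real deployment: `OWNER` maps each
data set to the only server allowed to modify it, writes are copied to the
other server's read-only copy, and `all_records` plays the role of querying
an inheritance parent.

```python
OWNER = {"london": "London", "paris": "Paris"}  # hypothetical: data set -> owning server


class OfficeServer:
    def __init__(self, name):
        self.name = name
        # Every server keeps a copy of every office's table.
        self.tables = {office: {} for office in OWNER}

    def write(self, office, key, value, peers):
        # Only the owning server may modify a data set.
        if OWNER[office] != self.name:
            raise PermissionError(f"{self.name} cannot modify {office} records")
        self.tables[office][key] = value
        # Keep the read-only copies on the other servers current.
        for peer in peers:
            peer.tables[office][key] = value

    def all_records(self):
        # Like querying an inheritance parent: rows from both tables at once.
        return [(office, key, value)
                for office, table in self.tables.items()
                for key, value in table.items()]


london, paris = OfficeServer("London"), OfficeServer("Paris")
london.write("london", "L1", "Alice", peers=[paris])
paris.write("paris", "P1", "Bob", peers=[london])
print(paris.all_records())  # [('london', 'L1', 'Alice'), ('paris', 'P1', 'Bob')]
```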

Query Broadcast Replication
---------------------------

This involves sending write queries to multiple servers.  Read-only
queries can be sent to a single server because there is no need for all
servers to process them.  This can be complex to set up because functions
like random() and CURRENT_TIMESTAMP will have different values on
different servers, and sequences must be kept consistent across servers.
Pgpool implements this type of replication.
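The random() problem is easy to reproduce in a toy.  In the sketch below,
each hypothetical `Backend` has its own random state, as separate servers
would.  Broadcasting the query text lets each server evaluate random()
itself, so the copies diverge; one possible mitigation, shown here as an
assumption and not as a description of Pgpool, is to evaluate the volatile
expression once and send the resulting literal everywhere.

```python
import random


class Backend:
    """Toy server; each keeps its own random state, as separate servers would."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.rows = []


def broadcast_naive(backends):
    # Send "INSERT ... VALUES (random())" as text: each server evaluates it.
    for backend in backends:
        backend.rows.append(backend.rng.random())


def broadcast_pinned(backends, rng):
    # Evaluate the volatile expression once, then send the literal value.
    value = rng.random()
    for backend in backends:
        backend.rows.append(value)


def pick_read_server(backends, request_number):
    # Read-only queries need only one server; rotate among them.
    return backends[request_number % len(backends)]


naive = [Backend(seed=1), Backend(seed=2)]
broadcast_naive(naive)
print(naive[0].rows == naive[1].rows)    # False: the copies have diverged

pinned = [Backend(seed=1), Backend(seed=2)]
broadcast_pinned(pinned, random.Random(42))
print(pinned[0].rows == pinned[1].rows)  # True
```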

Multi-Master Replication
------------------------

In multi-master replication, each server can accept write requests, and
these write requests are broadcast to all other servers before the
transaction commits.  Under heavy load, this type of replication can
cause excessive locking and performance degradation.  Oracle implements
it in its RAC product.  PostgreSQL does not offer this type of
replication, though PostgreSQL two-phase commit can be used to implement
this in application code.
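PostgreSQL (8.1 and later) provides the SQL commands PREPARE TRANSACTION,
COMMIT PREPARED, and ROLLBACK PREPARED, which an application can drive
across several servers.  The sketch below simulates the coordinator logic
in Python; the `Participant` class is a hypothetical stand-in for one
server connection, and a real application would issue those SQL commands
instead.  Each prepared transaction keeps its locks until the second phase,
which is where the locking overhead mentioned above comes from.

```python
class Participant:
    """Toy stand-in for one PostgreSQL server taking part in 2PC."""

    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare
        self.prepared = {}
        self.committed = []

    def prepare(self, gid, change):
        # Like PREPARE TRANSACTION 'gid': the change is made durable but not
        # yet visible, and the server promises it can commit later.
        if not self.will_prepare:
            return False
        self.prepared[gid] = change
        return True

    def commit_prepared(self, gid):
        # Like COMMIT PREPARED 'gid'.
        self.committed.append(self.prepared.pop(gid))

    def rollback_prepared(self, gid):
        # Like ROLLBACK PREPARED 'gid'.
        self.prepared.pop(gid, None)


def two_phase_commit(participants, gid, change):
    # Phase 1: every server must prepare successfully.
    prepared = []
    for p in participants:
        if p.prepare(gid, change):
            prepared.append(p)
        else:
            # One server refused: roll back those that already prepared.
            for q in prepared:
                q.rollback_prepared(gid)
            return False
    # Phase 2: all voted yes, so commit everywhere.
    for p in participants:
        p.commit_prepared(gid)
    return True


servers = [Participant("a"), Participant("b")]
print(two_phase_commit(servers, "tx1", "x=1"))  # True
mixed = [Participant("a"), Participant("b", will_prepare=False)]
print(two_phase_commit(mixed, "tx2", "y=2"))    # False
```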

Performance
-----------

Performance must be considered in any replication choice.  There is
usually a tradeoff between functionality and performance.  For example,
full synchronous replication over a slow network might cut performance by
more than half, while asynchronous replication might have a minimal
performance impact.
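A back-of-envelope calculation shows how the "more than half" can arise.
Assume, hypothetically, a 2 ms local commit and a 3 ms network round trip
that a synchronous commit must wait for.  For one client issuing
transactions back to back, throughput is the inverse of commit latency;
concurrency and batching are ignored here.

```python
def serial_tps(local_commit_ms, network_rtt_ms=0.0):
    """Commits per second for one client issuing transactions back to back."""
    return 1000.0 / (local_commit_ms + network_rtt_ms)


local = serial_tps(2.0)                      # async: no wait for the backup
sync = serial_tps(2.0, network_rtt_ms=3.0)   # sync: waits for the remote ack
print(local, sync, sync / local)  # 500.0 200.0 0.4
```

With these assumed numbers, synchronous replication keeps only 40% of the
single-client throughput, while asynchronous replication keeps the local
commit latency because propagation happens after the acknowledgement.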

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

* Re: Replication documentation addition
@ 2006-10-24 03:55  Bruce Momjian <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-24 03:55 UTC
  To: Bruce Momjian <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>


Please disregard.  I am redoing it and will post a URL with the most
recent version.

* Replication documentation addition
@ 2006-10-24 04:20  Bruce Momjian <[email protected]>
  0 siblings, 4 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-24 04:20 UTC
  To: pgsql-docs; +Cc: PostgreSQL-development <[email protected]>

Here is a new replication documentation section I want to add for 8.2:

	ftp://momjian.us/pub/postgresql/mypatches/replication

Comments welcomed.

* Re: Replication documentation addition
@ 2006-10-24 08:26  Markus Schiltknecht <[email protected]>
  parent: Bruce Momjian <[email protected]>
  3 siblings, 1 reply; 117+ messages in thread

From: Markus Schiltknecht @ 2006-10-24 08:26 UTC
  To: Bruce Momjian <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

Hello Bruce,

Bruce Momjian wrote:
> Here is a new replication documentation section I want to add for 8.2:
> 
> 	ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> Comments welcomed.

Thank you, that sounds good. It's targeted to production use and 
currently available solutions, which makes sense in the official manual.

You are explaining the sync vs. async categorization, but I asked
myself where the explanation of single- vs. multi-master had gone. I
then realized that you are talking about read-only and a "read/write
mix of servers". Then again, you mention 'Multi-Master Replication' as
one type of replication solution. I think we should be consistent in
our naming. As Single- and Multi-Master are the more common terms among
database replication experts, I'd recommend using them and explaining
what they mean instead of introducing new names.

Along with that, I'd argue that Single- vs. Multi-Master is a
categorization just like Sync vs. Async. In that sense, the last chapter
should probably be named 'Distributed-Shared-Memory Replication' or
something like that instead of 'Multi-Master Replication', because, as
we know, there are several ways of doing Multi-Master Replication
(Slony-II / Postgres-R, Distributed Shared Memory, 2PC in application
code, or the above-mentioned 'Query Broadcast Replication', which would
fall into a Multi-Master Replication model as well).

Also in the last chapter, instead of just saying that "PostgreSQL does 
not offer this type of replication", we could probably say that 
different projects are trying to come up with better replication 
solutions. And there are several proprietary products based on 
PostgreSQL which do solve some kinds of Multi-Master Replication. Not 
that I want to advertise any of them, but it just sounds better than
the current "no, we don't offer that".

As this documentation mainly covers production-quality solutions (which 
is absolutely perfect), can we document the status of current projects 
somewhere, probably in a wiki? Or at least mention them somewhere and 
point to their websites? It would help to get rid of all those rumors 
and uncertainties. Or are those intentional?

Just my two cents.

Regards

Markus

* Re: Replication documentation addition
@ 2006-10-24 13:29  Hannu Krosing <[email protected]>
  parent: Bruce Momjian <[email protected]>
  3 siblings, 3 replies; 117+ messages in thread

From: Hannu Krosing @ 2006-10-24 13:29 UTC
  To: Bruce Momjian <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

On Tue, 2006-10-24 at 00:20, Bruce Momjian wrote:
> Here is a new replication documentation section I want to add for 8.2:
> 
> 	ftp://momjian.us/pub/postgresql/mypatches/replication

This is how data partitioning is currently described there:

> Data Partitioning
> -----------------
> 
> Data partitioning splits the database into data sets.  To achieve
> replication, each data set can only be modified by one server.  For
> example, data can be partitioned by offices, e.g. London and Paris. 
> While London and Paris servers have all data records, only London can
> modify London records, and Paris can only modify Paris records.  Such
> partitioning is usually accomplished in application code, though rules
> and triggers can help enforce partitioning and keep the read-only data
> sets current.  Slony can also be used in such a setup.  While Slony
> replicates only entire tables, London and Paris can be placed in
> separate tables, and inheritance can be used to access from both tables
> using a single table name.

Maybe another use of partitioning should also be mentioned. That is,
when partitioning is used to overcome limitations of single servers
(especially IO and memory, but also CPU), and only a subset of data is
stored and processed on each server.

As an example of this type of partitioning you could mention Bizgres MPP
(a PG-based commercial product, http://www.greenplum.com ), which
partitions data to use I/O and CPU of several DB servers for processing
complex OLAP queries, and Pl_Proxy
( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP
loads.
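As a rough illustration of this shared-nothing style, the sketch below
routes each key to exactly one of several servers with a stable hash, so
each server stores and processes only its subset.  It is a hypothetical
example, not PL/Proxy's code: the partition count, the key names, and the
use of `zlib.crc32` are assumptions, and PL/Proxy uses its own hash
function and cluster configuration.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick the single server that stores and processes this key."""
    # crc32 is stable across processes, unlike Python's salted hash().
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(keys, num_partitions=NUM_PARTITIONS):
    """Group keys by the partition (server) that owns them."""
    buckets = {i: [] for i in range(num_partitions)}
    for key in keys:
        buckets[partition_for(key, num_partitions)].append(key)
    return buckets


users = ["alice", "bob", "carol", "dave", "erin"]
buckets = route(users)
print(buckets)
```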

I think the "official" term for this kind of "replication" is
Shared-Nothing Clustering.

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 13:54  Markus Schiltknecht <[email protected]>
  parent: Hannu Krosing <[email protected]>
  2 siblings, 0 replies; 117+ messages in thread

From: Markus Schiltknecht @ 2006-10-24 13:54 UTC
  To: Hannu Krosing <[email protected]>; +Cc: Bruce Momjian <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

Hannu Krosing wrote:
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.

Well, that's just another distinction for clusters. Most of the time 
it's between Shared-Disk vs. Shared-Nothing. You could also see the very 
Big Irons as a Shared-Everything Cluster.

While it's certainly true that any kind of data partitioning for
databases only makes sense for Shared-Nothing Clusters, I don't think
it's a 'kind of replication'. AFAIK most database replication solutions 
are built for Shared-Nothing Clusters. (With the exception of 
PgCluster-II, I think).

Regards

Markus

* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 18:37  Josh Berkus <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 1 reply; 117+ messages in thread

From: Josh Berkus @ 2006-10-24 18:37 UTC
  To: [email protected]; +Cc: Bruce Momjian <[email protected]>; pgsql-docs

Bruce,

> Here is my first draft of a new replication section for our
> documentation.  I am looking for any comments.

Hmmm ... while the primer on different types of replication is fine, I 
think what users were really looking for is a listing of the different 
replication solutions which are available for PostgreSQL and how to get 
them.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



^ permalink  raw  reply  [nested|flat] 117+ messages in thread

* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 19:23  Markus Schiltknecht <[email protected]>
  parent: Josh Berkus <[email protected]>
  0 siblings, 2 replies; 117+ messages in thread

From: Markus Schiltknecht @ 2006-10-24 19:23 UTC
  To: [email protected]; +Cc: [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

Hello Josh,

Josh Berkus wrote:
> Hmmm ... while the primer on different types of replication is fine, I 
> think what users were really looking for is a listing of the different 
> replication solutions which are available for PostgreSQL and how to get 
> them.

Well, let's see what we have:

* Shared Disk Fail Over
* Warm Standby Using Point-In-Time Recovery
* Point-In-Time Recovery

These first three require quite a bit of configuration; AFAIK there is no
tool or single solution you can download, install and be happy with. I
probably wouldn't even call them 'replication solutions'. For me those 
are more like backups with fail-over capability.

* Continuously Running Fail-Over Server

(BTW, what is 'partial replication' supposed to mean here?)
Here we could link to Slony.

* Data Partitioning

Here we can't provide a link, it's just a way to handle the problem in 
the application code.

* Query Broadcast Replication

Here we could link to PgPool.

* Multi-Master Replication
   (or better: Distributed Shared Memory Replication)

No existing solution for PostgreSQL.

Looking at that, I'm a) missing PgCluster and b) arguing that we have to 
admit that we simply can not 'list .. replication solutions ... and how 
to get them' because all of the solutions mentioned need quite some 
knowledge and require a more or less complex installation and configuration.

Regards

Markus

* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 19:34  Joshua D. Drake <[email protected]>
  parent: Markus Schiltknecht <[email protected]>
  1 sibling, 2 replies; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-24 19:34 UTC
  To: Markus Schiltknecht <[email protected]>; +Cc: [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

> Looking at that, I'm a) missing PgCluster and b) arguing that we have to
> admit that we simply can not 'list .. replication solutions ... and how
> to get them' because all of the solutions mentioned need quite some
> knowledge and require a more or less complex installation and
> configuration.

There is also the question if we should have a sub section:

Closed Source replication solutions:

Mammoth Replicator
Continuent P/Cluster
ExtenDB
Greenplum MPP (although this is kind of horizontal partitioning)

Joshua D. Drake

-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 22:05  Simon Riggs <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Simon Riggs @ 2006-10-24 22:05 UTC
  To: Joshua D. Drake <[email protected]>; +Cc: Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

On Tue, 2006-10-24 at 12:34 -0700, Joshua D. Drake wrote:
> > Looking at that, I'm a) missing PgCluster and b) arguing that we have to
> > admit that we simply can not 'list .. replication solutions ... and how
> > to get them' because all of the solutions mentioned need quite some
> > knowledge and require a more or less complex installation and
> > configuration.
> 
> There is also the question if we should have a sub section:
> 
> Closed Source replication solutions:
> 
> Mammoth Replicator
> Continuent P/Cluster
> ExtenDB
> Greenplum MPP (although this is kind of horizontal partitioning)

Where do you draw the line? You may be surprised at what other options
that includes. I'm happy to include a whole range of things, but please
be very careful and precise about what you wish for.

There are enough good solutions for open source PostgreSQL that it is easy
and straightforward to limit it to just that. New contributions welcome,
of course.

-- 
  Simon Riggs             
  EnterpriseDB   http://www.enterprisedb.com

* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 22:13  Joshua D. Drake <[email protected]>
  parent: Simon Riggs <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-24 22:13 UTC
  To: Simon Riggs <[email protected]>; +Cc: Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

Simon Riggs wrote:
> On Tue, 2006-10-24 at 12:34 -0700, Joshua D. Drake wrote:
>>> Looking at that, I'm a) missing PgCluster and b) arguing that we have to
>>> admit that we simply can not 'list .. replication solutions ... and how
>>> to get them' because all of the solutions mentioned need quite some
>>> knowledge and require a more or less complex installation and
>>> configuration.
>> There is also the question if we should have a sub section:
>>
>> Closed Source replication solutions:
>>
>> Mammoth Replicator
>> Continuent P/Cluster
>> ExtenDB
>> Greenplum MPP (although this is kind of horizontal partitioning)
> 
> Where do you draw the line?

Well that is certainly a good question but we do include links to some
of the more prominent closed source software on the website as well.

> You maybe surprised about what other options
> that includes. I'm happy to include a whole range of things, but please
> be very careful and precise about what you wish for.

If it were me, I would say that the replication option has to be
specific to PostgreSQL (e.g., cjdbc or synchronous jakarta pooling
don't go in).

Sincerely,

Joshua D. Drake

* Re: Replication documentation addition
@ 2006-10-24 22:14  Simon Riggs <[email protected]>
  parent: Bruce Momjian <[email protected]>
  3 siblings, 1 reply; 117+ messages in thread

From: Simon Riggs @ 2006-10-24 22:14 UTC
  To: Bruce Momjian <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

On Tue, 2006-10-24 at 00:20 -0400, Bruce Momjian wrote:
> Here is a new replication documentation section I want to add for 8.2:
> 
> 	ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> Comments welcomed.

It's a very good start to a complete minefield of competing solutions.

My first thought would be to differentiate between clustering and
replication, which will bring out many differences.

My second thought would be to differentiate between load balancing,
multi-threading, parallel query, high availability and recoverability,
which would probably sort out the true differences in the above mix. But
that wouldn't help most people and almost everybody would find fault.

IMHO most people I've spoken to take "replication" to mean an HA
solution, so perhaps we should cover it in those terms.

* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 22:20  Simon Riggs <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Simon Riggs @ 2006-10-24 22:20 UTC
  To: Joshua D. Drake <[email protected]>; +Cc: Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:

> If it were me, I would say that the replication option has to be
> specific to PostgreSQL (e.g; cjdbc or synchronous jakarta pooling
> doesn't go in).

...and how do you define PostgreSQL exactly?

* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 22:33  Joshua D. Drake <[email protected]>
  parent: Simon Riggs <[email protected]>
  0 siblings, 2 replies; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-24 22:33 UTC
  To: Simon Riggs <[email protected]>; +Cc: Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

Simon Riggs wrote:
> On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
> 
>> If it were me, I would say that the replication option has to be
>> specific to PostgreSQL (e.g; cjdbc or synchronous jakarta pooling
>> doesn't go in).
> 
> ...and how do you define PostgreSQL exactly?

A replication product or software defined to work with only PostgreSQL?

I know there are some other products out there that will work from one
db to another, but I am not sure if those would be considered HA
solutions or migration solutions (which we could certainly document).

Sincerely,

Joshua D. Drake

* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 23:03  Simon Riggs <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Simon Riggs @ 2006-10-24 23:03 UTC (permalink / raw)
  To: Joshua D. Drake <[email protected]>; +Cc: Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

On Tue, 2006-10-24 at 15:33 -0700, Joshua D. Drake wrote:
> Simon Riggs wrote:
> > On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
> > 
> >> If it were me, I would say that the replication option has to be
> >> specific to PostgreSQL (e.g., cjdbc or synchronous jakarta pooling
> >> doesn't go in).
> > 
> > ...and how do you define PostgreSQL exactly?
> 
> Is the replication product or software designed to work only with PostgreSQL?

(again)... how do you define PostgreSQL exactly?


* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 23:24  Joshua D. Drake <[email protected]>
  parent: Simon Riggs <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-24 23:24 UTC (permalink / raw)
  To: Simon Riggs <[email protected]>; +Cc: Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

Simon Riggs wrote:
> On Tue, 2006-10-24 at 15:33 -0700, Joshua D. Drake wrote:
>> Simon Riggs wrote:
>>> On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
>>>
>>>> If it were me, I would say that the replication option has to be
>>>> specific to PostgreSQL (e.g., cjdbc or synchronous jakarta pooling
>>>> doesn't go in).
>>> ...and how do you define PostgreSQL exactly?
>> Is the replication product or software designed to work only with PostgreSQL?
> 
> (again)... how do you define PostgreSQL exactly?

What about PostgreSQL is unclear? Is your question do I consider
EnterpriseDB, PostgreSQL? I have no comment on that matter.

Sincerely,

Joshua D. Drake


* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 23:50  Jim C. Nasby <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 0 replies; 117+ messages in thread

From: Jim C. Nasby @ 2006-10-24 23:50 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

On Mon, Oct 23, 2006 at 11:39:34PM -0400, Bruce Momjian wrote:
> Query Broadcast Replication
> ---------------------------
> 
> This involves sending write queries to multiple servers.  Read-only
> queries can be sent to a single server because there is no need for all
> servers to process them.  This can be complex to set up because functions
> like random() and CURRENT_TIMESTAMP will have different values on
> different servers, and sequences must be kept consistent across servers.
> Pgpool implements this type of replication.
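
The consistency problem in the quoted draft can be illustrated with a small Python sketch. It is not how pgpool is implemented; `Server`, `broadcast_verbatim`, `broadcast_pinned`, and `pick_reader` are hypothetical names, and Python values stand in for SQL. Broadcasting a write verbatim lets each server evaluate nondeterministic expressions such as random() or CURRENT_TIMESTAMP on its own, so the copies drift apart; evaluating them once at the broker and sending the resulting literal keeps the copies identical, while read-only queries go to just one server.

```python
import random
import time


class Server:
    """Hypothetical stand-in for one database server holding one table."""

    def __init__(self, name):
        self.name = name
        self.rows = []


def broadcast_verbatim(servers, make_row):
    # Each server evaluates the row itself, as if it had received
    # INSERT ... VALUES (random(), CURRENT_TIMESTAMP) unchanged.
    for server in servers:
        server.rows.append(make_row())


def broadcast_pinned(servers, make_row):
    # The broker evaluates the nondeterministic values once and sends
    # the same literal row to every server.
    row = make_row()
    for server in servers:
        server.rows.append(row)


def pick_reader(servers, request_number):
    # A read-only query needs only one server; choose one round-robin.
    return servers[request_number % len(servers)]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the demo is repeatable

    def make_row():
        return (rng.random(), time.time())

    a, b = Server("a"), Server("b")
    broadcast_verbatim([a, b], make_row)
    print(a.rows == b.rows)  # False: the two copies now disagree

    a, b = Server("a"), Server("b")
    broadcast_pinned([a, b], make_row)
    print(a.rows == b.rows)  # True: both copies hold the same row
    print(pick_reader([a, b], 7).name)  # b
```

Pinning values at the broker assumes the broker can recognize every nondeterministic expression in a query, which a real implementation would have to do by parsing the SQL.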

Isn't there another active project that does this besides pgpool?

It's probably also worth mentioning the commercial replication schemes
that are out there.
-- 
Jim Nasby                                            [email protected]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


* Re: [HACKERS] Replication documentation addition
@ 2006-10-24 23:58  Jim C. Nasby <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Jim C. Nasby @ 2006-10-24 23:58 UTC (permalink / raw)
  To: Joshua D. Drake <[email protected]>; +Cc: Simon Riggs <[email protected]>; Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

On Tue, Oct 24, 2006 at 03:33:03PM -0700, Joshua D. Drake wrote:
> Simon Riggs wrote:
> > On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
> > 
> >> If it were me, I would say that the replication option has to be
> >> specific to PostgreSQL (e.g., cjdbc or synchronous jakarta pooling
> >> doesn't go in).
> > 
> > ...and how do you define PostgreSQL exactly?
> 
> Is the replication product or software designed to work only with PostgreSQL?

AFAIK Continuent's product fails that test...

I don't see any reason to exclude things that work with databases other
than PostgreSQL, though I agree that replication that's actually in the
application space (i.e., it ties you to Tomcat or some other platform)
probably doesn't belong.

My feeling is that people reading this chapter are looking for solutions
and probably don't care as much about how exactly the solution works so
long as it meets their needs.

> I know there are some other products out there that will work from one
> db to another, but I am not sure if those would be considered HA
> solutions or migration solutions (which we could certainly document).

* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 00:13  Joshua D. Drake <[email protected]>
  parent: Jim C. Nasby <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-25 00:13 UTC (permalink / raw)
  To: Jim C. Nasby <[email protected]>; +Cc: Simon Riggs <[email protected]>; Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

Jim C. Nasby wrote:
> On Tue, Oct 24, 2006 at 03:33:03PM -0700, Joshua D. Drake wrote:
>> Simon Riggs wrote:
>>> On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
>>>
>>>> If it were me, I would say that the replication option has to be
>>>> specific to PostgreSQL (e.g., cjdbc or synchronous jakarta pooling
>>>> doesn't go in).
>>> ...and how do you define PostgreSQL exactly?
>> Is the replication product or software designed to work only with PostgreSQL?
>  
> AFAIK Continuent's product fails that test...

To my knowledge, p/cluster only works with PostgreSQL but I could be wrong.

> 
> I don't see any reason to exclude things that work with databases other
> than PostgreSQL, though I agree that replication that's actually in the
> application space (i.e., it ties you to Tomcat or some other platform)
> probably doesn't belong.

I was just trying to establish defined criteria of some sort. We could fill
up pages and pages of possible replication solutions :)

Joshua D. Drake




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 00:16  Bruce Momjian <[email protected]>
  parent: Hannu Krosing <[email protected]>
  2 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 00:16 UTC (permalink / raw)
  To: Hannu Krosing <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>


OK, I have updated the URL.  Please let me know how you like it.

---------------------------------------------------------------------------

Hannu Krosing wrote:
> On Tue, 2006-10-24 at 00:20, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > 	ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> This is how data partitioning is currently described there
> 
> > Data Partitioning
> > -----------------
> > 
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris. 
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access both tables
> > using a single table name.
> 
> Maybe another use of partitioning should also be mentioned: using
> partitioning to overcome the limitations of a single server
> (especially I/O and memory, but also CPU), with only a subset of the
> data stored and processed on each server.
> 
> As an example of this type of partitioning you could mention Bizgres MPP
> (a PG-based commercial product, http://www.greenplum.com ), which
> partitions data to use I/O and CPU of several DB servers for processing
> complex OLAP queries, and Pl_Proxy
> ( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP
> loads.
> 
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.
> 
> -- 
> ----------------
> Hannu Krosing
> Database Architect
> Skype Technologies OÜ
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
> 
> Skype me:  callto:hkrosing
> Get Skype for free:  http://www.skype.com
> 
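
The ownership rule in the quoted Data Partitioning text can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: the class and method names are hypothetical, ownership is checked in application code, and changes are propagated synchronously, whereas Slony replicates asynchronously. It shows that every server can answer reads for every office, but only the owning office's server may modify its records.

```python
class OfficePartitionedCluster:
    """Hypothetical model of the London/Paris example: every server keeps
    a full copy of the data, but only the owning office may modify a record."""

    def __init__(self, offices):
        # One complete copy of the data per server, keyed by server name.
        self.copies = {office: {} for office in offices}

    def write(self, server, office, key, value):
        # Enforce ownership, as rules or triggers might in the database.
        if server != office:
            raise PermissionError(f"{server} may not modify {office} records")
        # Apply the change to the owner's copy and propagate it to every
        # other (read-only) copy, the job Slony or application code would do.
        for copy in self.copies.values():
            copy[(office, key)] = value

    def read(self, server, office, key):
        # Any server can answer a read from its local copy.
        return self.copies[server][(office, key)]


if __name__ == "__main__":
    cluster = OfficePartitionedCluster(["London", "Paris"])
    cluster.write("London", "London", 1, "order A")
    print(cluster.read("Paris", "London", 1))  # order A
    try:
        cluster.write("Paris", "London", 1, "order B")
    except PermissionError as err:
        print(err)  # Paris may not modify London records
```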

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 00:38  Jeff Frost <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Jeff Frost @ 2006-10-25 00:38 UTC (permalink / raw)
  To: Joshua D. Drake <[email protected]>; +Cc: Jim C. Nasby <[email protected]>; Simon Riggs <[email protected]>; Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; Bruce Momjian <[email protected]>; pgsql-docs

On Tue, 24 Oct 2006, Joshua D. Drake wrote:

>> AFAIK Continuent's product fails that test...
>
> To my knowledge, p/cluster only works with PostgreSQL but I could be wrong.
>

p/cluster was the old name for the PostgreSQL specific version.  It's been 
rebranded as uni/cluster and they have versions for both PostgreSQL and MySQL. 
One of my customers is trying it out currently.

-- 
Jeff Frost, Owner 	<[email protected]>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954


* Re: Replication documentation addition
@ 2006-10-25 00:56  Luke Lonergan <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Luke Lonergan @ 2006-10-25 00:56 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; Hannu Krosing <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

Bruce, 

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Bruce Momjian
> Sent: Tuesday, October 24, 2006 5:16 PM
> To: Hannu Krosing
> Cc: PostgreSQL-documentation; PostgreSQL-development
> Subject: Re: [HACKERS] Replication documentation addition
> 
> 
> OK, I have updated the URL.  Please let me know how you like it.

There's a typo on line 8, first paragraph:

"perhaps with only one server allowing write rwork together at the same
time."

Also, consider this wording of the last description:

"Single-Query Clustering..."

Replaced by:

"Shared Nothing Clustering
-------------------------

This allows multiple servers with separate disks to work together on
each query.  In shared nothing clusters, the work of answering each
query is distributed among the servers to increase performance through
parallelism.  These systems typically provide high availability by
using other forms of replication internally.

While there are no open source options for this type of clustering,
there are several commercial products available that implement this
approach, enabling PostgreSQL to achieve very high performance for
multi-terabyte business intelligence databases."
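
The shared-nothing approach described above (a subset of the data on each server, with each query's work spread across all of them) can be sketched as a scatter-gather aggregate in Python. The function names are hypothetical, threads stand in for separate servers, and this is not how Bizgres MPP or PL/Proxy work internally: rows are hash-partitioned so each node holds a disjoint subset, each node computes a partial result over only its own rows, and a coordinator combines the partials.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes, key):
    """Hash-partition rows so each node stores a disjoint subset."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[hash(row[key]) % n_nodes].append(row)
    return nodes


def local_sum_and_count(node_rows, column):
    # Runs "on" one node and touches only that node's own rows.
    values = [row[column] for row in node_rows]
    return sum(values), len(values)


def cluster_avg(nodes, column):
    # Scatter: every node computes its partial aggregate concurrently.
    # (Threads stand in for separate servers in this sketch.)
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda rows: local_sum_and_count(rows, column), nodes))
    # Gather: combine sums and counts; averaging the per-node averages
    # would be wrong when nodes hold different numbers of rows.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    rows = [{"id": i, "amount": i} for i in range(1, 101)]
    nodes = partition(rows, 4, "id")
    print([len(n) for n in nodes])       # [25, 25, 25, 25]
    print(cluster_avg(nodes, "amount"))  # 50.5
```

Combining sums and counts rather than per-node averages is what keeps the result correct when the partitions are uneven.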

- Luke





* Re: Replication documentation addition
@ 2006-10-25 02:53  Bruce Momjian <[email protected]>
  parent: Markus Schiltknecht <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 02:53 UTC (permalink / raw)
  To: Markus Schiltknecht <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>


I have changed the text to reference "fail over" and "load balancing". 
I think it makes it clearer.  Let me know what you think.  I am hesitant
to mention commercial PostgreSQL products in our documentation.

---------------------------------------------------------------------------

Markus Schiltknecht wrote:
> Hello Bruce,
> 
> Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > 	ftp://momjian.us/pub/postgresql/mypatches/replication
> > 
> > Comments welcomed.
> 
> Thank you, that sounds good. It's targeted to production use and 
> currently available solutions, which makes sense in the official manual.
> 
> You are explaining the sync vs. async categorization, but I sort of 
> asked myself where the explanation of single vs multi-master has gone. I 
> then realized, that you are talking about read-only and a "read/write 
> mix of servers". Then again, you are mentioning 'Multi-Master 
> Replication' as one type of replication solutions. I think we should be 
> consistent in our naming. As Single- and Multi-Master are the more
> common terms among database replication experts, I'd recommend using
> them and explaining what they mean instead of introducing new names.
> 
> Along with that, I'd argue that Single- vs. Multi-Master is a
> categorization just like Sync vs. Async. In that sense, the last chapter should
> probably be named 'Distributed-Shared-Memory Replication' or something 
> like that instead of 'Multi-Master Replication', because as we know, 
> there are several ways of doing Multi-Master Replication (Slony-II / 
> Postgres-R, Distributed Shared Memory, 2PC in application code or the 
> above mentioned 'Query Broadcast Replication', which would fall into a 
> Multi-Master Replication model as well)
> 
> Also in the last chapter, instead of just saying that "PostgreSQL does 
> not offer this type of replication", we could probably say that 
> different projects are trying to come up with better replication 
> solutions. And there are several proprietary products based on 
> PostgreSQL which do solve some kinds of Multi-Master Replication. Not 
> that I want to advertise for any of them, but it just sounds better than 
> the current "no, we don't offer that".
> 
> As this documentation mainly covers production-quality solutions (which 
> is absolutely perfect), can we document the status of current projects 
> somewhere, probably in a wiki? Or at least mention them somewhere and 
> point to their websites? It would help to get rid of all those rumors 
> and uncertainties. Or are those intentional?
> 
> Just my two cents.
> 
> Regards
> 
> Markus

* Re: Replication documentation addition
@ 2006-10-25 02:54  Bruce Momjian <[email protected]>
  parent: Simon Riggs <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 02:54 UTC (permalink / raw)
  To: Simon Riggs <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

Simon Riggs wrote:
> On Tue, 2006-10-24 at 00:20 -0400, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > 	ftp://momjian.us/pub/postgresql/mypatches/replication
> > 
> > Comments welcomed.
> 
> It's a very good start to a complete minefield of competing solutions.
> 
> My first thought would be to differentiate between clustering and
> replication, which will bring out many differences.

I have gone with "fail-over" and "load balancing" in the updated text.

> My second thought would be to differentiate between load balancing,
> multi-threading, parallel query, high availability and recoverability,
> which would probably sort out the true differences in the above mix. But
> that wouldn't help most people and almost everybody would find fault.

Yep.

> IMHO most people I've spoken to take "replication" to mean an HA
> solution, so perhaps we should cover it in those terms.

Yes, I removed any reference to replication.  It seemed too general.


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 02:55  Bruce Momjian <[email protected]>
  parent: Hannu Krosing <[email protected]>
  2 siblings, 3 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 02:55 UTC (permalink / raw)
  To: Hannu Krosing <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>


I have updated the text.  Please let me know what else I should change. 
I am unsure if I should be mentioning commercial PostgreSQL products in
our documentation.

---------------------------------------------------------------------------

Hannu Krosing wrote:
> On Tue, 2006-10-24 at 00:20, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > 	ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> This is how data partitioning is currently described there
> 
> > Data Partitioning
> > -----------------
> > 
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris. 
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access both tables
> > using a single table name.
> 
> Maybe another use of partitioning should also be mentioned: using
> partitioning to overcome the limitations of a single server
> (especially I/O and memory, but also CPU), with only a subset of the
> data stored and processed on each server.
> 
> As an example of this type of partitioning you could mention Bizgres MPP
> (a PG-based commercial product, http://www.greenplum.com ), which
> partitions data to use I/O and CPU of several DB servers for processing
> complex OLAP queries, and Pl_Proxy
> ( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP
> loads.
> 
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.

* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 02:56  Bruce Momjian <[email protected]>
  parent: Markus Schiltknecht <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 02:56 UTC (permalink / raw)
  To: Markus Schiltknecht <[email protected]>; +Cc: [email protected]; [email protected]; pgsql-docs

Markus Schiltknecht wrote:
> Looking at that, I'm a) missing PgCluster and b) arguing that we have to
> admit that we simply cannot 'list .. replication solutions ... and how
> to get them' because all of the solutions mentioned require considerable
> knowledge and a more or less complex installation and configuration.

Where is pgcluster in terms of usability?  Should I mention it?


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 02:57  Bruce Momjian <[email protected]>
  parent: Luke Lonergan <[email protected]>
  0 siblings, 2 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 02:57 UTC (permalink / raw)
  To: Luke Lonergan <[email protected]>; +Cc: Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>


I don't think the PostgreSQL documentation should be mentioning
commercial solutions.

---------------------------------------------------------------------------

Luke Lonergan wrote:
> Bruce, 
> 
> > -----Original Message-----
> > From: [email protected] 
> > [mailto:[email protected]] On Behalf Of Bruce Momjian
> > Sent: Tuesday, October 24, 2006 5:16 PM
> > To: Hannu Krosing
> > Cc: PostgreSQL-documentation; PostgreSQL-development
> > Subject: Re: [HACKERS] Replication documentation addition
> > 
> > 
> > OK, I have updated the URL.  Please let me know how you like it.
> 
> There's a typo on line 8, first paragraph:
> 
> "perhaps with only one server allowing write rwork together at the same
> time."
> 
> Also, consider this wording of the last description:
> 
> "Single-Query Clustering..."
> 
> Replaced by:
> 
> "Shared Nothing Clustering
> -------------------------
>
> This allows multiple servers with separate disks to work together on
> each query.  In shared nothing clusters, the work of answering each
> query is distributed among the servers to increase performance through
> parallelism.  These systems typically provide high availability by
> using other forms of replication internally.
>
> While there are no open source options for this type of clustering,
> there are several commercial products available that implement this
> approach, enabling PostgreSQL to achieve very high performance for
> multi-terabyte business intelligence databases."
> 
> - Luke


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 03:02  Bruce Momjian <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 03:02 UTC (permalink / raw)
  To: Joshua D. Drake <[email protected]>; +Cc: Markus Schiltknecht <[email protected]>; [email protected]; [email protected]; pgsql-docs

Joshua D. Drake wrote:
> 
> > Looking at that, I'm a) missing PgCluster and b) arguing that we have to
> > admit that we simply cannot 'list .. replication solutions ... and how
> > to get them' because all of the solutions mentioned require considerable
> > knowledge and a more or less complex installation and
> > configuration.
> 
> There is also the question if we should have a sub section:
> 
> Closed Source replication solutions:
> 
> Mammoth Replicator
> Continuent P/Cluster
> ExtenDB
> Greenplum MPP (although this is kind of horizontal partitioning)

I vote no.


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 03:05  Josh Berkus <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 1 reply; 117+ messages in thread

From: Josh Berkus @ 2006-10-25 03:05 UTC (permalink / raw)
  To: [email protected]; +Cc: Bruce Momjian <[email protected]>; Hannu Krosing <[email protected]>; pgsql-docs

Bruce,

> I have updated the text.  Please let me know what else I should change.
> I am unsure if I should be mentioning commercial PostgreSQL products in
> our documentation.

I think you should mention the PostgreSQL-only ones, but just briefly with a
link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 03:08  Joshua D. Drake <[email protected]>
  parent: Josh Berkus <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-25 03:08 UTC (permalink / raw)
  To: Josh Berkus <[email protected]>; +Cc: [email protected]; Bruce Momjian <[email protected]>; Hannu Krosing <[email protected]>; pgsql-docs

Josh Berkus wrote:
> Bruce,
> 
>> I have updated the text.  Please let me know what else I should change.
>> I am unsure if I should be mentioning commercial PostgreSQL products in
>> our documentation.
> 
> I think you should mention the PostgreSQL-only ones, but just briefly with a
> link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

And to further this, I would expect that it would be a subsection, e.g.,
a <sect2> or <sect3>. I think the open source solutions should absolutely
get top billing, though.

Sincerely,

Joshua D. Drake


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 03:48  Bruce Momjian <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  0 siblings, 2 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 03:48 UTC (permalink / raw)
  To: Joshua D. Drake <[email protected]>; +Cc: Josh Berkus <[email protected]>; [email protected]; Hannu Krosing <[email protected]>; pgsql-docs

Joshua D. Drake wrote:
> Josh Berkus wrote:
> > Bruce,
> > 
> >> I have updated the text.  Please let me know what else I should change.
> >> I am unsure if I should be mentioning commercial PostgreSQL products in
> >> our documentation.
> > 
> > I think you should mention the PostgreSQL-only ones, but just briefly with a
> > link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.
> 
> And to further this, I would expect that it would be a subsection, e.g.,
> a <sect2> or <sect3>. I think the open source solutions should absolutely
> get top billing, though.

I am not inclined to add commercial offerings.  If people want
commercial database offerings, they can get them from companies that
advertise.  People are coming to PostgreSQL for open source solutions,
and I think mentioning commercial ones doesn't make sense.

If we are to add them, I need to hear that from people who haven't
worked in PostgreSQL commercial replication companies.


* Re: [DOCS] Replication documentation addition
@ 2006-10-25 04:10  Steve Atkins <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Steve Atkins @ 2006-10-25 04:10 UTC (permalink / raw)
  To: [email protected]; pgsql-docs

On Oct 24, 2006, at 8:48 PM, Bruce Momjian wrote:

> Joshua D. Drake wrote:
>> Josh Berkus wrote:
>>> Bruce,
>>>
>>>> I have updated the text.  Please let me know what else I should change.
>>>> I am unsure if I should be mentioning commercial PostgreSQL products in
>>>> our documentation.
>>>
>>> I think you should mention the PostgreSQL-only ones, but just briefly with a
>>> link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.
>>
>> And to further this, I would expect that it would be a subsection, e.g.,
>> a <sect2> or <sect3>. I think the open source solutions should absolutely
>> get top billing, though.
>
> I am not inclined to add commercial offerings.  If people want
> commercial database offerings, they can get them from companies that
> advertise.  People are coming to PostgreSQL for open source solutions,
> and I think mentioning commercial ones doesn't make sense.
>
> If we are to add them, I need to hear that from people who haven't
> worked in PostgreSQL commercial replication companies.

I'm not coming to PostgreSQL for open source solutions. I'm coming
to PostgreSQL for _good_ solutions.

I want to see what solutions might be available for a problem I have.
I certainly want to know whether they're freely available, commercial
or some flavour of open source, but I'd like to know about all of them.

A big part of the value of PostgreSQL is the applications and extensions
that support it. Hiding the existence of some subset of those just
because of the way they're licensed is both underselling PostgreSQL
and doing something of a disservice to the user of the document.

Cheers,
   Steve


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 04:20  Bruce Momjian <[email protected]>
  parent: Steve Atkins <[email protected]>
  0 siblings, 3 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 04:20 UTC (permalink / raw)
  To: Steve Atkins <[email protected]>; +Cc: [email protected]; pgsql-docs

Steve Atkins wrote:
> > If we are to add them, I need to hear that from people who haven't
> > worked in PostgreSQL commercial replication companies.
> 
> I'm not coming to PostgreSQL for open source solutions. I'm coming
> to PostgreSQL for _good_ solutions.
> 
> I want to see what solutions might be available for a problem I have.
> I certainly want to know whether they're freely available, commercial
> or some flavour of open source, but I'd like to know about all of them.
> 
> A big part of the value of PostgreSQL is the applications and extensions
> that support it. Hiding the existence of some subset of those just
> because of the way they're licensed is both underselling PostgreSQL
> and doing something of a disservice to the user of the document.

OK, does that mean we mention EnterpriseDB in the section about Oracle
functions?  Why not mention MS SQL if they have a better solution?  I
just don't see where that line can clearly be drawn on what to include.
Do we mention Netezza, which is loosely based on PostgreSQL?   It just
seems very arbitrary to include commercial software.  If someone wants
to put it on a wiki, I think that would be fine because that doesn't
seem as official.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



* Re: [DOCS] Replication documentation addition
@ 2006-10-25 04:27  Steve Atkins <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 1 reply; 117+ messages in thread

From: Steve Atkins @ 2006-10-25 04:27 UTC
  To: pgsql-docs; [email protected]


On Oct 24, 2006, at 9:20 PM, Bruce Momjian wrote:

> Steve Atkins wrote:
>>> If we are to add them, I need to hear that from people who haven't
>>> worked in PostgreSQL commercial replication companies.
>>
>> I'm not coming to PostgreSQL for open source solutions. I'm coming
>> to PostgreSQL for _good_ solutions.
>>
>> I want to see what solutions might be available for a problem I have.
>> I certainly want to know whether they're freely available, commercial
>> or some flavour of open source, but I'd like to know about all of them.
>>
>> A big part of the value of Postgresql is the applications and extensions
>> that support it. Hiding the existence of some subset of those just
>> because of the way they're licensed is both underselling postgresql
>> and doing something of a disservice to the user of the document.
>
> OK, does that mean we mention EnterpriseDB in the section about Oracle
> functions?  Why not mention MS SQL if they have a better solution?  I
> just don't see where that line can clearly be drawn on what to include.
> Do we mention Netezza, which is loosely based on PostgreSQL?   It just
> seems very arbitrary to include commercial software.  If someone wants
> to put it on a wiki, I think that would be fine because that doesn't
> seem as official.

Good question. The line needs to be drawn somewhere. It's basically
your judgement, tempered by other people's feedback, though. If it
were me, I'd ask myself "Would I mention this product if it were open
source? Would mentioning it help people using the document?".

Cheers,
   Steve




* Re: [DOCS] Replication documentation addition
@ 2006-10-25 06:51  Cesar Suga <[email protected]>
  parent: Steve Atkins <[email protected]>
  0 siblings, 2 replies; 117+ messages in thread

From: Cesar Suga @ 2006-10-25 06:51 UTC
  To: Steve Atkins <[email protected]>; +Cc: pgsql-docs; [email protected]

Hi,

I also wrote Bruce about that.

It happens that, if you 'freely advertise' commercial solutions (rather
than their doing so through other channels), you will always have to act
as an 'updater' to the docs whenever they change their product lines,
change their business model, and so on.

If you cite a commercial solution, as a matter of fair play you should
cite *all* of them. If one enterprise has the right to be listed in the
documentation, all of them do, so that you are never favouring one of
them.

That's the main motivation to write this. Moreover, if there are also
commercial solutions for high-end installs and they are cited as
providers of those solutions, it (to a point) discourages people from
getting together and writing open source extensions to PostgreSQL.

As Bruce asked, should the documentation then cover EnterpriseDB's
Oracle functions? Should PostgreSQL also ship with them? Wouldn't it be
painful to write, say, another description for an alternative product
other than EnterpriseDB if one arises?

If people (who read the documentation) professionally work with 
PostgreSQL, they may already have been briefed on those commercial
offerings in some way.

I think only the source and its tightly coupled (read: can compile along 
with, free as PostgreSQL) components should be packaged into the tarball.

However, I find Bruce's unofficial wiki idea a good one for comparisons.

Regards,
Cesar


* Re: Replication documentation addition
@ 2006-10-25 07:37  Hannu Krosing <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Hannu Krosing @ 2006-10-25 07:37 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Luke Lonergan <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

On a fine day, Tue, 2006-10-24 at 22:57, Bruce Momjian wrote:
> I don't think the PostgreSQL documentation should be mentioning
> commercial solutions.

IMNSHO, having commercial solutions based on postgresql which extend
postgres in directions not (yet?) done by core postgres is nothing to be
ashamed of.

And we should at least mention the OSS version of Bizgres as a place
where quite a lot of initial development is done on performance
improvements considered too risky for mainline postgresql.

And if you need a more technical reason, you can use free libpq and psql
to connect to even Bizgres MPP ;)


> ---------------------------------------------------------------------------
> 
> Luke Lonergan wrote:
> > Bruce, 
> > 
> > > -----Original Message-----
> > > From: [email protected] 
> > > [mailto:[email protected]] On Behalf Of Bruce Momjian
> > > Sent: Tuesday, October 24, 2006 5:16 PM
> > > To: Hannu Krosing
> > > Cc: PostgreSQL-documentation; PostgreSQL-development
> > > Subject: Re: [HACKERS] Replication documentation addition
> > > 
> > > 
> > > OK, I have updated the URL.  Please let me know how you like it.
> > 
> > There's a typo on line 8, first paragraph:
> > 
> > "perhaps with only one server allowing write rwork together at the same
> > time."
> > 
> > Also, consider this wording of the last description:
> > 
> > "Single-Query Clustering..."
> > 
> > Replaced by:
> > 
> > "Shared Nothing Clustering
> > -----------------------
> > 
> > This allows multiple servers with separate disks to work together on
> > each query.  In shared nothing clusters, the work of answering each
> > query is distributed among the servers to increase the performance
> > through parallelism.  These systems will typically feature high
> > availability by using other forms of replication internally.
> > 
> > While there are no open source options for this type of clustering,
> > there are several commercial products available that implement this
> > approach, making PostgreSQL achieve very high performance for
> > multi-Terabyte business intelligence databases."
> > 
> > - Luke
> 
-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 09:38  Markus Schiltknecht <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 3 replies; 117+ messages in thread

From: Markus Schiltknecht @ 2006-10-25 09:38 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

Hi,

Bruce Momjian wrote:
> I have updated the text.  Please let me know what else I should change. 
> I am unsure if I should be mentioning commercial PostgreSQL products in
> our documentation.

I support your POV and vote for not including any pointers to commercial 
extensions in the official documentation. If at all, they should go to 
'external-projects.sgml', where PostGIS, PgAdmin and other projects are 
mentioned.

I can't really get excited about excluding the term 'replication',
because it's what most people are looking for. It's a well-known term.
Sorry if it sounded otherwise, but I didn't mean to suggest avoiding
that term.

The newly created terms 'Query Broadcast Load Balancing' or even worse 
'Multi-Master Load Balancing' are more confusing than helpful, because 
these terms do not exist. (See the googlefight in [1])

Can we name the chapter "Fail-over, Load-Balancing and Replication 
Options"? That would fit everything and contain the necessary buzz words.

Also, I'm still missing Multi- vs Single-Master, which are also commonly 
used terms.

IMHO, it does not make sense to speak of a synchronous replication for a 
'Shared Disk Fail Over'. It's not replication, because there's no replica.

The Data Partitioning paragraph should probably mention its close
relation to data partitioning across tablespaces (and make the
differences clear).

What you call 'Query Broadcast Load Balancing' is also multi-master
replication, so naming only the latter 'Multi-Master Load Balancing' is
misleading.

I'd propose adding a subsection 'Synchronous, Multi-Master Replication'
and explaining the different ways to do that:

* Query-Based
* with 2PC
* Distributed SHMEM
* (perhaps mention the optimized Postgres-R algorithm ;-)
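As a hypothetical illustration of the "with 2PC" option above (not taken from this thread or from any existing replication product), the following Python sketch simulates a two-phase commit coordinator over in-memory stand-ins for master servers. The names `Participant` and `two_phase_commit` are invented for the example. A real implementation would presumably drive PostgreSQL's PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK PREPARED commands on each server and would have to survive crashes between the two phases, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical in-memory stand-in for one master server."""
    name: str
    will_vote_yes: bool = True
    prepared: dict = field(default_factory=dict)   # txid -> pending write
    committed: list = field(default_factory=list)  # writes made visible

    def prepare(self, txid, write):
        # Phase 1: hold the write as "prepared" and vote yes, or vote no.
        if not self.will_vote_yes:
            return False
        self.prepared[txid] = write
        return True

    def commit(self, txid):
        # Phase 2 (success): make the prepared write visible.
        self.committed.append(self.prepared.pop(txid))

    def rollback(self, txid):
        # Phase 2 (failure): discard the prepared write, if any.
        self.prepared.pop(txid, None)


def two_phase_commit(participants, txid, write):
    """Apply `write` on every participant or on none of them."""
    # Phase 1: ask every server to prepare and collect all votes.
    votes = [p.prepare(txid, write) for p in participants]
    if all(votes):
        # Phase 2: unanimous yes, so every server commits.
        for p in participants:
            p.commit(txid)
        return True
    # Phase 2: at least one no, so every server rolls back.
    for p in participants:
        p.rollback(txid)
    return False


if __name__ == "__main__":
    servers = [Participant("a"), Participant("b")]
    print(two_phase_commit(servers, "tx1", "INSERT ..."))  # True
```

The coordinator waits for every vote before deciding, which is what makes a write appear on all masters or on none; it is also why a single slow server delays every commit in this scheme.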

What you called 'Single-Query Clustering' is probably better known as
'Parallel Query Execution'. It can be combined with all types of
replication (every combination of async / sync and Single- /
Multi-Master). It may be a form of load balancing, but it depends on
some form of replication to distribute the data first.

I liked Chris Browne's documentation in [2], which was clearer regarding
replication (which can be used to do fail-over, load-balancing,
data-partitioning or parallel query execution). I'd like to keep all
those things a little more separate to keep them clear.

Regards

Markus

[1]: Googlefight: "Multi-Master Load Balancing" vs "Multi-Master 
Replication": http://tinyurl.com/y3k76r

[2]: Chris Browne's proposal for replication documentation:
http://archives.postgresql.org/pgsql-patches/2006-08/msg00026.php

* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 10:36  Magnus Hagander <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Magnus Hagander @ 2006-10-25 10:36 UTC
  To: Bruce Momjian <[email protected]>; Luke Lonergan <[email protected]>; +Cc: Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

> I don't think the PostgreSQL documentation should be 
> mentioning commercial solutions.

I think the PostgreSQL documentation should be careful about trying to
give a "complete list" of commercial *or* free solutions, and should
instead link to something on the main website or on techdocs that can
be updated more easily.

//Magnus



* Re: [DOCS] Replication documentation addition
@ 2006-10-25 10:52  Shane Ambler <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 1 reply; 117+ messages in thread

From: Shane Ambler @ 2006-10-25 10:52 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Steve Atkins <[email protected]>; [email protected]; pgsql-docs

Bruce Momjian wrote:

> OK, does that mean we mention EnterpriseDB in the section about Oracle
> functions?  Why not mention MS SQL if they have a better solution?  I
> just don't see where that line can clearly be drawn on what to include.
> Do we mention Netezza, which is loosely based on PostgreSQL?   It just
> seems very arbitrary to include commercial software.  If someone wants
> to put it on a wiki, I think that would be fine because that doesn't
> seem as official.

I agree that the commercial offerings shouldn't be named directly in the 
docs, but it should be mentioned that some commercial options are 
available and a starting point to find more information.

If potential new users look through the docs and see that no options are
available for what they want, or for what they expect to need in the
future, they will go elsewhere; if they know that some options are
available, they will look further if they want that feature.

something like
"There are currently no open source solutions available for this option 
but there are some commercial offerings. More details of some available 
solutions can be found at postgresql.org/support/...."

-- 

Shane Ambler
[email protected]

Get Sheeky @ http://Sheeky.Biz

* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 13:52  Jim C. Nasby <[email protected]>
  parent: Shane Ambler <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Jim C. Nasby @ 2006-10-25 13:52 UTC
  To: Shane Ambler <[email protected]>; +Cc: Bruce Momjian <[email protected]>; Steve Atkins <[email protected]>; [email protected]; pgsql-docs

On Wed, Oct 25, 2006 at 08:22:25PM +0930, Shane Ambler wrote:
> Bruce Momjian wrote:
> 
> >OK, does that mean we mention EnterpriseDB in the section about Oracle
> >functions?  Why not mention MS SQL if they have a better solution?  I
> >just don't see where that line can clearly be drawn on what to include.
> >Do we mention Netezza, which is loosely based on PostgreSQL?   It just
> >seems very arbitrary to include commercial software.  If someone wants
> >to put it on a wiki, I think that would be fine because that doesn't
> >seem as official.
> 
> I agree that the commercial offerings shouldn't be named directly in the 
> docs, but it should be mentioned that some commercial options are 
> available and a starting point to find more information.
> 
> If potential new users look through the docs and it says no options 
> available for what they want or consider they will need in the future 
> then they go elsewhere, if they know that some options are available 
> then they will look further if they want that feature.
> 
> something like
> "There are currently no open source solutions available for this option 
> but there are some commercial offerings. More details of some available 
> solutions can be found at postgresql.org/support/...."

I think this is probably the best compromise. Keep in mind that many
people who are looking at us will also be looking at MySQL, which is
itself a commercial offering. It's good to let folks know that with
PostgreSQL, they have more control over how much money they spend for
commercial add-ons and support.
-- 
Jim Nasby                                            [email protected]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 13:57  Jim C. Nasby <[email protected]>
  parent: Markus Schiltknecht <[email protected]>
  2 siblings, 2 replies; 117+ messages in thread

From: Jim C. Nasby @ 2006-10-25 13:57 UTC
  To: Markus Schiltknecht <[email protected]>; +Cc: Bruce Momjian <[email protected]>; Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:
> I can't really get excited about the exclusion of the term 
> 'replication', because it's what most people are looking for. It's a 
> well known term. Sorry if it sounded that way, but I've not meant to 
> avoid that term.
<snip> 
> IMHO, it does not make sense to speak of a synchronous replication for a 
> 'Shared Disk Fail Over'. It's not replication, because there's no replica.

Those two statements are at odds with each other, at least based on
everyone I've ever talked to in a commercial setting. People will use
terms like 'replication', 'HA' or 'clustering' fairly interchangeably.
Usually what these folks want is some kind of high-availability
solution. A few are more concerned with scalability. Sometimes it's a
combination of both. That's why I think it's good for the chapter to
deal with both aspects of this.



* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:08  Joshua D. Drake <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-25 14:08 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Josh Berkus <[email protected]>; [email protected]; Hannu Krosing <[email protected]>; pgsql-docs

> 
> I am not inclined to add commercial offerings.  If people wanted
> commercial database offerings, they can get them from companies that
> advertise.  People are coming to PostgreSQL for open source solutions,
> and I think mentioning commercial ones doesn't make sense.
> 
> If we are to add them, I need to hear that from people who haven't
> worked in PostgreSQL commercial replication companies.
> 

You did hear that: from Josh Berkus. Secondly, as many people have stated
in the past, no single replication solution suits everyone's needs, and
since PostgreSQL has many replication solutions, it only makes sense to
list the more prominent ones, commercial or not.

Joshua D. Drake

-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:13  Joshua D. Drake <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 1 reply; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-25 14:13 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Steve Atkins <[email protected]>; [email protected]; pgsql-docs

>> A big part of the value of Postgresql is the applications and extensions
>> that support it. Hiding the existence of some subset of those just
>> because of the way they're licensed is both underselling postgresql
>> and doing something of a disservice to the user of the document.
> 
> OK, does that mean we mention EnterpriseDB in the section about Oracle
> functions?

Way to compare apples to houses there, Bruce. We are talking about
*PostgreSQL* replication solutions, not *Oracle* compatibility
functions. However, *if* we had an Oracle compatibility section, I would
say, "Yes, it does make sense to list EnterpriseDB as a proprietary
commercial solution for migrating from Oracle."

>  Why not mention MS SQL if they have a better solution?

Because we aren't talking about MS SQL, we are talking about PostgreSQL.

>  I
> just don't see where that line can clearly be drawn on what to include.
> Do we mention Netezza, which is loosely based on PostgreSQL?   It just
> seems very arbitrary to include commercial software.

It is no more arbitrary than including *any* information on PostgreSQL
replication solutions, because PostgreSQL doesn't have any.

PostgreSQL doesn't do replication, except for PITR (and that is pushing
it as a replication solution).

Now.. there are *projects* that enable PostgreSQL to do replication.
Some of them are Open Source, some of them are commercial products.

Sincerely,

Joshua D. Drake


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:16  Markus Schaber <[email protected]>
  parent: Cesar Suga <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Markus Schaber @ 2006-10-25 14:16 UTC
  To: ; +Cc: pgsql-docs; [email protected]

Hi, Cesar,

Cesar Suga wrote:
> If people (who read the documentation) professionally work with
> PostgreSQL, they may already have been briefed by those commercial
> offerings in some way.
> 
> I think only the source and its tightly coupled (read: can compile along
> with, free as PostgreSQL) components should be packaged into the tarball.
> 
> However, I find Bruce's unofficial wiki idea a good one for comparisons.

My suggestion is that the docs should mention only the mere existence of
important third-party packages and projects, in those places where they
discuss the deficits that those packages supposedly fix.

E.g.: "There are some third-party packages and projects that aim to
provide multi-master replication; you can search for more information at
http://[unofficial wiki page url] or your favourite search engine."

This way, the docs stay neutral, but point the user to possible
solutions of his problem.

HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:20  Joshua D. Drake <[email protected]>
  parent: Cesar Suga <[email protected]>
  1 sibling, 2 replies; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-25 14:20 UTC
  To: Cesar Suga <[email protected]>; +Cc: Steve Atkins <[email protected]>; pgsql-docs; [email protected]

Cesar Suga wrote:
> Hi,
> 
> I also wrote Bruce about that.
> 
> It happens that, if you 'freely advertise' commercial solutions (rather
> than they doing so by other vehicles) you will always happen to be an
> 'updater' to the docs if they change their product lines, if they change
> their business model, if and if.

That is no different from the open source offerings. We have had several
open source offerings that have died over the years. Replicator, for
example, has always been Replicator and has been around longer than any
of the current replication solutions.

> 
> If you cite a commercial solution, as a fair game you should cite *all*
> of them.

No. That doesn't make any sense either. I assume we aren't going to list
all PostgreSQL OSS replication solutions (there are at least a dozen or
more).

You list the ones that are stable in their existence (commercial or not).

> If one enterprise has the right to be listed in the
> documentation, all of them might, as you will never be favouring one of
> them.

You are looking at this the wrong way. This isn't about *any*
enterprise. It is about a PostgreSQL solution. There happen to be two
or three known working open source solutions, and two or three known
working commercial solutions.

> 
> That's the main motivation to write this. Moreover, if there are also
> commercial solutions for high-end installs and they are cited as
> providers to those solutions, it (to a point) discourages those of
> gathering themselves and writing open source extensions to PostgreSQL.
>

No, it doesn't, because there is always the "It wants to be free!" crowd.

> If people (who read the documentation) professionally work with
> PostgreSQL, they may already have been briefed by those commercial
> offerings in some way.

Maybe, maybe not.

Sincerely,

Joshua D. Drake


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:21  Bruce Momjian <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 14:21 UTC
  To: Joshua D. Drake <[email protected]>; +Cc: Steve Atkins <[email protected]>; [email protected]; pgsql-docs


I would think that companies that sell closed-source solutions for
PostgreSQL would be modest enough not to push their own agenda for the
documentation.  I think they should just sit back and hope others
suggest it.

[ Josh Berkus recently left Greenplum for Sun. ]





* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:28  Bruce Momjian <[email protected]>
  parent: Hannu Krosing <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 14:28 UTC
  To: Hannu Krosing <[email protected]>; +Cc: Luke Lonergan <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>


I have added this text:

	Commercial Solutions
	--------------------
	
	Because PostgreSQL is open source and easily extended, a number of
	companies have taken PostgreSQL and created commercial closed-source
	solutions with unique failover, replication, and load balancing
	capabilities.


---------------------------------------------------------------------------

Hannu Krosing wrote:
> ?hel kenal p?eval, T, 2006-10-24 kell 22:57, kirjutas Bruce Momjian:
> > I don't think the PostgreSQL documentation should be mentioning
> > commercial solutions.
> 
> IMNSHO, having commercial solutions based on postgresql which extend
> postgres in directions not (yet?) done by core postgres is nothing to be
> ashamed of.
> 
> And we should at least mention the OSS version of Bizgres as a place
> where quite a lot of initial development is done on performance
> improvements considered too risky for mainline postgresql.
> 
> And if you need a more technical reason, you can use free libpq and psql
> to connect to even Bizgres MPP ;)
> 
> 
> > ---------------------------------------------------------------------------
> > 
> > Luke Lonergan wrote:
> > > Bruce, 
> > > 
> > > > -----Original Message-----
> > > > From: [email protected] 
> > > > [mailto:[email protected]] On Behalf Of Bruce Momjian
> > > > Sent: Tuesday, October 24, 2006 5:16 PM
> > > > To: Hannu Krosing
> > > > Cc: PostgreSQL-documentation; PostgreSQL-development
> > > > Subject: Re: [HACKERS] Replication documentation addition
> > > > 
> > > > 
> > > > OK, I have updated the URL.  Please let me know how you like it.
> > > 
> > > There's a typo on line 8, first paragraph:
> > > 
> > > "perhaps with only one server allowing write rwork together at the same
> > > time."
> > > 
> > > Also, consider this wording of the last description:
> > > 
> > > "Single-Query Clustering..."
> > > 
> > > Replaced by:
> > > 
> > > "Shared Nothing Clustering
> > > -------------------------
> > > 
> > > This allows multiple servers with separate disks to work together on
> > > each query.  In shared nothing clusters, the work of answering each
> > > query is distributed among the servers to increase the performance
> > > through parallelism.  These systems will typically feature high
> > > availability by using other forms of replication internally.
> > > 
> > > While there are no open source options for this type of clustering,
> > > there are several commercial products available that implement this
> > > approach, making PostgreSQL achieve very high performance for
> > > multi-Terabyte business intelligence databases."
> > > 
> > > - Luke
> > 
> -- 
> ----------------
> Hannu Krosing
> Database Architect
> Skype Technologies OÜ
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
> 
> Skype me:  callto:hkrosing
> Get Skype for free:  http://www.skype.com

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
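The "Shared Nothing Clustering" text Luke proposes above says the work of answering each query is distributed among servers with separate disks. A minimal Python sketch of that scatter/gather idea follows. It assumes each server holds a disjoint slice of one table (modeled here as in-memory lists), and threads stand in for the separate machines; `PARTITIONS`, `local_sum` and `parallel_sum` are hypothetical names, not part of any product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: each inner list is the slice of a (region, amount) sales
# table that lives on one server's own disk.
PARTITIONS = [
    [("north", 120), ("south", 80)],
    [("north", 40), ("east", 200)],
    [("west", 60)],
]


def local_sum(rows, region):
    """The work one server does on its own data: a partial aggregate."""
    return sum(amount for r, amount in rows if r == region)


def parallel_sum(partitions, region):
    """Coordinator: scatter the query to every server, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda rows: local_sum(rows, region), partitions)
        return sum(partials)


print(parallel_sum(PARTITIONS, "north"))  # 160
```

This works for SUM because partial sums can simply be added together; an aggregate such as AVG would need each server to return a (sum, count) pair instead.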




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:30  Joshua D. Drake <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-25 14:30 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Steve Atkins <[email protected]>; [email protected]; pgsql-docs

Bruce Momjian wrote:
> I would think that companies that sell closed-source solutions for
> PostgreSQL would be modest enough not to push their own agenda for the
> documentation.  I think they should just sit back and hope others
> suggest it.
> 
> [ Josh Berkus recently left Green Plum for Sun. ]

Bruce, you are making an idiot of yourself. With this statement you have
implied that Josh Berkus, a core member, somehow has his own agenda
that is not in the interests of the PostgreSQL community.

Further, you are suggesting that I, as a member of Command Prompt,
have an agenda that is not in the interests of the PostgreSQL community.

It was rude, uncalled for, inaccurate, and frankly disgusting.

Joshua D. Drake

-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:31  Magnus Hagander <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  1 sibling, 2 replies; 117+ messages in thread

From: Magnus Hagander @ 2006-10-25 14:31 UTC (permalink / raw)
  To: Joshua D. Drake <[email protected]>; Cesar Suga <[email protected]>; +Cc: Steve Atkins <[email protected]>; pgsql-docs; [email protected]

> > I also wrote Bruce about that.
> > 
> > It happens that, if you 'freely advertise' commercial solutions
> > (rather than they doing so by other vehicles) you will always happen
> > to be an 'updater' to the docs if they change their product lines, if
> > they change their business model, if and if.
> 
> That is no different than the open source offerings. We have 
> had several open source offerings that have died over the 
> years. Replicator, for example has always been Replicator and 
> has been around longer than any of the current replication solutions.

I think this is a good reason not to list *any* of the products by name
in the documentation, but instead refer to a page on say techdocs that
can be more easily updated. And that can contain both free and non-free
projects, under clear headlines showing the difference.

The documentation is about PostgreSQL, not about third-party products,
be they free or commercial. Our *website*, however, should give guidance
on which specific products we (as a community) know are stable and
usable along with PostgreSQL (as we do today under downloads, but could
very well do based on specific uses like replication as well)

//Magnus


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:35  Joshua D. Drake <[email protected]>
  parent: Magnus Hagander <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-25 14:35 UTC (permalink / raw)
  To: Magnus Hagander <[email protected]>; +Cc: Cesar Suga <[email protected]>; Steve Atkins <[email protected]>; pgsql-docs; [email protected]


>>> they change their business model, if and if.
>> That is no different than the open source offerings. We have 
>> had several open source offerings that have died over the 
>> years. Replicator, for example has always been Replicator and 
>> has been around longer than any of the current replication solutions.
> 
> I think this is a good reason not to list *any* of the products by name
> in the documentation, but instead refer to a page on say techdocs that
> can be more easily updated. And that can contain both free and non-free
> projects, under clear headlines showing the difference.
> 
> The documentation is about PostgreSQL, not about third-party products,
> be they free or commercial. Our *website*, however, should give guidance
> on which specific products we (as a community) know are stable and
> usable along with PostgreSQL (as we do today under downloads, but could
> very well do based on specific uses like replication as well)
> 

I can agree with this :)

Sincerely,

Joshua D. Drake



-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate





* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 14:58  Tom Lane <[email protected]>
  parent: Magnus Hagander <[email protected]>
  1 sibling, 2 replies; 117+ messages in thread

From: Tom Lane @ 2006-10-25 14:58 UTC (permalink / raw)
  To: Magnus Hagander <[email protected]>; +Cc: Joshua D. Drake <[email protected]>; pgsql-docs; [email protected]

"Magnus Hagander" <[email protected]> writes:
> I think this is a good reason not to list *any* of the products by name
> in the documentation, but instead refer to a page on say techdocs that
> can be more easily updated.

I agree with that.  If we have statements about other projects in our
docs, we will have a problem with not being able to update those
statements in a timely fashion when the other projects change.

			regards, tom lane


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 15:35  Joshua D. Drake <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-25 15:35 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Magnus Hagander <[email protected]>; pgsql-docs; [email protected]

Tom Lane wrote:
> "Magnus Hagander" <[email protected]> writes:
>> I think this is a good reason not to list *any* of the products by name
>> in the documentation, but instead refer to a page on say techdocs that
>> can be more easily updated.
> 
> I agree with that.  If we have statements about other projects in our
> docs, we will have a problem with not being able to update those
> statements in a timely fashion when the other projects change.

This being said, I would say that the replication documentation needs to
be on Techdocs or some place similar and that we should have a link in
the PostgreSQL docs that points to the techdocs article and possibly:
http://www.postgresql.org/download/ .

Sincerely,

Joshua D. Drake

> 
> 			regards, tom lane
> 
> 

-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 15:40  Bruce Momjian <[email protected]>
  parent: Markus Schiltknecht <[email protected]>
  2 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 15:40 UTC (permalink / raw)
  To: Markus Schiltknecht <[email protected]>; +Cc: Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

Markus Schiltknecht wrote:
> Hi,
> 
> Bruce Momjian wrote:
> > I have updated the text.  Please let me know what else I should change. 
> > I am unsure if I should be mentioning commercial PostgreSQL products in
> > our documentation.
> 
> I support your POV and vote for not including any pointers to commercial 
> extensions in the official documentation. If at all, they should go to 
> 'external-projects.sgml', where PostGIS, PgAdmin and other projects are 
> mentioned.
> 
> I can't really get excited about the exclusion of the term 
> 'replication', because it's what most people are looking for. It's a 
> well known term. Sorry if it sounded that way, but I've not meant to 
> avoid that term.

OK, I have re-added the term "replication" as appropriate.

> The newly created terms 'Query Broadcast Load Balancing' or even worse 
> 'Multi-Master Load Balancing' are more confusing than helpful, because 
> these terms do not exist. (See the googlefight in [1])

OK, renamed.

> Can we name the chapter "Fail-over, Load-Balancing and Replication 
> Options"? That would fit everything and contain the necessary buzz words.

Yes. Done, "cluster" added too.

> Also, I'm still missing Multi- vs Single-Master, which are also commonly 
> used terms.

Yeah, not sure how to get those in because it somewhat confuses the
"purpose" of the solution.

> IMHO, it does not make sense to speak of a synchronous replication for a 
> 'Shared Disk Fail Over'. It's not replication, because there's no replica.

Agreed.  Modified.

> The Data Partitioning paragraph should probably mention its close
> relation with data partitioning across table spaces (and make the
> differences clear).

Uh, so you spread I/O load across table spaces?  That seems too far a
reach to mention here.

> What you call 'Query Broadcast Load Balancing' is also a multi-master
> replication, thus naming only the latter 'Multi-Master Load Balancing'
> is misleading.

Renamed.

> I'd propose to add a subsection 'Synchronous, Multi-Master Replication' 
> and explain the different possibilities on how to do that:
> 
> * Query-Based
> * with 2PC
> * Distributed SHMEM
> * (perhaps mention the optimized Postgres-R algorithm ;-)
> 
> What you called 'Single-Query Clustering' is probably better known as 
> 'Parallel Query Execution'. It can be combined with all types of 
> replication (every combination of async / sync and Single- / 
> Multi-Master). It's maybe load balancing, but it depends on some form of 
> replication to distribute the data first.

Good term.  Added.

> I liked Chris Browne's documentation in [2] which was clearer regarding
> replication (which can be used to do fail-over, load-balancing,
> data-partitioning or parallel query execution). I'd like to keep all
> those things a little more separate to get them clear.

Please let me know how you like the new version at the ftp URL.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 15:41  Bruce Momjian <[email protected]>
  parent: Jim C. Nasby <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 15:41 UTC (permalink / raw)
  To: Jim C. Nasby <[email protected]>; +Cc: Markus Schiltknecht <[email protected]>; Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

Jim C. Nasby wrote:
> On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:
> > I can't really get excited about the exclusion of the term 
> > 'replication', because it's what most people are looking for. It's a 
> > well known term. Sorry if it sounded that way, but I've not meant to 
> > avoid that term.
> <snip> 
> > IMHO, it does not make sense to speak of a synchronous replication for a 
> > 'Shared Disk Fail Over'. It's not replication, because there's no replica.
> 
> Those two statements are at odds with each other, at least based on
> everyone I've ever talked to in a commercial setting. People will use
> terms like 'replication', 'HA' or 'clustering' fairly interchangeably.
> Usually what these folks want is some kind of high-availability
> solution. A few are more concerned with scalability. Sometimes it's a
> combination of both. That's why I think it's good for the chapter to
> deal with both aspects of this.

OK, I did break it out somewhat for clarity.  Let me know how it looks
now.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 15:43  Markus Schiltknecht <[email protected]>
  parent: Jim C. Nasby <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Markus Schiltknecht @ 2006-10-25 15:43 UTC (permalink / raw)
  To: Jim C. Nasby <[email protected]>; +Cc: Bruce Momjian <[email protected]>; Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

Hi,

Jim C. Nasby wrote:
> Those two statements are at odds with each other, at least based on
> everyone I've ever talked to in a commercial setting. People will use
> terms like 'replication', 'HA' or 'clustering' fairly interchangeably.
> Usually what these folks want is some kind of high-availability
> solution. A few are more concerned with scalability. Sometimes it's a
> combination of both. That's why I think it's good for the chapter to
> deal with both aspects of this.

Yabut... at least the PostgreSQL manual should use the terms correctly.

And while I do perfectly agree that it's a fail-over solution and it 
should be mentioned in that section, I'm arguing that it's not replication.

Regards

Markus


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 15:44  Bruce Momjian <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 15:44 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Magnus Hagander <[email protected]>; Joshua D. Drake <[email protected]>; pgsql-docs; [email protected]

Tom Lane wrote:
> "Magnus Hagander" <[email protected]> writes:
> > I think this is a good reason not to list *any* of the products by name
> > in the documentation, but instead refer to a page on say techdocs that
> > can be more easily updated.
> 
> I agree with that.  If we have statements about other projects in our
> docs, we will have a problem with not being able to update those
> statements in a timely fashion when the other projects change.

I mention only Slony and pgpool as examples of replication types.  They
seem to have risen to high enough visibility to do that. I have not
mentioned any other solutions.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 16:00  Joshua D. Drake <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-25 16:00 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Tom Lane <[email protected]>; Magnus Hagander <[email protected]>; pgsql-docs; [email protected]

Bruce Momjian wrote:
> Tom Lane wrote:
>> "Magnus Hagander" <[email protected]> writes:
>>> I think this is a good reason not to list *any* of the products by name
>>> in the documentation, but instead refer to a page on say techdocs that
>>> can be more easily updated.
>> I agree with that.  If we have statements about other projects in our
>> docs, we will have a problem with not being able to update those
>> statements in a timely fashion when the other projects change.
> 
> I mention only Slony and pgpool as examples of replication types.  They
> seem to have risen to high enough visibility to do that. I have not
> mentioned any other solutions.

What about Slony-II or pgpool2? Which are fundamentally different from
their v1 counterparts (o.k. slony-ii isn't out yet but still).

I +1 that we move to have all of the replication documentation pushed to
techdocs or other facility and just have a link from the docs.

Joshua D. Drake

-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 16:02  Bruce Momjian <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 16:02 UTC (permalink / raw)
  To: Joshua D. Drake <[email protected]>; +Cc: Tom Lane <[email protected]>; Magnus Hagander <[email protected]>; pgsql-docs; [email protected]

Joshua D. Drake wrote:
> Bruce Momjian wrote:
> > Tom Lane wrote:
> >> "Magnus Hagander" <[email protected]> writes:
> >>> I think this is a good reason not to list *any* of the products by name
> >>> in the documentation, but instead refer to a page on say techdocs that
> >>> can be more easily updated.
> >> I agree with that.  If we have statements about other projects in our
> >> docs, we will have a problem with not being able to update those
> >> statements in a timely fashion when the other projects change.
> > 
> > I mention only Slony and pgpool as examples of replication types.  They
> > seem to have risen to high enough visibility to do that. I have not
> > mentioned any other solutions.
> 
> What about Slony-II or pgpool2? Which are fundamentally different from
> their v1 counterparts (o.k. slony-ii isn't out yet but still).
> 
> I +1 that we move to have all of the replication documentation pushed to
> techdocs or other facility and just have a link from the docs.

What I did was to mention Slony and pgpool as "examples", so people
realize there are many other solutions.  It would be good to have a
companion web site that could list them all, both open source and
commercial.  That is going to take a lot more work, but I think would
have great value, especially since our documentation will clearly
outline the terms.  What you don't want to do is to throw up a list and
have people try to figure out what solutions they cover.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 16:20  David Fetter <[email protected]>
  parent: Markus Schiltknecht <[email protected]>
  2 siblings, 1 reply; 117+ messages in thread

From: David Fetter @ 2006-10-25 16:20 UTC (permalink / raw)
  To: Markus Schiltknecht <[email protected]>; +Cc: Bruce Momjian <[email protected]>; Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:

> Can we name the chapter "Fail-over, Load-Balancing and Replication 
> Options"? That would fit everything and contain the necessary buzz words.
...

> IMHO, it does not make sense to speak of a synchronous replication for a 
> 'Shared Disk Fail Over'. It's not replication, because there's no replica.

As you point out, there is no replica of the data, but there is some
protection against machine failure, which puts it firmly in the
"Fail-over" part above.

Cheers,
D
-- 
David Fetter <[email protected]> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!




* Re: Replication documentation addition
@ 2006-10-25 16:24  Richard Troy <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 1 reply; 117+ messages in thread

From: Richard Troy @ 2006-10-25 16:24 UTC (permalink / raw)
  To: Hannu Krosing <[email protected]>; Bruce Momjian <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

Hi Hannu, everyone,

I apologize for not having read the document in question - will do
shortly. My comments are brought about by the dialogue I read on list this
morning...

> > Here is a new replication documentation section I want to add for 8.2:
> >
> >     ftp://momjian.us/pub/postgresql/mypatches/replication
>

> > Data Partitioning
> > -----------------
> >
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris.
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access both tables
> > using a single table name.
>
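The partitioning rule quoted above (every office server has all records, but only the owning office may modify its own) might be modeled as below. This is an illustrative sketch with hypothetical names (`OfficeServer`, `write`, `apply_remote`); per the quoted text, in practice the rule lives in application code, possibly helped by rules and triggers, with a tool such as Slony keeping the read-only copies current.

```python
class OfficeServer:
    """One office's server: holds a copy of every record but owns only its own."""

    def __init__(self, office):
        self.office = office
        self.records = {}  # record id -> (owning office, value)

    def write(self, rec_id, owner, value):
        # Enforce the partitioning rule: only the owning office may modify a record.
        if owner != self.office:
            raise PermissionError(f"{self.office} cannot modify {owner} records")
        self.records[rec_id] = (owner, value)
        return (rec_id, owner, value)  # the change to ship to the other offices

    def apply_remote(self, change):
        # Keep the read-only copy of another office's data current.
        rec_id, owner, value = change
        self.records[rec_id] = (owner, value)


london, paris = OfficeServer("London"), OfficeServer("Paris")
paris.apply_remote(london.write(1, "London", "invoice A"))
try:
    paris.write(1, "London", "edited in Paris")
except PermissionError as e:
    print(e)  # Paris cannot modify London records
```

Because each record has exactly one writer, the offices never have to resolve conflicting updates; they only have to propagate changes one way.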
> Maybe another use of partitioning should also be mentioned. That is,
> when partitioning is used to overcome limitations of single servers
> (especially IO and memory, but also CPU), and only a subset of data is
> stored and processed on each server.

> > I think the "official" term for this kind of "replication" is
> > Shared-Nothing Clustering.
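Hannu's variant, where each server stores and processes only a subset of the data, needs some way to decide which server owns a given row. One common choice, used here purely as an assumption, is to hash a partitioning key. The server names and helper functions below are hypothetical.

```python
import hashlib

SERVERS = ["db0", "db1", "db2"]  # hypothetical server names


def server_for(key: str, servers=SERVERS) -> str:
    """Pick the single server that stores (and processes) rows with this key."""
    # Use a stable hash: Python's built-in hash() of str is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def partition(rows, key_of):
    """Split rows so that each server holds only its own subset."""
    buckets = {s: [] for s in SERVERS}
    for row in rows:
        buckets[server_for(key_of(row))].append(row)
    return buckets


customers = [("alice", 1), ("bob", 2), ("carol", 3), ("dave", 4)]
buckets = partition(customers, key_of=lambda row: row[0])
for server, rows in buckets.items():
    print(server, rows)
```

Plain modulo hashing moves most rows when a server is added; schemes such as consistent hashing or range partitioning reduce that churn at the cost of more bookkeeping.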

"Data partitioning" has two fundamental flavors, "horizontal" and
"vertical", quite a handful of implementations, and even more motivations
behind why one uses either strategy and whatever implementation. The same
is true for "clustering" - a few fundamental strategies, with a larger
number of implementations and yet more motivations. Replication,
meanwhile, is yet another beast altogether, sharing the same fundamentals
of multiple flavors, implementations and motivations. … I strongly urge
keeping any documentation on these (and related) topics strictly distinct
and separate.

In my view, one should define the terms first, separately, distinctly, and
as succinctly as possible, and, following this, a dialogue on how these
may be combined can be entertained. The definitions of each should be both
complete and academic in flavor and may include implementation and
motivational  information, but never "muddy the water" by mixing with
other concepts - not yet, not until after all the fundamentals have been
introduced.

I don't know much about what PostgreSql has been doing in these areas of
late - nothing, I gather from someone's post this morning - but I'll try
to help out as I can with a paragraph or two - whatever you want,
whatever's welcome - as "I was there" when Randy Eash created the first
commercial RDBMS replicator - for Ingres - and since I created the first
commercial RDBMS front-end failover technology, also for Ingres, so I have
a pretty good handle on all the issues.

Also, I liked what Markus Schiltknecht wrote, but will have to read the
original before I can comment on his specific points.

>> I am not inclined to add commercial offerings.  If people wanted
>> commercial database offerings, they can get them from companies that
>> advertise.  People are coming to PostgreSQL for open source solutions,
>> and I think mentioning commercial ones doesn't make sense.
>>
>> If we are to add them, I need to hear that from people who haven't
>> worked in PostgreSQL commercial replication companies.
>
> I'm not coming to PostgreSQL for open source solutions. I'm coming
> to PostgreSQL for _good_ solutions.
>
> I want to see what solutions might be available for a problem I have.
> I certainly want to know whether they're freely available, commercial
> or some flavour of open source, but I'd like to know about all of them.
>
> A big part of the value of Postgresql is the applications and extensions
> that support it. Hiding the existence of some subset of those just
> because of the way they're licensed is both underselling postgresql
> and doing something of a disservice to the user of the document.

> If potential new users look through the docs and it says no options
> available for what they want or consider they will need in the future
> then they go elsewhere, if they know that some options are available
> then they will look further if they want that feature.

I agree that people look through the materials on the web site,
documentation especially, and make choices based upon what they see. Many
of us don't have time to spend a day searching the web for things we don't
even know exist. By including more information, more users will be
attracted to PostgreSql, whether it be in the documentation or web site. I
have been SURE that certain things must exist in the PG world, but haven't
known about them with certainty due to time constraints, but would gladly
point our customers at Postgres solutions if only I knew about them. Count
this paragraph as praise for doing _something_more_ to help get more
information to (prospective) users.

Consider someone like me; my company supports five RDBMSes, one of them
being Postgres. We are probably not unique in that we've written an SQL
dialect translator so we could write our own code in one code line to run
anywhere, against any RDBMS (it can learn new dialects) - or perhaps
others keep multiple code lines containing variant dialects. Either way,
we "don't care" whether our customer has Oracle, or PostgreSql, so long as
they buy our stuff. But when our customers - or prospects - come to us
with a given scenario, the more we know about Postgres - and its community
- the more likely we can steer them to a PG solution, which we would
prefer anyway, for lots of reasons, historical, personal, and technical -
not to mention cost. The trouble is, Oracle, for example, has already told
them (sold them?) on whatever, and we need a rebuttal ready at hand or
they'll go with Oracle. We just don't have the time to fight that battle,
nor do we wish to risk the sale when we can work with Oracle just fine.

In sum, I agree with Tom Lane and the others who chimed in with "keep the
docs clean, use the web site for mentioning other projects/products." And
again I applaud this new effort.

Regards,
Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[email protected], http://ScienceTools.com/


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 16:28  Bruce Momjian <[email protected]>
  parent: David Fetter <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 16:28 UTC (permalink / raw)
  To: David Fetter <[email protected]>; +Cc: Markus Schiltknecht <[email protected]>; Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

David Fetter wrote:
> On Wed, Oct 25, 2006 at 11:38:11AM +0200, Markus Schiltknecht wrote:
> 
> > Can we name the chapter "Fail-over, Load-Balancing and Replication 
> > Options"? That would fit everything and contain the necessary buzz words.
> ...
> 
> > IMHO, it does not make sense to speak of a synchronous replication for a 
> > 'Shared Disk Fail Over'. It's not replication, because there's no replica.
> 
> As you point out, there is no replica of the data, but there is some
> protection against machine failure, which puts it firmly in the
> "Fail-over" part above.

Right, but his point was not to call it synchronous.  I have fixed that
in the current version.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [DOCS] Replication documentation addition
@ 2006-10-25 16:49  Casey Duncan <[email protected]>
  parent: Magnus Hagander <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Casey Duncan @ 2006-10-25 16:49 UTC (permalink / raw)
  To: Magnus Hagander <[email protected]>; +Cc: Bruce Momjian <[email protected]>; Luke Lonergan <[email protected]>; Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

Totally agree. The docs will tend to outlive whatever projects or  
websites they mention. Best to not bake that into stone.

-Casey

On Oct 25, 2006, at 3:36 AM, Magnus Hagander wrote:

>> I don't think the PostgreSQL documentation should be
>> mentioning commercial solutions.
>
> I think maybe the PostgreSQL documentation should be careful about
> trying to list a "complete list" of commercial *or* free solutions.
> Instead linking to something on the main website or on techdocs that
> can more easily be updated.
>
> //Magnus
>






* Re: Replication documentation addition
@ 2006-10-25 18:33  Alexey Klyukin <[email protected]>
  parent: Bruce Momjian <[email protected]>
  3 siblings, 1 reply; 117+ messages in thread

From: Alexey Klyukin @ 2006-10-25 18:33 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

Hi,

A typo:
("a write to any server has to be _propogated_")
s/propogated/propagated

Bruce Momjian wrote:
> Here is a new replication documentation section I want to add for 8.2:
>
> 	ftp://momjian.us/pub/postgresql/mypatches/replication
>
> Comments welcomed.
>
>   
-- 
Regards,

Alexey Klyukin		alexk(at)vollmond.org.ua
Simferopol, Crimea, Ukraine.





* Re: Replication documentation addition
@ 2006-10-25 18:40  Richard Troy <[email protected]>
  parent: Richard Troy <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Richard Troy @ 2006-10-25 18:40 UTC (permalink / raw)
  To: Hannu Krosing <[email protected]>; Bruce Momjian <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

> Here is a new replication documentation section I want to add for 8.2:
>
>     ftp://momjian.us/pub/postgresql/mypatches/replication
>

...Read the document, as promised...

First paragraph, "(fail over)" is inconsistent with title, "failover", as
are other spots throughout the document. The whole document should be
consistent and I vote for "failover" and not "fail over."

Fourth paragraph, "This "sync problem" is the fundamental difficulty for
servers working together"; "Sync problem" hasn't been defined. Actually,
you're talking about the consistent attribute of the "acid" properties of
all competent databases: Atomic, Consistency, Isolation, and Durability.
At least define the term you are using - probably most easily done in the
preceding paragraph.

The fifth paragraph needs a lot more help, I think. How about this
alternative:

So-called "two-phase commit" was developed as a strategy in which two or
more databases are updated simultaneously and none of the data is
committed until all of them are ready to commit. This guarantees consistency
between the databases, with all propagation delay being absorbed by the writer
at write time. There are times when this propagation delay is large, so
sometimes alternatives are worked out which we'll call here "asynchronous
updates"; however, in these cases, there is always a window of time in which
some transaction can be lost should a failure occur. For this reason,
asynchronous updates are only used when the possibility of such losses is
acceptable.
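
The two-phase commit idea in the paragraph above can be illustrated with a
minimal in-memory sketch. `Participant` and `two_phase_commit` are
hypothetical names, not a PostgreSQL API, and the sketch assumes a coordinator
that never crashes; it only shows the rule that nothing is committed anywhere
until every participant has voted yes, and that the writer waits for all the
votes (the "propagation delay absorbed at write time").

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical in-memory database node (not a real PostgreSQL API)."""
    name: str
    fail_on_prepare: bool = False                 # simulate a node voting "no"
    prepared: dict = field(default_factory=dict)  # txid -> staged writes
    committed: list = field(default_factory=list)

    def prepare(self, txid, writes):
        # Phase 1: stage the writes and vote. A real system must make the
        # staged writes durable before voting "yes".
        if self.fail_on_prepare:
            return False
        self.prepared[txid] = list(writes)
        return True

    def commit(self, txid):
        # Phase 2: make the staged writes visible.
        self.committed.extend(self.prepared.pop(txid))

    def rollback(self, txid):
        # Discard staged writes, if this node got as far as staging them.
        self.prepared.pop(txid, None)


def two_phase_commit(txid, writes, participants):
    """Commit `writes` on every participant, or on none of them."""
    votes = [p.prepare(txid, writes) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(txid)
        return True
    for p in participants:
        p.rollback(txid)
    return False
```

For example, with two healthy participants `two_phase_commit(1, ["book 12A"], [a, b])`
returns True and both hold the write; if one participant is created with
`fail_on_prepare=True`, the call returns False and no participant commits.
Recovering from a coordinator crash between the two phases is the hard part
this sketch leaves out. On the participant side, PostgreSQL (8.1 and later)
offers PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED.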

Paragraphs six through to "shared disk failover" seem very awkward to me.
I don't like them at all.

"Shared disk failover" has nothing to do with "the sync problem" as it's
not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue.
Further, it also has nothing to do with disk arrays, though it is often
used with RAID to help avoid disk based corruption problems.

The point about Warm Standby needs to include a warning about WAL that it
MUST be sensitive to the semantics of the database design or else it's
fatally flawed. I'm talking about "referential integrity". That is to say,
it's inappropriate to capture updates on a table-by-table basis, as some
such systems do (I have no idea what's done by anyone in the PG world on
this right now), because an update to one table (esp. inserts) very often
goes hand in glove with updates in other tables, and to get one without the
other can corrupt a database.

The description of "Continuously running replication server" should
include the critical caveat - repeated if you think it's already said
elsewhere - that it is ONLY suitable for applications in which a loss of
(missing) update data doesn't matter. For example, an airline reservation
system would be an inappropriate application for such a "solution" because
what seats are available cannot be guaranteed to be correct.
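
The caveat asked for above can be made concrete with a small simulation of
asynchronous master/slave replication. `AsyncPrimary` and `Replica` are
hypothetical classes, not Slony or any other product; the sketch assumes only
that the client is told "committed" before the change has reached the
replica, which is what opens the window for loss.

```python
from collections import deque


class Replica:
    """Hypothetical standby that only sees what the primary has shipped."""

    def __init__(self):
        self.data = []


class AsyncPrimary:
    """Hypothetical primary that acknowledges a commit before shipping it."""

    def __init__(self):
        self.data = []
        self.outbox = deque()  # committed locally, not yet on the replica

    def commit(self, change):
        self.data.append(change)
        self.outbox.append(change)
        return "ok"  # the client is told "committed" at this point

    def ship(self, replica, max_items):
        # Runs later (e.g. on a timer); the delay is the window of possible loss.
        for _ in range(min(max_items, len(self.outbox))):
            replica.data.append(self.outbox.popleft())
```

Committing seats "12A", "12B" and "12C", shipping a single change and then
failing over leaves the replica with only "12A": two bookings the client was
told had succeeded are gone.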

Regarding data partitioning, I strongly disagree with the opening sentence
in that it doesn't split a database into sets, it splits tables into sets.
Data partitioning is often done within a single database on a single
server and therefore, as a concept, has nothing whatsoever to do with
different servers. Similarly, the second paragraph of this section is
problematic. Please define your term first, then talk about some
implementations - this is muddying the water. Further, there are both
vertical and horizontal partitioning - you mention neither - and each has
its own distinct uses. If partitioning is mentioned, it should be more
complete.
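
To illustrate the horizontal/vertical distinction drawn above, here is a
minimal sketch in which lists of dicts stand in for tables.
`horizontal_partition` (range partitioning on one column) and
`vertical_partition` are hypothetical helpers; the resulting pieces could stay
in one database or be placed on different servers, which is the point that
partitioning is a concept separate from multi-server setups.

```python
from bisect import bisect_right


def horizontal_partition(rows, key, boundaries):
    """Range-partition whole rows by one column.

    boundaries=[100, 200] gives three pieces:
    key < 100, 100 <= key < 200, key >= 200.
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, row[key])].append(row)
    return parts


def vertical_partition(rows, pk, column_groups):
    """Split columns into groups; every group keeps the primary key
    so the original rows can be rebuilt with a join."""
    return [
        [{pk: row[pk], **{col: row[col] for col in group}} for row in rows]
        for group in column_groups
    ]
```

Horizontal partitioning keeps whole rows together, so a query touching one
key range reads one piece; vertical partitioning repeats the primary key in
every piece so the original rows can be reassembled.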

Next, Query Broadcast Load Balancing... also needs a lot of work. First,
it's foremost in my memory that sending read queries everywhere and
returning the first result set back is a key way to improve application
performance at the cost of additional load on other systems - I guess
that's not at all what the document is after here, but it's a worthy part
of a dialogue on broadcasting queries. In other words, this has more parts
to it than just what the document now entertains. Secondly, the document
doesn't address _at_all_ whether this is a two-phase-commit environment
or not. If not, how are updates managed? If each server operates
independently and one of them fails, what do you do then? How do you know
_any_ server got an insert/update? ...  Each server _can't_ operate
independently unless the application does its own insert/update commits to
every one of them - and that can't be fast, nor does it load balance,
though it may contribute to superior uptime performance by the
application.
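
Both halves of the query-broadcast discussion above can be sketched in a few
lines of Python. `first_result` sends a read-only query to every server and
returns whichever answer arrives first, at the cost of the other servers still
doing the work; `broadcast_write` shows why writes are the hard part without
two-phase commit. The servers are hypothetical callables (`make_fake_server`
simulates latency), not real connections and not a description of how
pgpool behaves.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def make_fake_server(delay, label):
    """Hypothetical server that answers after `delay` seconds."""
    def run(query):
        time.sleep(delay)
        return f"{label}: {query}"
    return run


def first_result(servers, query):
    """Send a read-only query to every server; return (name, result) of the first reply."""
    pool = ThreadPoolExecutor(max_workers=len(servers))
    futures = {pool.submit(run, query): name for name, run in servers.items()}
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    winner = next(iter(done))
    # Don't wait for the slower servers; they still finish the query,
    # which is the extra load this technique costs.
    pool.shutdown(wait=False)
    return futures[winner], winner.result()


def broadcast_write(servers, statement):
    """Send a write to every server in turn, with no two-phase commit.

    If a server fails part-way through, the exception propagates and the
    servers already written to now disagree with the rest.
    """
    return {name: run(statement) for name, run in servers.items()}
```

With a fast and a slow fake server, `first_result` returns the fast one's
answer. If the second server passed to `broadcast_write` raises an error, the
first has already applied the write, which is exactly the inconsistency asked
about above.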

Next up: I'm not aware of any current products or projects that provide
parallel query execution, though Informix might - I can ask a colleague or
two. Either way, it's probably best to simply define the term (perhaps in
a little more detail), and not mention solutions - they change with time
anyway.
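
Since the suggestion is to define parallel query execution rather than name
products, a tiny scatter/gather example may help the definition: one query is
split into slices, each worker computes a partial aggregate, and the partial
results are combined. Threads stand in for separate servers here, and
`partial_sum` and `parallel_avg` are hypothetical names, not any product's API.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # Each worker (standing in for one server) aggregates only its own slice.
    return sum(chunk), len(chunk)


def parallel_avg(values, n_workers=4):
    """Hypothetical parallel AVG(): scatter slices, gather (sum, count) pairs."""
    if not values:
        return None  # like SQL, AVG over no rows gives NULL
    size = -(-len(values) // n_workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

The workers return (sum, count) pairs rather than averages because slices can
differ in size: for [1, 2, 3, 4, 5] split into [1, 2, 3] and [4, 5], averaging
the two averages gives 3.25, while combining sums and counts gives the correct
3.0.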

While I've never used Oracle's clustering tools, I've read up on them and
have customers who use them, and I think this description of Oracle
clustering is a misreading of what the Oracle system actually does. A check
with a true Oracle clustering expert is in order here.

Hope this helps. If asked, I'm willing to (re)write some of the bits
discussed above.

Regards,
Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[email protected], http://ScienceTools.com/


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 18:41  Bruce Momjian <[email protected]>
  parent: Alexey Klyukin <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 18:41 UTC
  To: Alexey Klyukin <[email protected]>; +Cc: pgsql-docs; PostgreSQL-development <[email protected]>

Alexey Klyukin wrote:
> Hi,
> 
> A typo:
> ("a write to any server has to be _propogated_")
> s/propogated/propagated

Thanks, fixed.

---------------------------------------------------------------------------


> 
> Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> >
> > 	ftp://momjian.us/pub/postgresql/mypatches/replication
> >
> > Comments welcomed.
> >
> >   
> -- 
> Regards,
> 
> Alexey Klyukin		alexk(at)vollmond.org.ua
> Simferopol, Crimea, Ukraine.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 18:59  Josh Berkus <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Josh Berkus @ 2006-10-25 18:59 UTC
  To: [email protected]; +Cc: Bruce Momjian <[email protected]>; Alexey Klyukin <[email protected]>; pgsql-docs

Bruce,

> > > 	ftp://momjian.us/pub/postgresql/mypatches/replication

I'm still not seeing anything in this patch that tells users where they can 
get replication solutions for PostgreSQL, either OSS or commercial.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 19:31  Bruce Momjian <[email protected]>
  parent: Richard Troy <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 19:31 UTC
  To: Richard Troy <[email protected]>; +Cc: Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

Richard Troy wrote:
> 
> > Here is a new replication documentation section I want to add for 8.2:
> >
> >     ftp://momjian.us/pub/postgresql/mypatches/replication
> >
> 
> ...Read the document, as promised...
> 
> First paragraph, "(fail over)" is inconsistent with title, "failover", as
> are other spots throughout the document. The whole document should be
> consistent and I vote for "failover" and not "fail over."

OK.  Fixed to "failover"

> Fourth paragraph, "This "sync problem" is the fundamental difficulty for
> servers working together"; "Sync problem" hasn't been defined. Actually,
> you're talking about the consistent attribute of the "acid" properties of
> all competent databases: Atomic, Consistency, Isolation, and Durability.
> At least define the term you are using - probably most easily done in the
> preceding paragraph.

OK, "sync problem" term removed, and spelled out fully.

> The fifth paragraph needs a lot more help, I think. How about this
> alternative:
> 
> So-called "two-phase commit" was developed as a strategy in which two or
> more databases are updated simultaneously and none of the data is
> committed until all of them are ready to commit. This guarantees consistency
> between the databases, with all propagation delay being absorbed by the writer
> at write time. There are times when this propagation delay is large, so
> sometimes alternatives are worked out which we'll call here "asynchronous
> updates"; however, in these cases, there is always a window of time in which
> some transaction can be lost should a failure occur. For this reason,
> asynchronous updates are only used when the possibility of such losses is
> acceptable.

I have modified the paragraph to use some of your terms.

> Paragraphs six through to "shared disk failover" seem very awkward to me.
> I don't like them at all.
> 
> "Shared disk failover" has nothing to do with "the sync problem" as it's
> not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue.
> Further, it also has nothing to do with disk arrays, though it is often
> used with RAID to help avoid disk based corruption problems.

Yes, please see updated version.  I removed the sync problem term from
there.

> The point about Warm Standby needs to include a warning about WAL that it
> MUST be sensitive to the semantics of the database design or else it's
> fatally flawed. I'm talking about "referential integrity". That is to say,
> it's inappropriate to capture updates on a table-by-table basis, as some
> such systems do (I have no idea what's done by anyone in the PG world on
> this right now), because an update to one table (esp. inserts) very often
> goes hand in glove with updates in other tables, and to get one without the
> other can corrupt a database.

We don't have that problem.  We recover only full transactions.
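
The point that only full transactions are recovered can be shown with a
simplified logical model: changes are buffered per transaction and applied
only when that transaction's commit record is found. This is an illustration
under stated assumptions, not PostgreSQL's implementation; the real WAL is
replayed physically, and changes from a transaction with no commit record
simply never become visible. The `(xid, kind, payload)` record format is
hypothetical.

```python
def replay(log):
    """Apply only transactions whose commit record made it into the log.

    `log` is a list of (xid, kind, payload) tuples with kind in
    {"change", "commit"}; this format is hypothetical, not PostgreSQL's WAL.
    """
    pending = {}  # xid -> changes seen so far for that transaction
    applied = []
    for xid, kind, payload in log:
        if kind == "change":
            pending.setdefault(xid, []).append(payload)
        elif kind == "commit":
            applied.extend(pending.pop(xid, []))
    # Anything left in `pending` belongs to a transaction cut off by the crash.
    return applied
```

For a log cut off after transaction 1 commits but before transaction 2 does,
`replay` returns only transaction 1's order and its order line, so an order
never appears without the rows that go with it.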

> The description of "Continuously running replication server" should
> include the critical caveat - repeated if you think it's already said
> elsewhere - that it is ONLY suitable for applications in which a loss of
> (missing) update data doesn't matter. For example, an airline reservation
> system would be an inappropriate application for such a "solution" because
> what seats are available cannot be guaranteed to be correct.

I have added note about data loss for the Slony item.

> Regarding data partitioning, I strongly disagree with the opening sentence
> in that it doesn't split a database into sets, it splits tables into sets.

OK, changed.

> Data partitioning is often done within a single database on a single
> server and therefore, as a concept, has nothing whatsoever to do with
> different servers. Similarly, the second paragraph of this section is

Uh, why would someone split things up like that on a single server?

> problematic. Please define your term first, then talk about some
> implementations - this is muddying the water. Further, there are both
> vertical and horizontal partitioning - you mention neither - and each has
> its own distinct uses. If partitioning is mentioned, it should be more
> complete.

Uh, what exactly needs to be defined?

> Next, Query Broadcast Load Balancing... also needs a lot of work. First,
> it's foremost in my memory that sending read queries everywhere and
> returning the first result set back is a key way to improve application
> performance at the cost of additional load on other systems - I guess
> that's not at all what the document is after here, but it's a worthy part
> of a dialogue on broadcasting queries. In other words, this has more parts
> to it than just what the document now entertains. Secondly, the document

Uh, do we want to go into that here?  I guess I could.

> doesn't address _at_all_ whether this is a two-phase-commit environment
> or not. If not, how are updates managed? If each server operates
> independently and one of them fails, what do you do then? How do you know
> _any_ server got an insert/update? ...  Each server _can't_ operate
> independently unless the application does its own insert/update commits to
> every one of them - and that can't be fast, nor does it load balance,
> though it may contribute to superior uptime performance by the
> application.

I think having the application middle layer do the commits is how it
works now.  Can someone explain how pgpool works, or should we mention
how two-phase commit has to be done here?  pgpool2 has additional
features.

> Next up: I'm not aware of any current products or projects that provide
> parallel query execution, though Informix might - I can ask a colleague or
> two. Either way, it's probably best to simply define the term (perhaps in
> a little more detail), and not mention solutions - they change with time
> anyway.

Actually, Bizgres MPP, based on PostgreSQL, does this, but mostly for
read-only queries.

> While I've never used Oracle's clustering tools, I've read up on them and
> have customers who use them, and I think this description of Oracle
> clustering is a misreading of what the Oracle system actually does. A check
> with a true Oracle clustering expert is in order here.

OK, would someone please comment?

> Hope this helps. If asked, I'm willing to (re)write some of the bits
> discussed above.

Yes, please review the URL and let me know what else to change.  Thanks.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 19:32  Bruce Momjian <[email protected]>
  parent: Josh Berkus <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 19:32 UTC
  To: [email protected]; +Cc: [email protected]; Alexey Klyukin <[email protected]>; pgsql-docs

Josh Berkus wrote:
> Bruce,
> 
> > > > 	ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> I'm still not seeing anything in this patch that tells users where they can 
> get replication solutions for PostgreSQL, either OSS or commercial.

It isn't designed for that.  It is designed for people to understand
what they want, and then they can look around for solutions.  I think
most agree we don't want a list of solutions in the documentation,
though I have a few as examples.  Also, some of the solutions don't
require software, but just configuration or special hardware.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [DOCS] Replication documentation addition
@ 2006-10-25 20:34  Dawid Kuroczko <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Dawid Kuroczko @ 2006-10-25 20:34 UTC
  To: pgsql-docs; [email protected]

On 10/25/06, Bruce Momjian <[email protected]> wrote:
> Joshua D. Drake wrote:
> > Bruce Momjian wrote:
> > > Tom Lane wrote:
> > >> "Magnus Hagander" <[email protected]> writes:
> > >>> I think this is a good reason not to list *any* of the products by name
> > >>> in the documentation, but instead refer to a page on say techdocs that
> > >>> can be more easily updated.
> > >> I agree with that.  If we have statements about other projects in our
> > >> docs, we will have a problem with not being able to update those
> > >> statements in a timely fashion when the other projects change.
> > >
> > > I mention only Slony and pgpool as examples of replication types.  They
> > > seem to have risen to high enough visibility to do that. I have not
> > > mentioned any other solutions.
> >
> > What about Slony-II or pgpool2? Which are fundamentally different from
> > their v1 counterparts (o.k. slony-ii isn't out yet but still).
> >
> > I +1 that we move to have all of the replication documentation pushed to
> > techdocs or other facility and just have a link from the docs.
>
> What I did was to mention Slony and pgpool as "examples", so people
> realize there are many other solutions.  It would be good to have a
> companion web site that could list them all, both open source and
> commercial.  That is going to take a lot more work, but I think would
> have great value, especially since our documentation will clearly
> outline the terms.  What you don't want to do is to throw up a list and
> have people try to figure out what solutions they cover.

I'm in quite a unique situation right now, working with a few DBAs
who have deep knowledge but no PostgreSQL background, so I have
a good view how PostgreSQL is perceived by people with fair knowledge
of other databases.

What I have noticed is a deep respect for the community.  If they ask about
a replication solution and I tell them about Slony, they ask if Slony is
provided with postgresql-contrib. Well... no, and it won't be.  Then they
look back, think a while and say something along the lines of: well,
$SOME_OTHER_DATABASE was using external replication solutions, so it is all right.

But then, before I talked with them, they did some quick research on
PostgreSQL and their perception was that there's no replication / replication
is shady in PostgreSQL.  It would be quite convenient to tell them:
"No replication? Did you actually read the manual? <here goes URL>"
Well, pointing them to slony page is a solution but of a lesser caliber
(how should they know about Slony anyway? They are newbies).
Pointing them at The Documentation is a Good Argument (and it may
cause them to look for some other information, like SQL syntax or
PostgreSQL-specific catalog views there, which is Good).

Enough background.

Bruce, I've read your documentation and I was left with a feeling
that it's a bit too generic.  It's almost as if it could be about just about
any major database, not PostgreSQL specifically.  When I'm
reading PostgreSQL docs I would like to know how to set up multi-master
replication with PostgreSQL, not an explanation of what multi-master
replication is. It's not about the actual documentation content, but rather
about where the emphasis falls.  Now it is something like: "These are the types
of replication solutions possible, some of them can be done with PostgreSQL".
I think it should rather be: "With PostgreSQL and some third-party tools you
can achieve such and such replication solutions; oh, and by the way, research
is being done on such and such replication method, but it's not production
quality yet".

And when I try to think as my DBA-mates would if they read the documentation,
I'm not sure they would end up enlightened after reading the docs -- they would
probably say: "hey, I knew that, it's well structured there, but I still don't
know what I should use", or maybe "where can I read something about this Slony
thing anyway?".

It may be my "closed thinking schema" though.  What I feel is that such
outsider, after reading these docs should end with "Aha! I should be using
Slony for my purposes".  Or pgpool, if it's what she needs.  I believe Tom's
remark that it does NOT belong in the PostgreSQL documentation is quite
right (though I wish there WERE some reference to external replication packages,
mainly because over and over again I need to prove PostgreSQL CAN be
replicated, and it's not uncommon).  However I'm still unconvinced about
TechDocs -- TechDocs are good but still they are a bit scattered and
unorganised.  I am a PostgreSQL enthusiast, but it took me a while to
learn about them, and for newbies not biased towards PostgreSQL it may
take even more time.  If it is linked from within the documentation, random
DBAs might read it, and I wish they would.

Right now I am leaning more and more towards an additional "documentation
book" for PostgreSQL, something like a "DBA guide" or handbook.  In format it
would be similar to the PostgreSQL documentation, but oriented around
configuring other tools around and together with PostgreSQL.  I shall send
some drafts here within 10 days to seed a discussion.  After all,
PostgreSQL is too big for just one documentation book. [1]

   Regards,
      Dawid

[1]: Then, later, a programmer's handbook?  Deeper knowledge about fancy
stuff with Python, Perl and PgSQL? ;-)


* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 20:36  Josh Berkus <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 2 replies; 117+ messages in thread

From: Josh Berkus @ 2006-10-25 20:36 UTC
  To: Bruce Momjian <[email protected]>; +Cc: [email protected]; Alexey Klyukin <[email protected]>; pgsql-docs

Bruce,

> It isn't designed for that.  It is designed for people to understand
> what they want, and then they can look around for solutions.  I think
> most agree we don't want a list of solutions in the documentation,
> though I have a few as examples.  

Do they?   I've seen no discussion of the matter.  I think we should have 
them.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 20:37  Bruce Momjian <[email protected]>
  parent: Josh Berkus <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 20:37 UTC
  To: [email protected]; +Cc: [email protected]; Alexey Klyukin <[email protected]>; pgsql-docs

Josh Berkus wrote:
> Bruce,
> 
> > It isn't designed for that.  It is designed for people to understand
> > what they want, and then they can look around for solutions.  I think
> > most agree we don't want a list of solutions in the documentation,
> > though I have a few as examples.  
> 
> Do they?   I've seen no discussion of the matter.  I think we should have 
> them.

Most people didn't want a list because there is no way to keep it
current in the docs, and a secondary web site was suggested for the
list.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 20:42  Bruce Momjian <[email protected]>
  parent: Dawid Kuroczko <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 20:42 UTC
  To: Dawid Kuroczko <[email protected]>; +Cc: pgsql-docs; [email protected]

Dawid Kuroczko wrote:
> Bruce, I've read your documentation and I was left with a feeling
> that it's a bit too generic.  It's almost as if it could be about just about
> any major database, not PostgreSQL specifically.  When I'm
> reading PostgreSQL docs I would like to know how to set up multi-master
> replication with PostgreSQL, not an explanation of what multi-master
> replication is. It's not about the actual documentation content, but rather
> about where the emphasis falls.  Now it is something like: "These are the types
> of replication solutions possible, some of them can be done with PostgreSQL".
> I think it should rather be: "With PostgreSQL and some third-party tools you
> can achieve such and such replication solutions; oh, and by the way, research
> is being done on such and such replication method, but it's not production
> quality yet".
> 
> And when I try to think as my DBA-mates would if they read the documentation,
> I'm not sure they would end up enlightened after reading the docs -- they would
> probably say: "hey, I knew that, it's well structured there, but I still don't
> know what I should use", or maybe "where can I read something about this Slony
> thing anyway?".

Well, the idea is to have a web site that lists all the solutions that
can be updated regularly, perhaps using the categories from the
documentation.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 20:59  Josh Berkus <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Josh Berkus @ 2006-10-25 20:59 UTC
  To: Bruce Momjian <[email protected]>; +Cc: [email protected]; Alexey Klyukin <[email protected]>; pgsql-docs

Bruce,

> Most people didn't want a list because there is no way to keep it
> current in the docs, and a secondary web site was suggested for the
> list.

So, like www.postgresql.org/docs/techdocs/replication?   That would work.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 21:46  Bruce Momjian <[email protected]>
  parent: Josh Berkus <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-25 21:46 UTC
  To: [email protected]; +Cc: [email protected]; Alexey Klyukin <[email protected]>; pgsql-docs

Josh Berkus wrote:
> Bruce,
> 
> > Most people didn't want a list because there is no way to keep it
> > current in the docs, and a secondary web site was suggested for the
> > list.
> 
> So, like www.postgresql.org/docs/techdocs/replication?   That would work.

Yes.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-10-25 23:49  Jim C. Nasby <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Jim C. Nasby @ 2006-10-25 23:49 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Dawid Kuroczko <[email protected]>; pgsql-docs; [email protected]

On Wed, Oct 25, 2006 at 04:42:17PM -0400, Bruce Momjian wrote:
> Dawid Kuroczko wrote:
> > Bruce, I've read your documentation and I was left with a feeling
> > that it's a bit too generic.  It's almost as if it could be about just about
> > any major database, not PostgreSQL specifically.  When I'm
> > reading PostgreSQL docs I would like to know how to set up multi-master
> > replication with PostgreSQL, not an explanation of what multi-master
> > replication is. It's not about the actual documentation content, but rather
> > about where the emphasis falls.  Now it is something like: "These are the types
> > of replication solutions possible, some of them can be done with PostgreSQL".
> > I think it should rather be: "With PostgreSQL and some third-party tools you
> > can achieve such and such replication solutions; oh, and by the way, research
> > is being done on such and such replication method, but it's not production
> > quality yet".
> > 
> > And when I try to think as my DBA-mates would if they read the documentation,
> > I'm not sure they would end up enlightened after reading the docs -- they would
> > probably say: "hey, I knew that, it's well structured there, but I still don't
> > know what I should use", or maybe "where can I read something about this Slony
> > thing anyway?".
> 
> Well, the idea is to have a web site that lists all the solutions that
> can be updated regularly, perhaps using the categories from the
> documentation.

And the docs should point to that page, prominently (presumably that
will happen after the page actually exists).

Something else worth doing though is to have a paragraph explaining why
there's no built-in replication. I don't have time to write something
right now, but I can do it later tonight if no one beats me to it.
-- 
Jim Nasby                                            [email protected]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)




* Re: [HACKERS] Replication documentation addition
@ 2006-10-26 00:42  Bruce Momjian <[email protected]>
  parent: Jim C. Nasby <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-26 00:42 UTC
  To: Jim C. Nasby <[email protected]>; +Cc: Dawid Kuroczko <[email protected]>; pgsql-docs; [email protected]

Jim C. Nasby wrote:
> Something else worth doing though is to have a paragraph explaining why
> there's no built-in replication. I don't have time to write something
> right now, but I can do it later tonight if no one beats me to it.

I thought that was implied in the early paragraph about why there are
many solutions.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


* Re: [DOCS] Replication documentation addition
@ 2006-10-26 02:08  Cesar Suga <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Cesar Suga @ 2006-10-26 02:08 UTC
  To: Joshua D. Drake <[email protected]>; [email protected]

Joshua D. Drake wrote:
> Cesar Suga wrote:
>   
>> Hi,
>>
>> I also wrote Bruce about that.
>>
>> It happens that, if you 'freely advertise' commercial solutions (rather
>> than letting them advertise through other channels), you will always end
>> up being an 'updater' of the docs whenever they change their product
>> lines, their business model, and so on.
>>     
>
> That is no different than the open source offerings. We have had several
> open source offerings that have died over the years. Replicator, for
> example, has always been Replicator and has been around longer than any
> of the current replication solutions.
>   
The documentation comes with the open source tarball.

I would welcome if the docs point to an unofficial wiki (maintained 
externally from authoritative PostgreSQL developers) or a website 
listing them and giving a brief of each solution.

postgresql.org already does this for events (commercial training!) and 
news. Point to postgresql.org/download/commercial as there *already* are 
brief descriptions, pricing and website links.
>> If you cite a commercial solution, then in fairness you should cite *all*
>> of them.
>>     
>
> No. That doesn't make any sense either. I assume we aren't going to list
> all PostgreSQL OSS replication solutions (there are at least a dozen or
> more).
>
> You list the ones that are stable in their existence (commercial or not).
>   
And how would you determine that? Years of existence? Contributions to
PostgreSQL's source code? It is not easy and wouldn't be fair. Some
products certainly would be listed, and other, doubtful ones would perhaps
complain; that's why I said 'all'. If they are not stable, they will
either drop out of the market or fix their problems.
>> If one enterprise has the right to be listed in the
>> documentation, all of them should have it, so that you are never
>> favouring one of them.
>>     
>
> You are looking at this the wrong way. This isn't about *any*
> enterprise. It is about a PostgreSQL Solution. There happens to be two
> or three known working open source solutions, and two or three known
> working commercial solutions.
>   
(see first three paragraphs)
>> That's the main motivation for writing this. Moreover, if there are also
>> commercial solutions for high-end installs and they are cited as
>> providers of those solutions, it (to a point) discourages people from
>> getting together and writing open source extensions to PostgreSQL.
>>     
>
> No, it doesn't, because there is always the "It wants to be free!" crowd.
>   
Yes, I agree there are. But development at *that* cutting edge is also
scarce. If you list some commercial solution, it feels as if the gap has
already been filled, mainly to people in the trenches (DBAs). Those who are
interested would obviously seek out the commercial solutions first, so they
click 'commercial products' on the main website.
>> If people (who read the documentation) professionally work with
>> PostgreSQL, they may already have been briefed by those commercial
>> offerings in some way.
>>     
>
> Maybe, maybe not.
>
> Sincerely,
>
> Joshua D. Drake
>   
I still agree with your point. However, that would set a precedent 
requiring people to maintain lists of stable software in every area of 
the documentation.

Regards,
Cesar





* Re: Replication documentation addition
@ 2006-10-26 14:45  Andrew Sullivan <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Andrew Sullivan @ 2006-10-26 14:45 UTC (permalink / raw)
  To: [email protected]

On Wed, Oct 25, 2006 at 05:46:33PM -0400, Bruce Momjian wrote:
> Josh Berkus wrote:
> > So, like www.postgresql.org/docs/techdocs/replication?   That would work.
> 
> Yes.

I like that idea, but I think that the URL needs to be decided upon,
needs to be stable, and needs to be put into the docs.  (I don't see
it ATM, I guess because the URL isn't chosen yet?)  We get so many
questions about "what replication system" that I'm sure people are
looking for outlines.

A

-- 
Andrew Sullivan  | [email protected]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism. 
                --Brad Holland




* Re: [HACKERS] Replication documentation addition
@ 2006-10-26 15:53  Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-10-26 15:53 UTC (permalink / raw)
  To: PostgreSQL-development <[email protected]>; pgsql-docs; +Cc: Richard Troy <[email protected]>; Hannu Krosing <[email protected]>


With no new additions submitted today, I have moved my text into our
SGML documentation:

	http://momjian.us/main/writings/pgsql/sgml/failover.html

Please let me know what additional changes are needed.

---------------------------------------------------------------------------

bruce wrote:
> Richard Troy wrote:
> > 
> > > Here is a new replication documentation section I want to add for 8.2:
> > >
> > >     ftp://momjian.us/pub/postgresql/mypatches/replication
> > >
> > 
> > ...Read the document, as promised...
> > 
> > First paragraph, "(fail over)" is inconsistent with title, "failover", as
> > are other spots throughout the document. The whole document should be
> > consistent and I vote for "failover" and not "fail over."
> 
> OK.  Fixed to "failover"
> 
> > Fourth paragraph, "This "sync problem" is the fundamental difficulty for
> > servers working together"; "Sync problem" hasn't been defined. Actually,
> > you're talking about the consistency attribute of the ACID properties of
> > all competent databases: Atomicity, Consistency, Isolation, and Durability.
> > At least define the term you are using - probably most easily done in the
> > preceding paragraph.
> 
> OK, "sync problem" term removed, and spelled out fully.
> 
> > The fifth paragraph needs a lot more help, I think. How about this
> > alternative:
> > 
> > So-called "two-phase commit" was developed as a strategy in which two or
> > more databases are updated simultaneously and none of the data is
> > committed until all of them have agreed to commit. This guarantees
> > consistency between the databases, with all propagation delay being
> > absorbed by the writer at write time. Because this propagation delay is
> > sometimes large, alternatives are sometimes worked out, which we'll call
> > here "asynchronous updates." However, in these cases there is always a
> > window of time in which some transaction can be lost should a failure
> > occur. For this reason, asynchronous updates are only used when the
> > possibility of such losses is acceptable.
> 
> I have modified the paragraph to use some of your terms.
> 
> > Paragraphs six through to "shared disk failover" seem very awkward to me.
> > I don't like them at all.
> > 
> > "Shared disk failover" has nothing to do with "the sync problem" as it's
> > not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue.
> > Further, it also has nothing to do with disk arrays, though it is often
> > used with RAID to help avoid disk based corruption problems.
> 
> Yes, please see updated version.  I removed the sync problem term from
> there.
> 
> > The point about Warm Standby needs to include a warning about WAL that it
> > MUST be sensitive to the semantics of the database design or else it's
> > fatally flawed. I'm talking about "referential integrity". That is to say,
> > it's inappropriate to capture updates on a table by table basis, as some
> > such systems do (I have no idea what's done by anyone in the PG world on
> > this right now), because an update to one table (esp. inserts) very often
> > goes hand in glove with updates in other tables, and to get one without the
> > other can corrupt a database.
> 
> We don't have that problem.  We recover only full transactions.
> 
> > The description of "Continuously running replication server" should
> > include the critical caveat - repeated if you think it's already said
> > elsewhere - that it is ONLY suitable for applications in which a loss of
> > (missing) update data doesn't matter. For example, an airline reservation
> > system would be an inappropriate application for such a "solution" because
> > what seats are available cannot be guaranteed to be correct.
> 
> I have added note about data loss for the Slony item.
> 
> > Regarding data partitioning, I strongly disagree with the opening sentence
> > in that it doesn't split a database into sets, it splits tables into sets.
> 
> OK, changed.
> 
> > Data partitioning is often done within a single database on a single
> > server and therefore, as a concept, has nothing whatsoever to do with
> > different servers. Similarly, the second paragraph of this section is
> 
> Uh, why would someone split things up like that on a single server?
> 
> > problematic. Please define your term first, then talk about some
> > implementations - this is muddying the water. Further, there are both
> > vertical and horizontal partitioning - you mention neither - and each has
> > its own distinct uses. If partitioning is mentioned, it should be more
> > complete.
> 
> Uh, what exactly needs to be defined?
> 
> > Next, Query Broadcast Load Balancing... also needs a lot of work. First,
> > it's foremost in my memory that sending read queries everywhere and
> > returning the first result set back is a key way to improve application
> > performance at the cost of additional load on other systems - I guess
> > that's not at all what the document is after here, but it's a worthy part
> > of a dialogue on broadcasting queries. In other words, this has more parts
> > to it than just what the document now entertains. Secondly, the document
> 
> Uh, do we want to go into that here?  I guess I could.
> 
> > doesn't address _at_all_ whether this is a two-phase-commit environment
> > or not. If not, how are updates managed? If each server operates
> > independently and one of them fails, what do you do then? How do you know
> > _any_ server got an insert/update? ...  Each server _can't_ operate
> > independently unless the application does its own insert/update commits to
> > every one of them - and that can't be fast, nor does it load balance,
> > though it may contribute to superior uptime performance by the
> > application.
> 
> I think having the application middle layer do the commits is how it
> works now.  Can someone explain how pgpool works, or should we mention
> how two-phase commit has to be done here?  pgpool2 has additional
> features.
> 
> > Next up; I'm not aware of any current products or projects that provide
> > parallel query execution, though Informix might - I can ask a colleague or
> > two. Either way, it's probably best to simply define the term (perhaps in
> > a little more detail), and not mention solutions - they change with time
> > anyway.
> 
> Actually, Bizgres MPP, based on PostgreSQL, does this, but mostly for
> read-only queries.
> 
> > While I've never used Oracle's clustering tools, I've read up on them and
> > have customers who use them, and I think this description of Oracle
> > clustering misreads what the Oracle system actually does. A check
> > with a true Oracle clustering expert is in order here.
> 
> OK, would someone please comment?
> 
> > Hope this helps. If asked, I'm willing to (re)write some of the bits
> > discussed above.
> 
> Yes, please review the URL and let me know what else to change.  Thanks.
> 
> -- 
>   Bruce Momjian   [email protected]
>   EnterpriseDB    http://www.enterprisedb.com
> 
>   + If your life is a hard drive, Christ can be your backup. +
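
The two-phase commit scheme described in the review quoted above can be illustrated with a minimal, in-memory sketch. Everything in it (the Participant class, its prepare/commit/abort methods, and two_phase_commit) is hypothetical and models only the voting logic: a real implementation must make the prepared state durable and log the coordinator's decision so it survives a crash, which this sketch omits. PostgreSQL itself exposes the participant side of the protocol through PREPARE TRANSACTION and COMMIT PREPARED (available since 8.1).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Participant:
    """Hypothetical in-memory stand-in for one database server."""
    name: str
    can_commit: bool = True                       # False simulates a server voting "no"
    committed: Dict[str, object] = field(default_factory=dict)
    pending: Optional[Dict[str, object]] = None   # writes staged between the phases

    def prepare(self, writes: Dict[str, object]) -> bool:
        """Phase 1: stage the writes (in memory only here) and vote."""
        if not self.can_commit:
            return False
        self.pending = dict(writes)
        return True

    def commit(self) -> None:
        """Phase 2: make the staged writes visible."""
        if self.pending is not None:
            self.committed.update(self.pending)
            self.pending = None

    def abort(self) -> None:
        """Discard any staged writes."""
        self.pending = None


def two_phase_commit(participants: List[Participant],
                     writes: Dict[str, object]) -> bool:
    """Apply `writes` on every participant or on none of them."""
    prepared: List[Participant] = []
    # Phase 1: collect votes; a single "no" aborts everyone already prepared.
    for p in participants:
        if not p.prepare(writes):
            for q in prepared:
                q.abort()
            return False
        prepared.append(p)
    # Phase 2: every participant voted yes, so the decision is commit.
    for p in prepared:
        p.commit()
    return True


# Usage: both servers record the booking, or neither does.
a, b = Participant("a"), Participant("b")
print(two_phase_commit([a, b], {"seat_12A": "booked"}))   # True
c = Participant("c", can_commit=False)
print(two_phase_commit([a, c], {"seat_12B": "booked"}))   # False
print(a.committed)                                        # {'seat_12A': 'booked'}
```

The sketch makes the trade-off in the quoted paragraph visible: the writer cannot return until every server has voted, which is the propagation delay absorbed at write time.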

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
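
The "send read queries everywhere and return the first result set" idea from the review quoted above can be sketched with Python's standard concurrent.futures module (Python 3.9 or later, for cancel_futures). The replicas here are hypothetical callables standing in for database connections; no pgpool or driver API is implied. The approach only makes sense for read-only queries, and it buys lower latency at the cost of extra load on every other server, as the review notes.

```python
import concurrent.futures as cf
import time
from typing import Callable, Sequence

Replica = Callable[[str], object]  # hypothetical: runs a query, returns its result


def first_result(replicas: Sequence[Replica], query: str) -> object:
    """Broadcast a read-only query and return the first successful answer."""
    if not replicas:
        raise ValueError("need at least one replica")
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(run, query) for run in replicas]
        for fut in cf.as_completed(futures):
            if fut.exception() is None:   # skip replicas that raised an error
                return fut.result()
        raise RuntimeError("every replica failed")
    finally:
        # Return without waiting for slower replicas; their results are discarded.
        pool.shutdown(wait=False, cancel_futures=True)


def make_replica(name: str, delay: float) -> Replica:
    """Hypothetical replica that answers after `delay` seconds."""
    def run(query: str) -> str:
        time.sleep(delay)
        return f"{name}: {query}"
    return run


print(first_result([make_replica("slow", 0.5), make_replica("fast", 0.01)],
                   "SELECT count(*) FROM seats"))
```

With these delays the call should normally print the fast replica's answer. The slower replicas' threads still run to completion in the background, so a production version would presumably also need per-connection timeouts or query cancellation.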




* Re: [HACKERS] Replication documentation addition
@ 2006-10-26 15:55  Jim C. Nasby <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Jim C. Nasby @ 2006-10-26 15:55 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Dawid Kuroczko <[email protected]>; pgsql-docs; [email protected]

On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > Something else worth doing though is to have a paragraph explaining why
> > there's no built-in replication. I don't have time to write something
> > right now, but I can do it later tonight if no one beats me to it.
> 
> I thought that was implied in the early paragraph about why there are
> many solutions.

I think we should explicitly spell it out, especially considering how
many times people ask about it. How about...

 This multitude of choices is why PostgreSQL does not ship with a
 replication solution by default; any bundled solution would only
 satisfy a subset of replication needs.

(sorry for the non-standard patch, but anoncvs isn't sync'd up yet).
-- 
Jim Nasby                                            [email protected]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

*** failover.sgml.org	Thu Oct 26 10:32:45 2006
--- failover.sgml	Thu Oct 26 10:55:03 2006
***************
*** 29,35 ****
    working together.  Because there is no single solution that eliminates
    the impact of the sync problem for all use cases, there are multiple
    solutions.  Each solution addresses this problem in a different way, and
!   minimizes its impact for a specific workload.
   </para>
  
   <para>
--- 29,37 ----
    working together.  Because there is no single solution that eliminates
    the impact of the sync problem for all use cases, there are multiple
    solutions.  Each solution addresses this problem in a different way, and
!   minimizes its impact for a specific workload. This multitude of choices is
!   why PostgreSQL does not ship with a replication solution by default; any
!   bundled solution would only satisfy a subset of replication needs.
   </para>
  
   <para>



* Re: [HACKERS] Replication documentation addition
@ 2006-10-26 15:59  Bruce Momjian <[email protected]>
  parent: Jim C. Nasby <[email protected]>
  0 siblings, 2 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-26 15:59 UTC (permalink / raw)
  To: Jim C. Nasby <[email protected]>; +Cc: Dawid Kuroczko <[email protected]>; pgsql-docs; [email protected]

Jim C. Nasby wrote:
> On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > Something else worth doing though is to have a paragraph explaining why
> > > there's no built-in replication. I don't have time to write something
> > > right now, but I can do it later tonight if no one beats me to it.
> > 
> > I thought that was implied in the early paragraph about why there are
> > many solutions.
> 
> I think we should explicitly spell it out, especially considering how
> many times people ask about it. How about...
> 
>  This multitude of choices is why PostgreSQL does not ship with a
>  replication solution by default; any bundled solution would only
>  satisfy a subset of replication needs.

The problem is that we do have some solutions in our code, like doing
data partitioning in the application, warm standby, or using a shared
disk for failover, so how do we spell that out?  I say there are
multiple solutions, but I don't see how I can say that all are external
and not included.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


* Re: [HACKERS] Replication documentation addition
@ 2006-10-26 16:19  Joshua D. Drake <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Joshua D. Drake @ 2006-10-26 16:19 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Jim C. Nasby <[email protected]>; Dawid Kuroczko <[email protected]>; pgsql-docs; [email protected]

Bruce Momjian wrote:
> Jim C. Nasby wrote:
>> On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote:
>>> Jim C. Nasby wrote:
>>>> Something else worth doing though is to have a paragraph explaining why
>>>> there's no built-in replication. I don't have time to write something
>>>> right now, but I can do it later tonight if no one beats me to it.
>>> I thought that was implied in the early paragraph about why there are
>>> many solutions.
>> I think we should explicitly spell it out, especially considering how
>> many times people ask about it. How about...
>>
>>  This multitude of choices is why PostgreSQL does not ship with a
>>  replication solution by default; any bundled solution would only
>>  satisfy a subset of replication needs.
> 
> The problem is that we do have some solutions in our code, like doing
> data partitioning in the application, warm standby, or using a shared
> disk for failover, so how do we spell that out?  I say there are
> multiple solutions, but I don't see how I can say that all are external
> and not included.

None of those are replication solutions. So I would have to agree with
Jim here.

This isn't about what people do with their app, so that is not relevant.

Warm standby is PITR, which is a backup and recovery solution. It does
not include a failover solution and is *not* replication. Technically, it
does not provide an HA solution either, as it will almost always be
farther behind than a replication solution.

Shared disk for failover could be used by anything; it isn't specific to
a replication scenario, and it is standard for many HA setups.

Sincerely,

Joshua D. Drake

-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


* Re: [HACKERS] Replication documentation addition
@ 2006-10-26 16:21  Bruce Momjian <[email protected]>
  parent: Joshua D. Drake <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-26 16:21 UTC (permalink / raw)
  To: Joshua D. Drake <[email protected]>; +Cc: Jim C. Nasby <[email protected]>; Dawid Kuroczko <[email protected]>; pgsql-docs; [email protected]

Joshua D. Drake wrote:
> Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> >> On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote:
> >>> Jim C. Nasby wrote:
> >>>> Something else worth doing though is to have a paragraph explaining why
> >>>> there's no built-in replication. I don't have time to write something
> >>>> right now, but I can do it later tonight if no one beats me to it.
> >>> I thought that was implied in the early paragraph about why there are
> >>> many solutions.
> >> I think we should explicitly spell it out, especially considering how
> >> many times people ask about it. How about...
> >>
> >>  This multitude of choices is why PostgreSQL does not ship with a
> >>  replication solution by default; any bundled solution would only
> >>  satisfy a subset of replication needs.
> > 
> > The problem is that we do have some solutions in our code, like doing
> > data partitioning in the application, warm standby, or using a shared
> > disk for failover, so how do we spell that out?  I say there are
> > multiple solutions, but I don't see how I can say that all are external
> > and not included.
> 
> None of those are replication solutions. So I would have to agree with
> Jim here.
> 
> This isn't about what people do with their app, so that is not relevant.
> 
> Warm standby is PITR, which is a backup and recovery solution. It does
> not include a failover solution and is *not* replication. Technically, it
> does not provide an HA solution either, as it will almost always be
> farther behind than a replication solution.
> 
> Shared disk for failover could be used by anything; it isn't specific to
> a replication scenario, and it is standard for many HA setups.

The section is no longer titled only "replication", but is now
"Failover, Replication, Load Balancing, and Clustering Options", so it
is more a catch-all, and hence saying nothing is included doesn't make
sense.  You could say no "replication" is included, but replication is
only one part of the section, so where do you put that, and why is it
worth it?

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: Replication documentation addition
@ 2006-10-26 17:07  Richard Troy <[email protected]>
  parent: Josh Berkus <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Richard Troy @ 2006-10-26 17:07 UTC (permalink / raw)
  To: Josh Berkus <[email protected]>; +Cc: Bruce Momjian <[email protected]>; [email protected]; Alexey Klyukin <[email protected]>; pgsql-docs


On Wed, 25 Oct 2006, Josh Berkus wrote:
>
> Bruce,
>
> > It isn't designed for that.  It is designed for people to understand
> > what they want, and then they can look around for solutions.  I think
> > most agree we don't want a list of solutions in the documentation,
> > though I have a few as examples.
>
> Do they?   I've seen no discussion of the matter.  I think we should have
> them.
>
>

I completely agree; If you want to attract competent people from the
business world, one thing you have to do is respect their time by helping
them find information, especially about things they don't know exist. All
that's needed are pointers, but the pointers need to be to solid
documents/resources, not just the top of a heap - if you'll forgive the
pun.

Richard



-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[email protected], http://ScienceTools.com/





* Re: [DOCS] Replication documentation addition
@ 2006-10-26 17:35  Richard Troy <[email protected]>
  parent: Cesar Suga <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Richard Troy @ 2006-10-26 17:35 UTC (permalink / raw)
  To: Cesar Suga <[email protected]>; +Cc: Joshua D. Drake <[email protected]>; [email protected]

> The documentation comes with the open source tarball.

Yuck.

>
> I would welcome it if the docs pointed to an unofficial wiki (maintained
> externally, not by the authoritative PostgreSQL developers) or a website
> listing them and giving a brief description of each solution.
>
> postgresql.org already does this for events (commercial training!) and
> news. Point to postgresql.org/download/commercial as there *already* are
> brief descriptions, pricing and website links.

I wouldn't have looked in "download" for such a thing. Nor would I expect
everyone with a Postgres-related solution to want to post it on
PostgreSQL.org for download.

However I agree that a simple web page listing such things is needed. It's
easy to manage - way easier to manage than the development of a competent
relational database engine! It's just a bunch of text, after all, and
errors aren't that critical and will tend to self-correct through user
attention.

> >
> > You list the ones that are stable in their existence (commercial or not).
> >
> And how would you determine that? Years of existence? Contribution to
> PostgreSQL's source code? It is not easy and wouldn't be fair. Some
> would certainly be listed, and other doubtful ones would perhaps
> complain; that's why I said 'all'. If they are not stable, they will
> either stay out of the market or fix their problems.

You have to just trust people. If it's clear that "this isn't
PostgreSql.org", stuff can be unstable, etc - it isn't the group's
problem.

> > No, it doesn't, because there is always the "It wants to be free!" crowd.
> >
> Yes, I agree there are. But development at *that* cutting edge is also
> scarce. If you list some commercial solution, people in the trenches
> (mainly DBAs) will feel that something has already filled the gap. If they
> are interested, they will obviously seek the commercial solutions first,
> so they will click 'commercial products' on the main website.

Not necessarily. Most times, I'll seek the better solution, which may or
may not be commercial. Sometimes I'll avoid a commercial version because I
don't like the company!

... But getting genuine donations of time - without direct $$
self-interest attached, is a whole nother kettle o fish.  For example,
there are a lot of students out there that are excellent and would love to
have a mechanism to gain something for their resumes before entering the
business world. ...There might be some residual interest at UCB, for
example. Attracting this kind of support is a completely different
dialogue, but on _this_ topic, surely seeking the "it wants to be free!"
crowd can't (or shouldn't, in my view) be used as an excuse for not
publishing pointers to commercial solutions that involve PostgreSql. Do it
already!

> >> If people (who read the documentation) professionally work with
> >> PostgreSQL, they may already have been briefed by those commercial
> >> offerings in some way.
> >>
> >
> > Maybe, maybe not.

The "may" is a wiggler; sounds like an excuse with a back door. The real
answer is "probably not!" I'm in that world. I haven't been briefed. Ever.

> I still agree with your point. However, that would set a precedent
> requiring people to maintain lists of stable software in every area of
> the documentation.

All that's needed is ONE list, with clear disclaimer. It'll be all text
and links, and maybe the odd small .gif logo, if permitted, so it won't be
a huge thing. Come on now, are there thousands of such products? Tens
sounds more plausible.

Regards,
Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[email protected], http://ScienceTools.com/


* Re: [HACKERS] Replication documentation addition
@ 2006-10-26 18:27  Jim C. Nasby <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Jim C. Nasby @ 2006-10-26 18:27 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Dawid Kuroczko <[email protected]>; pgsql-docs; [email protected]

On Thu, Oct 26, 2006 at 11:59:57AM -0400, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote:
> > > Jim C. Nasby wrote:
> > > > Something else worth doing though is to have a paragraph explaining why
> > > > there's no built-in replication. I don't have time to write something
> > > > right now, but I can do it later tonight if no one beats me to it.
> > > 
> > > I thought that was implied in the early paragraph about why there are
> > > many solutions.
> > 
> > I think we should explicitly spell it out, especially considering how
> > many times people ask about it. How about...
> > 
> >  This multitude of choices is why PostgreSQL does not ship with a
> >  replication solution by default; any bundled solution would only
> >  satisfy a subset of replication needs.
> 
> The problem is that we do have some solutions in our code, like doing
> data partitioning in the application, warm standby, or using a shared
> disk for failover, so how do we spell that out?  I say there are
> multiple solutions, but I don't see how I can say that all are external
> and not included.

Good point... how about this?
-- 
Jim Nasby                                            [email protected]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Index: doc/src/sgml/failover.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/failover.sgml,v
retrieving revision 1.2
diff -c -r1.2 failover.sgml
*** doc/src/sgml/failover.sgml	26 Oct 2006 17:07:03 -0000	1.2
--- doc/src/sgml/failover.sgml	26 Oct 2006 18:26:21 -0000
***************
*** 29,35 ****
    working together.  Because there is no single solution that eliminates
    the impact of the sync problem for all use cases, there are multiple
    solutions.  Each solution addresses this problem in a different way, and
!   minimizes its impact for a specific workload.
   </para>
  
   <para>
--- 29,40 ----
    working together.  Because there is no single solution that eliminates
    the impact of the sync problem for all use cases, there are multiple
    solutions.  Each solution addresses this problem in a different way, and
!   minimizes its impact for a specific workload.  A few of these solutions are
!   provided with PostgreSQL itself, but it would be impractical for the core
!   database to handle every scenario. That is why most solutions are implemented
!   outside the database. PostgreSQL's unique extensibility is what allows this
!   to happen, and 3rd-party solutions should not be thought of as
!   <quote>second-rate</> simply because they are not bundled with the database.
   </para>
  
   <para>



* Re: Replication documentation addition
@ 2006-10-26 19:06  Robert Treat <[email protected]>
  parent: Andrew Sullivan <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Robert Treat @ 2006-10-26 19:06 UTC (permalink / raw)
  To: [email protected]; +Cc: Andrew Sullivan <[email protected]>

On Thursday 26 October 2006 10:45, Andrew Sullivan wrote:
> On Wed, Oct 25, 2006 at 05:46:33PM -0400, Bruce Momjian wrote:
> > Josh Berkus wrote:
> > > So, like www.postgresql.org/docs/techdocs/replication?   That would
> > > work.
> >
> > Yes.
>
> I like that idea, but I think that the URL needs to be decided upon,
> needs to be stable, and needs to be put into the docs.  (I don't see
> it ATM, I guess because the URL isn't chosen yet?)  We get so many
> questions about "what replication system" that I'm sure people are
> looking for outlines.
>
> A

Unfortunately the techdocs system won't support a URL like the one above; 
rather, you'll end up with something more like the following:
http://www.postgresql.org/docs/techdocs.54 which is the "GUI Tools Guide" 
(which is linked in the FAQ fwiw).  Once it is in place, it will be stable 
though. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL




* Re: [HACKERS] Replication documentation addition
@ 2006-10-26 19:41  Bruce Momjian <[email protected]>
  parent: Jim C. Nasby <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-10-26 19:41 UTC (permalink / raw)
  To: Jim C. Nasby <[email protected]>; +Cc: Dawid Kuroczko <[email protected]>; pgsql-docs; [email protected]

Jim C. Nasby wrote:
> On Thu, Oct 26, 2006 at 11:59:57AM -0400, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > On Wed, Oct 25, 2006 at 08:42:07PM -0400, Bruce Momjian wrote:
> > > > Jim C. Nasby wrote:
> > > > > Something else worth doing though is to have a paragraph explaining why
> > > > > there's no built-in replication. I don't have time to write something
> > > > > right now, but I can do it later tonight if no one beats me to it.
> > > > 
> > > > I thought that was implied in the early paragraph about why there are
> > > > many solutions.
> > > 
> > > I think we should explicitly spell it out, especially considering how
> > > many times people ask about it. How about...
> > > 
> > >  This multitude of choices is why PostgreSQL does not ship with a
> > >  replication solution by default; any bundled solution would only
> > >  satisfy a subset of replication needs.
> > 
> > The problem is that we do have some solutions in our code, like doing
> > data partitioning in the application, warm standby, or using a shared
> > disk for failover, so how do we spell that out?  I say there are
> > multiple solutions, but I don't see how I can say that all are external
> > and not included.
> 
> Good point... how about this?

Sorry, that is too preachy, and I have the extensibility issue addressed
in the commercial solutions section.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: Replication documentation addition
@ 2006-10-26 22:26  Andrew Sullivan <[email protected]>
  parent: Robert Treat <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Andrew Sullivan @ 2006-10-26 22:26 UTC
  To: [email protected]

On Thu, Oct 26, 2006 at 03:06:13PM -0400, Robert Treat wrote:
> 
> Unfortunately the techdocs system won't support a url like the one above, 
> rather you'll end up with something more like the following  
> http://www.postgresql.org/docs/techdocs.54 which is the "GUI Tools Guide" 
> (which is linked in the FAQ fwiw).  Once it is in place, it will be stable 
> though. 

Surely this is what redirects were invented for, no? 

http://www.postgresql.org/replication redirects to [stable magic URL]

Put the former in the docs.

A

-- 
Andrew Sullivan  | [email protected]
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."
		--Damien Katz




* Re: Replication documentation addition
@ 2006-10-27 19:57  Richard Troy <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Richard Troy @ 2006-10-27 19:57 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Hannu Krosing <[email protected]>; pgsql-docs; PostgreSQL-development <[email protected]>

On Wed, 25 Oct 2006, Bruce Momjian wrote:

   ...snip...
>
> > Data partitioning is often done within a single database on a single
> > server and therefore, as a concept, has nothing whatsoever to do with
> > different servers. Similarly, the second paragraph of this section is
>
> Uh, why would someone split things up like that on a single server?
>
> > problematic. Please define your term first, then talk about some
> > implementations - this is muddying the water. Further, there are both
> > vertical and horizontal partitioning - you mention neither - and each has
> > its own distinct uses. If partitioning is mentioned, it should be more
> > complete.
>
> Uh, what exactly needs to be defined.

OK, "data partitioning": data partitioning begins in the RDB world with
the very notion of tables, and we partition our data during schema
development with the goal of "normalizing" the design - "third normal
form" being the one most professors talk about as a target. "Data
partitioning", then, is the intentional denormalization of the design to
accomplish some goal(s) - not all of which are listed in this document's
title. In this context, data partitioning takes two forms, based upon which
axis of a two-dimensional table is to be divided: the vertical partition
divides attributes (as in a master/detail relationship with one-to-one
mapping), and the horizontal partition divides rows based on the domain,
or value, of one or more attributes (as in your example of London records
being kept in a database in London, while Paris records are kept in
Paris).
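To make the two axes concrete, here is a minimal Python sketch that divides an in-memory table both ways. It is an illustration only, not how PostgreSQL or any replication product implements partitioning; the rows, column names and helper functions are hypothetical, chosen to mirror the London/Paris example and the core/detail split described here.

```python
# Minimal sketch: horizontal vs. vertical partitioning of an in-memory table.
# The "items" rows and their columns are hypothetical illustration data.

items = [
    {"id": 1, "city": "London", "name": "widget", "description": "...", "notes": "..."},
    {"id": 2, "city": "Paris", "name": "gadget", "description": "...", "notes": "..."},
    {"id": 3, "city": "London", "name": "gizmo", "description": "...", "notes": "..."},
]


def horizontal_partition(rows, key):
    """Divide rows by the value of one attribute, e.g. one partition per city."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts


def vertical_partition(rows, core_columns, pk="id"):
    """Divide columns into a narrow core table and a one-to-one detail table.

    Both halves keep the primary key so they can be joined back together.
    """
    core = [{c: row[c] for c in (pk, *core_columns)} for row in rows]
    detail = [
        {c: v for c, v in row.items() if c == pk or c not in core_columns}
        for row in rows
    ]
    return core, detail


def rejoin(core, detail, pk="id"):
    """Reassemble full rows by joining core and detail rows on the primary key."""
    detail_by_pk = {d[pk]: d for d in detail}
    return [{**c, **detail_by_pk[c[pk]]} for c in core]
```

Splitting on `city` gives one partition per site, while `vertical_partition(items, ("city", "name"))` yields a narrow core table that is cheap to join in most queries and a detail table that is joined back on `id` only when the remaining attributes are needed.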

The point I was making was that that section of the document was in error
because it presumed there was only one form of data partitioning and that
it was horizontal. (The document is now missing, so I can't look at the
current content - it was here:
ftp://momjian.us/pub/postgresql/mypatches/replication.)

In answer to your query about why someone would use such partitioning, the
nearly universal answer is performance, and the distant second answer is
security. In one example that comes immediately to mind, there is a table
which is a central core of an application, and, as such, there's a lot to
say about the items in this table. The table's size is in the tens to
hundreds of millions of rows, and needs to be joined with something else
in a huge fraction of queries.  For performance reasons, the table's size
was therefore kept as small as possible, and one or more detail tables are
used for the remaining attributes that logically belong in the table - it's a
vertical partition. It's an exceptionally common technique - so common, it
probably didn't occur to you that you were even talking about it when you
spoke of "data partitioning."

> > Next, Query Broadcast Load Balancing... also needs a lot of work. First,
> > it's foremost in my memory that sending read queries everywhere and
> > returning the first result set back is a key way to improve application
> > performance at the cost of additional load on other systems - I guess
> > that's not at all what the document is after here, but it's a worthy part
> > of a dialogue on broadcasting queries. In other words, this has more parts
> > to it than just what the document now entertains. Secondly, the document
>
> Uh, do we want to go into that here?  I guess I could.
>
> > doesn't address _at_all_ whether this is a two-phase-commit environment
> > or not. If not, how are updates managed? If each server operates
> > independently and one of them fails, what do you do then? How do you know
> > _any_ server got an insert/update? ...  Each server _can't_ operate
> > independently unless the application does its own insert/update commits to
> > every one of them - and that can't be fast, nor does it load balance,
> > though it may contribute to superior uptime performance by the
> > application.
>
> I think having the application middle layer do the commits is how it
> works now.  Can someone explain how pgpool works, or should we mention
> how two-phase commit has to be done here?  pgpool2 has additional
> features.

Well, you hadn't mentioned two-phase commit at all, and it surely belongs
somewhere in this document - it's a core PG feature and enables a lot of
alternative solutions which the document discusses.

What it needs to say but doesn't (didn't?) is that the load from read
queries can be distributed for load balancing purposes but that there's no
benefit possible for writes, and that replication overhead costs could
possibly overwhelm the benefits in high-update scenarios. The point that
each server operates independently is only true if you ignore the
necessary replication - which, to my mind, links the systems and they are
not independent. ...I suppose that in a completely read-only environment -
or updated nightly by dumping tarwads or something like that, they could
be considered independent, but it's hardly worth the sentence.
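The read/write asymmetry can be shown with a small routing sketch, assuming a middle layer that sees every statement. `Server` and `Router` are hypothetical stand-ins, not pgpool's or any other product's API, and the sketch deliberately ignores failures and commit coordination, which is where the hard problems raised above live.

```python
from itertools import cycle


class Server:
    """Hypothetical stand-in for one database server: it only records
    which statements it was asked to execute."""

    def __init__(self, name):
        self.name = name
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)
        return self.name


class Router:
    """Send each read to one server (round-robin); send each write to all."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._readers = cycle(self.servers)

    def read(self, sql):
        # Reads scale out: each one costs work on a single server only.
        return next(self._readers).execute(sql)

    def write(self, sql):
        # Writes do not scale out: every server must apply every write,
        # so total write work grows with the number of servers.
        return [s.execute(sql) for s in self.servers]
```

Adding a fourth server would cut each server's share of reads, but every write would then be executed four times.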

Regards,
Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[email protected], http://ScienceTools.com/


* Re: [HACKERS] Replication documentation addition
@ 2006-10-30 17:23  Chris Browne <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Chris Browne @ 2006-10-30 17:23 UTC
  To: pgsql-docs

[email protected] (Bruce Momjian) writes:
> With no new additions submitted today, I have moved my text into our
> SGML documentation:
>
> 	http://momjian.us/main/writings/pgsql/sgml/failover.html
>
> Please let me know what additional changes are needed.

It's looking a lot improved to me...

There are still numerous places where it needs s/Slony/Slony-I/g
because there is more than one thing out there called "Slony," only
one of which is the single-master-to-multiple-subscribers-asynchronous
replication system...

<http://momjian.us/main/writings/pgsql/sgml/query-broadcast-load-balancing.html>

"This can be complex to set up because functions like random() and
CURRENT_TIMESTAMP will have different values on different servers, and
sequences should be consistent across servers."

It doesn't make sense to call this "complex to set up."  This problem
isn't about complexity of setup; it is about whether updates are
processed identically on different hosts.  

Perhaps better:

"Query broadcasting can break down such that servers fall out of sync
if the queries have nondeterministic behavior.  For instance,
functions like random(), CURRENT_TIMESTAMP, and
nextval('some_sequence') will take on different values on different
servers.  Care must be taken at the application level to make sure
that queries are all fully deterministic and that they either COMMIT
or ABORT on all servers."

<http://momjian.us/main/writings/pgsql/sgml/clustering-for-load-balancing.html>
"24.6. Clustering For Load Balancing

In clustering, each server can accept write requests, and these write
requests are broadcast from the original server to all other servers
before each transaction commits. Under heavy load, this can cause
excessive locking and performance degradation. It is implemented by
Oracle in their RAC product. PostgreSQL does not offer this type of
load balancing, though PostgreSQL two-phase commit can be used to
implement this in application code or middleware."

Something doesn't feel entirely right here...

How about...

"24.6. Multimaster Replication For Load Balancing

In this scenario, each server can accept write requests, which are
broadcast from the original server to all other servers before each
transaction commits in order to ensure consistency.  Unfortunately,
under heavy load, the cost of distributing locks across servers can
lead to substantial performance degradation. It is implemented by
Oracle in their RAC product. PostgreSQL does not offer this type of
load balancing, though PostgreSQL two-phase commit using <xref
linkend="sql-prepare-transaction-title"> and <xref linkend=
"sql-commit-prepared-title"> may be used to implement this in
application code or middleware.

The communications costs involved in distributing locks and writes
have the result that write operations are considerably more expensive
than they would be on a single server.  In general, the cost of
distributed locking means that this clustering approach is only usable
across a cluster of servers at a local site.  

There will only be a performance "win" if the cluster mostly processes
read-only traffic that the cluster can distribute across a larger
number of database servers.  Write performance generally degrades a
fair bit as compared to using a single database server.  Reliability
should be enhanced since the cluster should be able to continue work
even if some of the members of the cluster should fail."

<http://momjian.us/main/writings/pgsql/sgml/clustering-for-parallel-query-execution.html>

"24.7. Clustering For Parallel Query Execution

This allows multiple servers to work on a single query. One possible
way this could work is for the data to be split among servers and for
each server to execute its part of the query and results sent to a
central server to be combined and returned to the user. There
currently is no PostgreSQL open source solution for this."

This seems a bit thin.

"24.7. Clustering For Parallel Query Execution

This allows multiple servers to work concurrently on a single query,
analogous to the way RAID permits multiple disk drives to respond
concurrently to disk I/O requests.

One way this could work is for the data to be partitioned across the
servers, where each server executes its part of the query, submitting
results to a central server to be combined and returned to the user.
There currently is no PostgreSQL open source solution for this."
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://cbbrowne.com/info/advocacy.html
Why do we put suits in a garment bag, and put garments in a suitcase? 


* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 21:42  Bruce Momjian <[email protected]>
  parent: Chris Browne <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-11-14 21:42 UTC
  To: Chris Browne <[email protected]>; +Cc: pgsql-docs

Chris Browne wrote:
> [email protected] (Bruce Momjian) writes:
> > With no new additions submitted today, I have moved my text into our
> > SGML documentation:
> >
> > 	http://momjian.us/main/writings/pgsql/sgml/failover.html
> >
> > Please let me know what additional changes are needed.
> 
> It's looking a lot improved to me...
> 
> There are still numerous places where it needs s/Slony/Slony-I/g
> because there is more than one thing out there called "Slony," only
> one of which is the single-master-to-multiple-subscribers-asynchronous
> replication system...

Fixed.

> <http://momjian.us/main/writings/pgsql/sgml/query-broadcast-load-balancing.html>
> 
> "This can be complex to set up because functions like random() and
> CURRENT_TIMESTAMP will have different values on different servers, and
> sequences should be consistent across servers."
> 
> It doesn't make sense to call this "complex to set up."  This problem
> isn't about complexity of setup; it is about whether updates are
> processed identically on different hosts.  
> 
> Perhaps better:
> 
> "Query broadcasting can break down such that servers fall out of sync
> if the queries have nondeterministic behavior.  For instance,
> functions like random(), CURRENT_TIMESTAMP, and
> nextval('some_sequence') will take on different values on different
> servers.  Care must be taken at the application level to make sure
> that queries are all fully deterministic and that they either COMMIT
> or ABORT on all servers."

I redid the section with:

   Because each server operates independently, functions like
   <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
   sequences can have different values on different servers.  If
   this is unacceptable, applications must query such values from
   a single server and then use those values in write queries.
   Also, care must be taken that all transactions either commit
   or abort on all servers.  Pgpool is an example of this type of
   replication.
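A minimal Python sketch of the divergence described above and of the suggested fix. Each hypothetical "server" is just a dict holding its own random-number generator and its stored rows; the two functions model broadcasting a statement that calls `random()` versus evaluating the value once on a single server and broadcasting the literal.

```python
import random


def make_server(seed):
    """A hypothetical server: its own random() state plus its stored rows."""
    return {"rng": random.Random(seed), "rows": []}


def broadcast_naive(servers):
    # Broadcast an INSERT that calls random(): each server evaluates it
    # locally, so the stored values can differ from server to server.
    for s in servers:
        s["rows"].append(s["rng"].random())


def broadcast_pinned(servers, source):
    # Evaluate the nondeterministic value once, on a single server ...
    value = source["rng"].random()
    # ... then send the same literal value to every server.
    for s in servers:
        s["rows"].append(value)
```

The same pinning approach applies to `CURRENT_TIMESTAMP` and sequence values: obtain the value from one server, then put it into the write query sent to all of them.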

> <http://momjian.us/main/writings/pgsql/sgml/clustering-for-load-balancing.html>
> "24.6. Clustering For Load Balancing
> 
> In clustering, each server can accept write requests, and these write
> requests are broadcast from the original server to all other servers
> before each transaction commits. Under heavy load, this can cause
> excessive locking and performance degradation. It is implemented by
> Oracle in their RAC product. PostgreSQL does not offer this type of
> load balancing, though PostgreSQL two-phase commit can be used to
> implement this in application code or middleware."
> 
> Something doesn't feel entirely right here...
> 
> How about...
> 
> "24.6. Multimaster Replication For Load Balancing
> 
> In this scenario, each server can accept write requests, which are
> broadcast from the original server to all other servers before each
> transaction commits in order to ensure consistency.  Unfortunately,
> under heavy load, the cost of distributing locks across servers can
> lead to substantial performance degradation. It is implemented by
> Oracle in their RAC product. PostgreSQL does not offer this type of
> load balancing, though PostgreSQL two-phase commit using <xref
> linkend="sql-prepare-transaction-title"> and <xref linkend=
> "sql-commit-prepared-title"> may be used to implement this in
> application code or middleware.
> 
> The communications costs involved in distributing locks and writes
> have the result that write operations are considerably more expensive
> than they would be on a single server.  In general, the cost of
> distributed locking means that this clustering approach is only usable
> across a cluster of servers at a local site.  
> 
> There will only be a performance "win" if the cluster mostly processes
> read-only traffic that the cluster can distribute across a larger
> number of database servers.  Write performance generally degrades a
> fair bit as compared to using a single database server.  Reliability
> should be enhanced since the cluster should be able to continue work
> even if some of the members of the cluster should fail."

Your description was too detailed, but I took some of your concepts:

  <para>
   In clustering, each server can accept write requests, and these
   write requests are broadcast from the original server to all
   other servers before each transaction commits.  Heavy write
   activity can cause excessive locking, leading to poor performance.
   In fact, write performance is often worse than that of a single
   server.  Read requests can be sent to any server.  Clustering
   is best for mostly read workloads, though its big advantage is
   that any server can accept write requests --- there is no need
   to partition workloads between read/write and read-only servers.
  </para>

  <para>
   Clustering is implemented by <productname>Oracle</> in their
   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
   does not offer this type of load balancing, though
   <productname>PostgreSQL</> two-phase commit (<xref
   linkend="sql-prepare-transaction-title"> and <xref linkend=
   "sql-commit-prepared-title">) can be used to implement this in
   application code or middleware.
  </para>
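As a rough sketch of what such application code or middleware has to do, the coordinator below commits on every server or on none. `Participant` is a hypothetical in-memory stand-in for a connection to one server; its comments name the PostgreSQL statement each step would correspond to (`PREPARE TRANSACTION`, `COMMIT PREPARED`, `ROLLBACK PREPARED`). A real implementation would also have to record its decision and resolve prepared transactions left behind if the coordinator itself crashes, which this sketch omits.

```python
class Participant:
    """Hypothetical stand-in for one server taking part in a transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # False simulates a server refusing
        self.state = "active"

    def prepare(self, gid):
        # Corresponds to: PREPARE TRANSACTION 'gid'
        if not self.can_prepare:
            self.state = "aborted"
            return False
        self.state = "prepared"
        return True

    def commit_prepared(self, gid):
        # Corresponds to: COMMIT PREPARED 'gid'
        self.state = "committed"

    def rollback(self, gid):
        # Corresponds to: ROLLBACK PREPARED 'gid' if already prepared,
        # otherwise a plain ROLLBACK of the open transaction.
        self.state = "aborted"


def two_phase_commit(participants, gid):
    """Commit the transaction on every server, or on none of them."""
    # Phase 1: every server must successfully prepare.
    for p in participants:
        if not p.prepare(gid):
            # Phase 2 (abort): roll back everywhere, prepared or not.
            for q in participants:
                q.rollback(gid)
            return False
    # Phase 2 (commit): all servers prepared, so all can commit.
    for p in participants:
        p.commit_prepared(gid)
    return True
```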

> 
> <http://momjian.us/main/writings/pgsql/sgml/clustering-for-parallel-query-execution.html>
> 
> "24.7. Clustering For Parallel Query Execution
> 
> This allows multiple servers to work on a single query. One possible
> way this could work is for the data to be split among servers and for
> each server to execute its part of the query and results sent to a
> central server to be combined and returned to the user. There
> currently is no PostgreSQL open source solution for this."
> 
> This seems a bit thin.
> 
> "24.7. Clustering For Parallel Query Execution
> 
> This allows multiple servers to work concurrently on a single query,
> analogous to the way RAID permits multiple disk drives to respond
> concurrently to disk I/O requests.
> 
> One way this could work is for the data to be partitioned across the
> servers, where each server executes its part of the query, submitting
> results to a central server to be combined and returned to the user.
> There currently is no PostgreSQL open source solution for this."

I took some of your wording:

   This allows multiple servers to work concurrently on a single
   query.  One possible way this could work is for the data to be
   split among servers and for each server to execute its part of
   the query and send its results to a central server to be combined
   and returned to the user.  There currently is no
   <productname>PostgreSQL</> open source solution for this.
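A minimal scatter/gather sketch of the approach described above, computing an average. Each "server" is a hypothetical Python list holding its partition of the rows, and a thread pool stands in for the servers working concurrently; no PostgreSQL API is involved.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_avg(rows):
    """Work done on one server: a partial aggregate over its own rows."""
    return sum(rows), len(rows)


def parallel_avg(partitions):
    """Central server: scatter the query, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(partial_avg, partitions))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None
```

Each server returns a partial sum and row count rather than its own average, because averaging per-server averages gives the wrong answer when the partitions have different sizes.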

Because RAID is often used for high availability, I thought mentioning
it in this context was too complicated.





* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 22:02  Jeff Frost <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Jeff Frost @ 2006-11-14 22:02 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

On Tue, 14 Nov 2006, Bruce Momjian wrote:

> Your description was too detailed, but I took some of your concepts:
>
>  <para>
>   In clustering, each server can accept write requests, and these
>   write requests are broadcast from the original server to all
>   other servers before each transaction commits.  Heavy write
>   activity can cause excessive locking, leading to poor performance.
>   In fact, write performance is often worse than that of a single
>   server.  Read requests can be sent to any server.  Clustering
>   is best for mostly read workloads, though its big advantage is
>   that any server can accept write requests --- there is no need
>   to partition workloads between read/write and read-only servers.
>  </para>
>
>  <para>
>   Clustering is implemented by <productname>Oracle</> in their
>   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
>   does not offer this type of load balancing, though
>   <productname>PostgreSQL</> two-phase commit (<xref
>   linkend="sql-prepare-transaction-title"> and <xref linkend=
>   "sql-commit-prepared-title">) can be used to implement this in
>   application code or middleware.
>  </para>

Bruce,

Continuent's uni/cluster middleware product implements this type of 
clustering/load balancing.  Perhaps it warrants a mention?  Not sure how far 
we want to get into listing external products.


-- 
Jeff Frost, Owner 	<[email protected]>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954




* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 22:04  Bruce Momjian <[email protected]>
  parent: Jeff Frost <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-11-14 22:04 UTC
  To: Jeff Frost <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

Jeff Frost wrote:
> On Tue, 14 Nov 2006, Bruce Momjian wrote:
> 
> > Your description was too detailed, but I took some of your concepts:
> >
> >  <para>
> >   In clustering, each server can accept write requests, and these
> >   write requests are broadcast from the original server to all
> >   other servers before each transaction commits.  Heavy write
> >   activity can cause excessive locking, leading to poor performance.
> >   In fact, write performance is often worse than that of a single
> >   server.  Read requests can be sent to any server.  Clustering
> >   is best for mostly read workloads, though its big advantage is
> >   that any server can accept write requests --- there is no need
> >   to partition workloads between read/write and read-only servers.
> >  </para>
> >
> >  <para>
> >   Clustering is implemented by <productname>Oracle</> in their
> >   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
> >   does not offer this type of load balancing, though
> >   <productname>PostgreSQL</> two-phase commit (<xref
> >   linkend="sql-prepare-transaction-title"> and <xref linkend=
> >   "sql-commit-prepared-title">) can be used to implement this in
> >   application code or middleware.
> >  </para>
> 
> Bruce,
> 
> Continuent's uni/cluster middleware product implements this type of 
> clustering/load balancing.  Perhaps it warrants a mention?  Not sure how far 
> we want to get into listing external products.

We had a long discussion about that and felt that recommending
commercial products or even every open source project was too much.  The
idea was that we should reference a web page that has them all mentioned,
but no one has set one up yet.





* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 22:18  Jeff Frost <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Jeff Frost @ 2006-11-14 22:18 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

On Tue, 14 Nov 2006, Bruce Momjian wrote:

> Jeff Frost wrote:
>> On Tue, 14 Nov 2006, Bruce Momjian wrote:
>>
>>> Your description was too detailed, but I took some of your concepts:
>>>
>>>  <para>
>>>   In clustering, each server can accept write requests, and these
>>>   write requests are broadcast from the original server to all
>>>   other servers before each transaction commits.  Heavy write
>>>   activity can cause excessive locking, leading to poor performance.
>>>   In fact, write performance is often worse than that of a single
>>>   server.  Read requests can be sent to any server.  Clustering
>>>   is best for mostly read workloads, though its big advantage is
>>>   that any server can accept write requests --- there is no need
>>>   to partition workloads between read/write and read-only servers.
>>>  </para>
>>>
>>>  <para>
>>>   Clustering is implemented by <productname>Oracle</> in their
>>>   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
>>>   does not offer this type of load balancing, though
>>>   <productname>PostgreSQL</> two-phase commit (<xref
>>>   linkend="sql-prepare-transaction-title"> and <xref linkend=
>>>   "sql-commit-prepared-title">) can be used to implement this in
>>>   application code or middleware.
>>>  </para>
>>
>> Bruce,
>>
>> Continuent's uni/cluster middleware product implements this type of
>> clustering/load balancing.  Perhaps it warrants a mention?  Not sure how far
>> we want to get into listing external products.
>
> We had a long discussion about that and felt that recommending
> commercial products or even every open source project was too much.  The
> idea was that we should reference a web page that has them all mentioned,
> but no one has set one up yet.

That makes sense, I just hate to see us say something like "Oracle can do 
this with RAC but PostgreSQL cannot."





* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 22:32  Bruce Momjian <[email protected]>
  parent: Jeff Frost <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-11-14 22:32 UTC
  To: Jeff Frost <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

Jeff Frost wrote:
> >>>  <para>
> >>>   Clustering is implemented by <productname>Oracle</> in their
> >>>   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
> >>>   does not offer this type of load balancing, though
> >>>   <productname>PostgreSQL</> two-phase commit (<xref
> >>>   linkend="sql-prepare-transaction-title"> and <xref linkend=
> >>>   "sql-commit-prepared-title">) can be used to implement this in
> >>>   application code or middleware.
> >>>  </para>
> >>
> >> Bruce,
> >>
> >> Continuent's uni/cluster middleware product implements this type of
> >> clustering/load balancing.  Perhaps it warrants a mention?  Not sure how far
> >> we want to get into listing external products.
> >
> > We had a long discussion about that and felt that recommending
> > commercial products or even every open source project was too much.  The
> > idea was that we should reference a web page that has them all mentioned,
> > but no one has set one up yet.
> 
> That makes sense, I just hate to see us say something like "Oracle can do 
> this with RAC but PostgreSQL cannot."

Agreed.  I think we would mention any PostgreSQL solution for this, even
if it is not open source.  We mention solutions as examples in this part
of the documentation.

FYI, as far as I know, Continuent's solution is "Query Broadcast Load
Balancing", not clustering.





* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 22:40  Jeff Frost <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Jeff Frost @ 2006-11-14 22:40 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

On Tue, 14 Nov 2006, Bruce Momjian wrote:

>>> We had a long discussion about that and felt that recommending
>>> commercial products or even every open source project was too much.  The
>>> idea was that we should reference a web page that has them all mentioned,
>>> but no one has set one up yet.
>>
>> That makes sense, I just hate to see us say something like "Oracle can do
>> this with RAC but PostgreSQL cannot."
>
> Agreed.  I think we would mention any PostgreSQL solution for this, even
> if it is not open source.  We mention solutions as examples in this part
> of the documentation.
>
> FYI, as far as I know, Continuent's solution is "Query Broadcast Load
> Balancing", not clustering.

I would speculate that your terminology is slightly more accurate than mine. 
They do query broadcast, but they also do a bit more with it than that, as they
evaluate many of the nondeterministic write queries on a particular server
and update the broadcast query so each db gets the same value.

I guess middleware of this sort automatically ends up in the query broadcast 
category.  It just sounds awfully similar to the description of clustering for
load balancing:

In clustering, each server can accept write requests, and these write requests 
are broadcast from the original server to all other servers before each 
transaction commits.

I guess it's kind of a fine line how it gets defined?



* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 22:43  Bruce Momjian <[email protected]>
  parent: Jeff Frost <[email protected]>
  0 siblings, 3 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-11-14 22:43 UTC
  To: Jeff Frost <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

Jeff Frost wrote:
> > FYI, as far as I know, Continuent's solution is "Query Broadcast Load
> > Balancing", not clustering.
> 
> I would speculate that your terminology is slightly more accurate than mine. 
> They do query broadcast, but they also do a bit more with it than that, as they
> evaluate many of the nondeterministic write queries on a particular server
> and update the broadcast query so each db gets the same value.
> 
> I guess middleware of this sort automatically ends up in the query broadcast 
> category.  It just sounds awfully similar to the description of clustering for
> load balancing:
> 
> In clustering, each server can accept write requests, and these write requests 
> are broadcast from the original server to all other servers before each 
> transaction commits.
> 
> I guess it's kind of a fine line how it gets defined?

Hmmm.  Interesting.  Does anyone else have details or an opinion on
this?  The fact that there is something sitting above the servers seems
to be the defining issue of calling it query broadcast.





* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 22:50  Jeff Frost <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 1 reply; 117+ messages in thread

From: Jeff Frost @ 2006-11-14 22:50 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

On Tue, 14 Nov 2006, Bruce Momjian wrote:

>> In clustering, each server can accept write requests, and these write requests
>> are broadcast from the original server to all other servers before each
>> transaction commits.
>>
>> I guess it's kind of a fine line how it gets defined?
>
> Hmmm.  Interesting.  Does anyone else have details or an opinion on
> this?  The fact that there is something sitting above the servers seems
> to be the defining issue of calling it query broadcast.

My thinking on the definition of clustering was that there is some smarts for 
graceful failover and automated or semi-automated ways of bringing failed DB 
servers back up to date and online with the rest of the servers in the 
cluster.  All servers need to be able to accept writes, but do we 
differentiate on where the writes originated (i.e. middleware or another 
postgresql server) or on functionality?

-- 
Jeff Frost, Owner 	<[email protected]>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954


* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 22:54  Bruce Momjian <[email protected]>
  parent: Jeff Frost <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-11-14 22:54 UTC (permalink / raw)
  To: Jeff Frost <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

Jeff Frost wrote:
> On Tue, 14 Nov 2006, Bruce Momjian wrote:
> 
> >> In clustering, each server can accept write requests, and these write requests
> >> are broadcast from the original server to all other servers before each
> >> transaction commits.
> >>
> >> I guess it's kind of a fine line how it gets defined?
> >
> > Hmmm.  Interesting.  Does anyone else have details or an opinion on
> > this?  The fact that there is something sitting above the servers seems
> > to be the defining issue of calling it query broadcast.
> 
> My thinking on the definition of clustering was that there is some smarts for 
> graceful failover and automated or semi-automated ways of bringing failed DB 
> servers back up to date and online with the rest of the servers in the 
> cluster.  All servers need to be able to accept writes, but do we 

No, even replication servers can have that.

> differentiate on where the writes originated (i.e. middleware or another 
> postgresql server) or on functionality?

Fundamentally, broadcast means the queries are being propagated outside
the server, with the benefits and limitations inherent in that.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-11-14 23:37  Jeff Frost <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Jeff Frost @ 2006-11-14 23:37 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

On Tue, 14 Nov 2006, Bruce Momjian wrote:

>> My thinking on the definition of clustering was that there is some smarts for
>> graceful failover and automated or semi-automated ways of bringing failed DB
>> servers back up to date and online with the rest of the servers in the
>> cluster.  All servers need to be able to accept writes, but do we
>
> No, even replication servers can have that.
>
>> differentiate on where the writes originated (i.e. middleware or another
>> postgresql server) or on functionality?
>
> Fundamentally, broadcast means the queries are being propagated outside
> the server, with the benefits and limitations inherent in that.

I'd definitely have to agree with you on that.  I guess I'm trying to decide 
what differentiates clustering for load balancing from query broadcast based 
on your text.  Maybe just don't use the word broadcast here:

"In clustering, each server can accept write requests, and these write 
requests are broadcast from the original server to all other servers before 
each transaction commits."

Unfortunately, I can't seem to come up with anything more clever.

-- 
Jeff Frost, Owner 	<[email protected]>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954




* Re: [HACKERS] Replication documentation addition
@ 2006-11-15 00:11  Bruce Momjian <[email protected]>
  parent: Jeff Frost <[email protected]>
  0 siblings, 2 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-11-15 00:11 UTC (permalink / raw)
  To: Jeff Frost <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

Jeff Frost wrote:
> On Tue, 14 Nov 2006, Bruce Momjian wrote:
> 
> >> My thinking on the definition of clustering was that there is some smarts for
> >> graceful failover and automated or semi-automated ways of bringing failed DB
> >> servers back up to date and online with the rest of the servers in the
> >> cluster.  All servers need to be able to accept writes, but do we
> >
> > No, even replication servers can have that.
> >
> >> differentiate on where the writes originated (i.e. middleware or another
> >> postgresql server) or on functionality?
> >
> > Fundamentally, broadcast means the queries are being propagated outside
> > the server, with the benefits and limitations inherent in that.
> 
> I'd definitely have to agree with you on that.  I guess I'm trying to decide 
> what differentiates clustering for load balancing from query broadcast based 
> on your text.  Maybe just don't use the word broadcast here:
> 
> "In clustering, each server can accept write requests, and these write 
> requests are broadcast from the original server to all other servers before 
> each transaction commits."
> 
> Unfortunately, I can't seem to come up with anything more clever.

Basically, when you are broadcasting outside the server, you are
broadcasting SQL queries, and those queries do not have information
about non-deterministic functions and have issues with universal commits
on all nodes.
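To make the distinction concrete, here is a minimal Python sketch, not taken from any real replication product, contrasting the two approaches. `ToyServer`, `broadcast_statement` and `propagate_row` are hypothetical names, and each server's private `random.Random` instance stands in for that server's own `random()`. When the SQL text is re-executed on every node the stored values can differ; when the originating node computes the row once and ships the result, every copy matches.

```python
import random


class ToyServer:
    """Hypothetical stand-in for one database server holding table t."""

    def __init__(self, name: str, seed: int) -> None:
        self.name = name
        self.rows: dict[int, float] = {}
        # Each server has its own random source, like its own random().
        self.rng = random.Random(seed)

    def run_insert_random(self, row_id: int) -> float:
        """Emulates executing: INSERT INTO t VALUES (row_id, random())."""
        value = self.rng.random()  # evaluated locally on this server
        self.rows[row_id] = value
        return value

    def apply_row(self, row_id: int, value: float) -> None:
        """Stores an already-computed row; nothing is re-evaluated."""
        self.rows[row_id] = value


def broadcast_statement(servers: list[ToyServer], row_id: int) -> None:
    """Statement broadcast: every server re-executes the SQL text."""
    for server in servers:
        server.run_insert_random(row_id)


def propagate_row(origin: ToyServer, others: list[ToyServer], row_id: int) -> None:
    """Row propagation: one server executes, the others receive the result."""
    value = origin.run_insert_random(row_id)
    for server in others:
        server.apply_row(row_id, value)


a, b = ToyServer("a", seed=1), ToyServer("b", seed=2)
broadcast_statement([a, b], row_id=1)
print(a.rows[1] == b.rows[1])  # False: the copies diverged

c, d = ToyServer("c", seed=1), ToyServer("d", seed=2)
propagate_row(c, [d], row_id=1)
print(c.rows[1] == d.rows[1])  # True: every copy holds the same value
```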

I think I now see your point about using the word "broadcast" for both
clustering and middle-ware broadcast.  Let me find some new wording and
repost.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-11-15 00:13  Jeff Frost <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Jeff Frost @ 2006-11-15 00:13 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

On Tue, 14 Nov 2006, Bruce Momjian wrote:

>> "In clustering, each server can accept write requests, and these write
>> requests are broadcast from the original server to all other servers before
>> each transaction commits."
>>
>> Unfortunately, I can't seem to come up with anything more clever.
>
> Basically, when you are broadcasting outside the server, you are
> broadcasting SQL queries, and those queries do not have information
> about non-deterministic functions and have issues with universal commits
> on all nodes.

Ahh..I like this explanation, because the inter-server communication in 
clustering is not necessarily SQL queries.


-- 
Jeff Frost, Owner 	<[email protected]>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954




* Re: [HACKERS] Replication documentation addition
@ 2006-11-15 01:10  Bruce Momjian <[email protected]>
  parent: Jeff Frost <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-11-15 01:10 UTC (permalink / raw)
  To: Jeff Frost <[email protected]>; +Cc: Chris Browne <[email protected]>; pgsql-docs

Jeff Frost wrote:
> On Tue, 14 Nov 2006, Bruce Momjian wrote:
> 
> >> "In clustering, each server can accept write requests, and these write
> >> requests are broadcast from the original server to all other servers before
> >> each transaction commits."
> >>
> >> Unfortunately, I can't seem to come up with anything more clever.
> >
> > Basically, when you are broadcasting outside the server, you are
> > broadcasting SQL queries, and those queries do not have information
> > about non-deterministic functions and have issues with universal commits
> > on all nodes.
> 
> Ahh..I like this explanation, because the inter-server communication in 
> clustering is not necessarily SQL queries.

OK, I have updated the documentation with the attached patch, which
clarifies SQL broadcast vs. modified row propagation.  Current version
is at:

	http://momjian.us/main/writings/pgsql/sgml/failover.html

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Attachments:

  [text/x-diff] /bjm/diff (3.9K, 2-%2Fbjm%2Fdiff)
Index: doc/src/sgml/failover.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/failover.sgml,v
retrieving revision 1.5
diff -c -c -r1.5 failover.sgml
*** doc/src/sgml/failover.sgml	14 Nov 2006 22:25:15 -0000	1.5
--- doc/src/sgml/failover.sgml	15 Nov 2006 01:06:42 -0000
***************
*** 149,171 ****
    <title>Query Broadcast Load Balancing</title>
  
    <para>
!    Query broadcast load balancing is accomplished by having a program
!    intercept every query and send it to all servers.  Read-only queries can
!    be sent to a single server because there is no need for all servers to
!    process it.  This is unusual because most replication solutions have
!    each write server propagate its changes to the other servers.  With
!    query broadcasting, each server operates independently.
    </para>
  
    <para>
!    Because each server operates independently, functions like
     <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
!    sequences can have different values on different servers.  If
!    this is unacceptable, applications must query such values from
!    a single server and then use those values in write queries.
!    Also, care must also be taken that all transactions either commit
!    or abort on all servers  Pgpool is an example of this type of
!    replication.
    </para>
   </sect1>
  
--- 149,173 ----
    <title>Query Broadcast Load Balancing</title>
  
    <para>
!    Query broadcast load balancing is accomplished by having a
!    program intercept every SQL query and send it to all servers.
!    This is unique because most replication solutions have the write
!    server propagate its changes to the other servers.  With query
!    broadcasting, each server operates independently.  Read-only
!    queries can be sent to a single server because there is no need
!    for all servers to process it.
    </para>
  
    <para>
!    One limitation of this solution is that functions like
     <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
!    sequences can have different values on different servers.  This
!    is because each server operates independently, and because SQL
!    queries are broadcast (and not actual modified rows).  If this
!    is unacceptable, applications must query such values from a
!    single server and then use those values in write queries.  Also,
!    care must be taken that all transactions either commit or abort
!    on all servers.  Pgpool is an example of this type of replication.
    </para>
   </sect1>
  
***************
*** 173,186 ****
    <title>Clustering For Load Balancing</title>
  
    <para>
!    In clustering, each server can accept write requests, and these
!    write requests are broadcast from the original server to all
!    other servers before each transaction commits.  Heavy write
!    activity can cause excessive locking, leading to poor performance.
!    In fact, write performance is often worse than that of a single
     server.  Read requests can be sent to any server.  Clustering
     is best for mostly read workloads, though its big advantage is
!    that any server can accept write requests --- there is no need
     to partition workloads between read/write and read-only servers.
    </para>
  
--- 175,188 ----
    <title>Clustering For Load Balancing</title>
  
    <para>
!    In clustering, each server can accept write requests, and modified
!    data is transmitted from the original server to every other
!    server before each transaction commits.  Heavy write activity
!    can cause excessive locking, leading to poor performance.  In
!    fact, write performance is often worse than that of a single
     server.  Read requests can be sent to any server.  Clustering
     is best for mostly read workloads, though its big advantage is
!    that any server can accept write requests &mdash; there is no need
     to partition workloads between read/write and read-only servers.
    </para>
  

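The workaround described in the patch, querying non-deterministic values from a single server and reusing them in the write, is roughly what Jeff describes Continuent doing inside its middleware. Below is a minimal Python sketch under stated assumptions: `PinnedServer` and `insert_with_pinned_values` are hypothetical, the first server in the list is arbitrarily chosen to supply `random()`, and the middleware's own clock stands in for `CURRENT_TIMESTAMP`. Because every server receives identical literal values, the copies stay identical.

```python
import random
from datetime import datetime, timezone


class PinnedServer:
    """Hypothetical server that logs each statement it is sent."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)  # this server's own random()
        self.log: list[str] = []
        self.rows: dict[int, tuple[float, str]] = {}

    def random(self) -> float:
        """Emulates running SELECT random() on this server."""
        return self.rng.random()

    def execute_insert(self, row_id: int, value: float, stamp: str) -> None:
        """Receives a write whose values are already literals."""
        self.log.append(f"INSERT INTO t VALUES ({row_id}, {value!r}, '{stamp}')")
        self.rows[row_id] = (value, stamp)


def insert_with_pinned_values(servers: list[PinnedServer], row_id: int) -> None:
    # Evaluate each non-deterministic value exactly once ...
    value = servers[0].random()
    stamp = datetime.now(timezone.utc).isoformat()
    # ... then send identical literals to every server.
    for server in servers:
        server.execute_insert(row_id, value, stamp)


servers = [PinnedServer(seed) for seed in (1, 2, 3)]
insert_with_pinned_values(servers, row_id=7)
print(all(s.rows == servers[0].rows for s in servers))  # True
```

Sequences would need the same treatment (fetch the next value once, then send the literal), and none of this addresses the separate requirement that each transaction commit or abort on every server.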


* Re: [HACKERS] Replication documentation addition
@ 2006-11-15 09:57  Markus Schiltknecht <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 0 replies; 117+ messages in thread

From: Markus Schiltknecht @ 2006-11-15 09:57 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Jeff Frost <[email protected]>; Chris Browne <[email protected]>; pgsql-docs

Hi,

> Jeff Frost wrote:
>> I would speculate that your terminology is slightly more accurate than mine. 

I can't help it, but I'm still thinking the terminology in the 
replication documentation is somewhat made up.

Bruce Momjian wrote:
> Hmmm.  Interesting.  Does anyone else have details or an opinion on
> this?  The fact that there is something sitting above the servers seems
> to be the defining issue of calling it query broadcast.

I'd argue that "Query Broadcast Load Balancing" and "Clustering For Load 
Balancing" are both the same replication type: sync, multi-master. And 
the problem they try to solve is the same (Load Balancing).

Listing them as two different types of replication... I don't know. But 
we should at least clearly state that both are sync, multi-master 
replication algorithms.

Anyway, instead of messing around any longer I'm trying to come up with 
a better proposal... patch will follow.

Regards

Markus


* Re: [HACKERS] Replication documentation addition
@ 2006-11-15 19:45  Peter Eisentraut <[email protected]>
  parent: Bruce Momjian <[email protected]>
  2 siblings, 0 replies; 117+ messages in thread

From: Peter Eisentraut @ 2006-11-15 19:45 UTC (permalink / raw)
  To: pgsql-docs; +Cc: Bruce Momjian <[email protected]>; Jeff Frost <[email protected]>; Chris Browne <[email protected]>

Bruce Momjian wrote:
> Hmmm.  Interesting.  Does anyone else have details or an opinion on
> this?  The fact that there is something sitting above the servers
> seems to be the defining issue of calling it query broadcast.

Well, clustering is just a general term for putting several things, say, 
computers, together to a common cause.  If you cluster a database 
system, you need some way to distribute the incoming requests across 
the machines, which you can either do on the network layer or on the 
application layer.  Sequoia does the latter.  But I don't see 
any "broadcasting" in there as a defining quality.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


* Re: [HACKERS] Replication documentation addition
@ 2006-11-17 01:48  Jim Nasby <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 2 replies; 117+ messages in thread

From: Jim Nasby @ 2006-11-17 01:48 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Jeff Frost <[email protected]>; Chris Browne <[email protected]>; pgsql-docs

On Nov 14, 2006, at 5:11 PM, Bruce Momjian wrote:
> Basically, when you are broadcasting outside the server, you are
> broadcasting SQL queries, and those queries do not have information
> about non-deterministic functions and have issues with universal
> commits on all nodes.

That's true of simple query broadcasting (ie: pgpool), but not true  
of Continuent/Sequoia. Continuent's software adds a lot of additional  
features on top of simple query broadcasting, making it far more  
robust than simply spewing queries out to every node in the cluster.  
You still have to be very careful with how you use it, but not nearly  
as much as with simpler solutions.
--
Jim Nasby                                            [email protected]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)






* Re: [HACKERS] Replication documentation addition
@ 2006-11-18 23:29  Josh Berkus <[email protected]>
  parent: Jim Nasby <[email protected]>
  1 sibling, 2 replies; 117+ messages in thread

From: Josh Berkus @ 2006-11-18 23:29 UTC (permalink / raw)
  To: pgsql-docs; +Cc: Jim Nasby <[email protected]>; Bruce Momjian <[email protected]>; Jeff Frost <[email protected]>; Chris Browne <[email protected]>

Jim,

> That's true of simple query broadcasting (ie: pgpool), but not true  
> of Continuent/Sequoia. Continuent's software adds a lot of additional  
> features on top of simple query broadcasting, making it far more  
> robust than simply spewing queries out to every node in the cluster.  
> You still have to be very careful with how you use it, but not nearly  
> as much as with simpler solutions.

I think the general term is "statement-based replication", not "broadcasting".

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco




* Re: [HACKERS] Replication documentation addition
@ 2006-11-20 09:43  Markus Schiltknecht <[email protected]>
  parent: Josh Berkus <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Markus Schiltknecht @ 2006-11-20 09:43 UTC (permalink / raw)
  To: ; +Cc: pgsql-docs; Bruce Momjian <[email protected]>

Hi,

Josh Berkus wrote:
> I think the general term is "statement-based replication", not "broadcasting".

I agree that this is a better description.

Markus






* Re: [HACKERS] Replication documentation addition
@ 2006-11-20 15:01  Bruce Momjian <[email protected]>
  parent: Jim Nasby <[email protected]>
  1 sibling, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-11-20 15:01 UTC (permalink / raw)
  To: Jim Nasby <[email protected]>; +Cc: Jeff Frost <[email protected]>; Chris Browne <[email protected]>; pgsql-docs

Jim Nasby wrote:
> On Nov 14, 2006, at 5:11 PM, Bruce Momjian wrote:
> > Basically, when you are broadcasting outside the server, you are
> > broadcasting SQL queries, and those queries do not have information
> > about non-deterministic functions and have issues with universal
> > commits on all nodes.
> 
> That's true of simple query broadcasting (ie: pgpool), but not true  
> of Continuent/Sequoia. Continuent's software adds a lot of additional  
> features on top of simple query broadcasting, making it far more  
> robust than simply spewing queries out to every node in the cluster.  
> You still have to be very careful with how you use it, but not nearly  
> as much as with simpler solutions.

Yes, I have heard that Continuent/Sequoia has a process running on each
server that deals with many of the problems with broadcasting.  Not sure
how I should work that into the documentation.  In fact, based on our
description, the improvements Continuent/Sequoia made are probably
clearer.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-11-20 15:03  Bruce Momjian <[email protected]>
  parent: Josh Berkus <[email protected]>
  1 sibling, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-11-20 15:03 UTC (permalink / raw)
  To: Josh Berkus <[email protected]>; +Cc: pgsql-docs; Jim Nasby <[email protected]>; Jeff Frost <[email protected]>; Chris Browne <[email protected]>

Josh Berkus wrote:
> Jim,
> 
> > That's true of simple query broadcasting (ie: pgpool), but not true
> > of Continuent/Sequoia. Continuent's software adds a lot of additional
> > features on top of simple query broadcasting, making it far more
> > robust than simply spewing queries out to every node in the cluster.
> > You still have to be very careful with how you use it, but not nearly
> > as much as with simpler solutions.
> 
> I think the general term is "statement-based replication", not "broadcasting".

Well, the problem is that you can use a statement-based method to
replicate from a master to a slave.  I think MySQL used to use this
method, or still does, so I don't think the term "statement-based" is
clear enough, though I am open to other terms than "broadcast".

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-11-20 15:05  Bruce Momjian <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-11-20 15:05 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Josh Berkus <[email protected]>; pgsql-docs; Jim Nasby <[email protected]>; Jeff Frost <[email protected]>; Chris Browne <[email protected]>

Bruce Momjian wrote:
> Josh Berkus wrote:
> > Jim,
> > 
> > > That's true of simple query broadcasting (ie: pgpool), but not true
> > > of Continuent/Sequoia. Continuent's software adds a lot of additional
> > > features on top of simple query broadcasting, making it far more
> > > robust than simply spewing queries out to every node in the cluster.
> > > You still have to be very careful with how you use it, but not nearly
> > > as much as with simpler solutions.
> > 
> > I think the general term is "statement-based replication", not "broadcasting".
> 
> Well, the problem is that you can use a statement-based method to
> replicate from a master to a slave.  I think MySQL used to use this
> method, or still does, so I don't think the term "statement-based" is
> clear enough, though I am open to other terms than "broadcast".

Oops, I see Markus Schiltknecht likes the term "statement-based
replication" better too.  Certainly master-slave communication using
"statement-based replication" has the same drawbacks as the broadcast
method, but I wanted to highlight that the broadcast was happening
outside the server.  Do we need a master/slave "statement-based
replication" item and a middleware broadcast item?

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-11-20 15:07  Bruce Momjian <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-11-20 15:07 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Josh Berkus <[email protected]>; pgsql-docs; Jim Nasby <[email protected]>; Jeff Frost <[email protected]>; Chris Browne <[email protected]>

Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Josh Berkus wrote:
> > > Jim,
> > > 
> > > > That's true of simple query broadcasting (ie: pgpool), but not true
> > > > of Continuent/Sequoia. Continuent's software adds a lot of additional
> > > > features on top of simple query broadcasting, making it far more
> > > > robust than simply spewing queries out to every node in the cluster.
> > > > You still have to be very careful with how you use it, but not nearly
> > > > as much as with simpler solutions.
> > > 
> > > I think the general term is "statement-based replication", not "broadcasting".
> > 
> > Well, the problem is that you can use a statement-based method to
> > replicate from a master to a slave.  I think MySQL used to use this
> > method, or still does, so I don't think the term "statement-based" is
> > clear enough, though I am open to other terms than "broadcast".
> 
> Oops, I see Markus Schiltknecht likes the term "statement-based
> replication" better too.  Certainly master-slave communication using
> "statement-based replication" has the same drawbacks as the broadcast
> method, but I wanted to highlight that the broadcast was happening
> outside the server.  Do we need a master/slave "statement-based
> replication" item and a middleware broadcast item?

OK, new text:

 <varlistentry>
  <term>Statement-Based Replication</term>
  <listitem>

   <para>
    In statement-based replication, a program intercepts every SQL
    query and sends it to all servers.  Each server operates
    independently.  Read-only queries can be sent to a single server
    because there is no need for all servers to process it.
   </para>

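The routing rule in this draft text can be sketched in a few lines of Python. This is a hypothetical illustration, not pgpool's actual logic: `StatementRouter` is an invented name, and the read-only test (a leading `SELECT` without `FOR UPDATE` or `FOR SHARE`) is a deliberately naive assumption that would misroute, for example, a `SELECT` that calls a function with side effects. Read-only statements go to one server chosen round-robin; everything else goes to all servers.

```python
import itertools


class StatementRouter:
    """Hypothetical middleware router for statement-based replication."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        # Round-robin over servers for read-only statements.
        self._readers = itertools.cycle(self.servers)

    @staticmethod
    def is_read_only(sql: str) -> bool:
        """Naive classifier: a plain SELECT without row locking."""
        text = " ".join(sql.upper().split())
        return (
            text.startswith("SELECT ")
            and " FOR UPDATE" not in text
            and " FOR SHARE" not in text
        )

    def route(self, sql: str) -> list[str]:
        """Returns the servers that should receive this statement."""
        if self.is_read_only(sql):
            return [next(self._readers)]  # any single server will do
        return list(self.servers)  # writes go to every server


router = StatementRouter(["db1", "db2", "db3"])
print(router.route("SELECT * FROM t"))         # ['db1']
print(router.route("UPDATE t SET x = 1"))      # ['db1', 'db2', 'db3']
print(router.route("select count(*) from t"))  # ['db2']
```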
-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-11-20 15:35  Markus Schiltknecht <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Markus Schiltknecht @ 2006-11-20 15:35 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Josh Berkus <[email protected]>; pgsql-docs; Jim Nasby <[email protected]>; Jeff Frost <[email protected]>; Chris Browne <[email protected]>

Good morning Bruce,

Bruce Momjian wrote:
>> Oops, I see Markus Schiltknecht likes the term "statement-based
>> replication" better too.  Certainly master-slave communication using
>> "statement-based replication" has the same drawbacks as the broadcast
>> method, but I wanted to highlight that the broadcast was happening
>> outside the server.  Do we need a master/slave "statement-based
>> replication" item and a middleware broadcast item?

Ah, I see you had a much narrower definition of statement-based 
replication in mind. As I've pointed out, there are different 
implementations of 'statement-based replication'. I don't know about 
sequoia, but Postgres-R falls back to statement-based replication in
certain situations. Thus having an external 'program intercept every SQL
query' is by no means required by this algorithm; it can very well be
done inside the db backend, where you can better catch non-deterministic 
functions... but again, that's an implementation detail.

So, do you want to describe pgpool here or do you want to give a more 
general description?

>  <varlistentry>
>   <term>Statement-Based Replication</term>
>   <listitem>
> 
>    <para>
>     In statement-based replication, a program intercepts every SQL
>     query and sends it to all servers.  Each server operates
>     independently.  Read-only queries can be sent to a single server
>     because there is no need for all servers to process it.
>    </para>

If you want to go for the general description, I think the 'each server 
operates independently' is somewhere between confusing and false. And 
again, the last sentence applies to all multi-master replication solutions.

Regards

Markus


* Re: [HACKERS] Replication documentation addition
@ 2006-11-20 22:10  Bruce Momjian <[email protected]>
  parent: Markus Schiltknecht <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Bruce Momjian @ 2006-11-20 22:10 UTC (permalink / raw)
  To: Markus Schiltknecht <[email protected]>; +Cc: Josh Berkus <[email protected]>; pgsql-docs; Jim Nasby <[email protected]>; Jeff Frost <[email protected]>; Chris Browne <[email protected]>

Markus Schiltknecht wrote:
> Good morning Bruce,
> 
> Bruce Momjian wrote:
> >> Oops, I see Markus Schiltknecht likes the term "statement-based
> >> replication" better too.  Certainly master-slave communication using
> >> "statement-based replication" has the same drawbacks as the broadcast
> >> method, but I wanted to highlight that the broadcast was happening
> >> outside the server.  Do we need a master/slave "statement-based
> >> replication" item and a middleware broadcast item?
> 
> Ah, I see you had a much narrower definition of statement-based 
> replication in mind. As I've pointed out, there are different 
> implementations of 'statement-based replication'. I don't know about 
> sequoia, but Postgres-R falls back to statement-based replication in
> certain situations. Thus having an external 'program intercept every SQL
> query' is by no means required by this algorithm; it can very well be
> done inside the db backend, where you can better catch non-deterministic 
> functions... but again, that's an implementation detail.
> 
> So, do you want to describe pgpool here or do you want to give a more 
> general description?

OK, I have updated the title to be "Statement-Based Replication Using
Middleware".  I personally think statement-based replication only makes
sense in middleware because when you are in the backend, you have more
information and can do things better, either by modifying the statement
or passing actual data rows, like Slony does, so I want to restrict this
to middleware like pgpool, and Usogres, which was an early
implementation of this idea.

> >  <varlistentry>
> >   <term>Statement-Based Replication</term>
> >   <listitem>
> > 
> >    <para>
> >     In statement-based replication, a program intercepts every SQL
> >     query and sends it to all servers.  Each server operates
> >     independently.  Read-only queries can be sent to a single server
> >     because there is no need for all servers to process it.
> >    </para>
> 
> If you want to go for the general description, I think the 'each server 
> operates independently' is somewhere between confusing and false. And 
> again, the last sentence applies to all multi-master replication solutions.

Am I OK now?

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: [HACKERS] Replication documentation addition
@ 2006-11-21 08:32  Markus Schiltknecht <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 117+ messages in thread

From: Markus Schiltknecht @ 2006-11-21 08:32 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Josh Berkus <[email protected]>; pgsql-docs; Jim Nasby <[email protected]>; Jeff Frost <[email protected]>; Chris Browne <[email protected]>; [email protected]

Bruce Momjian wrote:
> OK, I have updated the title to be "Statement-Based Replication Using
> Middleware".  I personally think statement-based replication only makes
> sense in middleware because when you are in the backend, 

I completely agree.

> you have more
> information and can do things better, either by modifying the statement
> or passing actual data rows, like Slony does, so I want to restrict this
> to middleware like pgpool, and Usogres, which was an early
> implementation of this idea.

That's fine and reasonable.

> Am I OK now?

The title and first paragraph are fine.

I'd still say that the second paragraph, about limitations, is too
pgpool-specific. How's that for sequoia?

And I'm unsure what you mean by mentioning 2PC there. Do you have to 
'make sure every transaction commits or aborts' yourself with pgpool? Or 
did you just want to mention that pgpool does (and has to do) that for you?

Regards

Markus





* Re: [HACKERS] Replication documentation addition
@ 2006-11-21 18:30  Bruce Momjian <[email protected]>
  parent: Markus Schiltknecht <[email protected]>
  0 siblings, 0 replies; 117+ messages in thread

From: Bruce Momjian @ 2006-11-21 18:30 UTC
  To: Markus Schiltknecht <[email protected]>; +Cc: Josh Berkus <[email protected]>; pgsql-docs; Jim Nasby <[email protected]>; Jeff Frost <[email protected]>; Chris Browne <[email protected]>; [email protected]

Markus Schiltknecht wrote:
> Bruce Momjian wrote:
> > OK, I have updated the title to be "Statement-Based Replication Using
> > Middleware".  I personally think statement-based replication only makes
> > sense in middleware because when you are in the backend, 
> 
> I completely agree.
> 
> > you have more
> > information and can do things better, either by modifying the statement
> > or passing actual data rows, like Slony does, so I want to restrict this
> > to middleware like pgpool, and Usogres, which was an early
> > implementation of this idea.
> 
> That's fine and reasonable.
> 
> > Am I OK now?
> 
> The title and first paragraph are fine.
> 
> I'd still say that the second paragraph, about limitations, is too
> pgpool-specific. How does that apply to Sequoia?

OK, I made it more open-ended:

    If queries are simply broadcast unmodified, functions like
    <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
    sequences would have different values on different servers.
    This is because each server operates independently, and because
    SQL queries are broadcast (and not the actual modified rows).  If
    this is unacceptable, either the middleware or the application
    must query such values from a single server and then use those
    values in write queries.  Also, care must be taken that all
    transactions either commit or abort on all servers, perhaps
    using two-phase commit (<xref linkend="sql-prepare-transaction"
    endterm="sql-prepare-transaction-title"> and <xref
    linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">).
    Pgpool is an example of this type of replication.
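The workaround in the paragraph above, querying such values from a single server and then using them in write queries, can be sketched in Python. Only random() is modeled here; RandomServer, naive_broadcast, and pinned_broadcast are hypothetical names, and each "server" is just an in-memory object with its own random number generator.

```python
import random


class RandomServer:
    """Hypothetical server whose random() has its own, independent state."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # a different seed per server
        self.rows = []                  # contents of a one-column table

    def random(self):
        return self.rng.random()

    def insert(self, value):
        self.rows.append(value)


def naive_broadcast(servers):
    # Like broadcasting "INSERT INTO t VALUES (random())": every server
    # evaluates random() itself, so the stored values diverge.
    for s in servers:
        s.insert(s.random())


def pinned_broadcast(servers):
    # Evaluate random() once, on a single server, then broadcast the
    # resulting literal, like "INSERT INTO t VALUES (<value>)".
    value = servers[0].random()
    for s in servers:
        s.insert(value)


if __name__ == "__main__":
    a, b = RandomServer(1), RandomServer(2)
    naive_broadcast([a, b])
    print(a.rows == b.rows)  # False: the servers now disagree

    c, d = RandomServer(1), RandomServer(2)
    pinned_broadcast([c, d])
    print(c.rows == d.rows)  # True: both hold the same value
```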

> And I'm unsure what you mean by mentioning 2PC there. Do you have to 
> 'make sure every transaction commits or aborts' yourself with pgpool? Or 
> did you just want to mention that pgpool does (and has to do) that for you?

I am not sure pgpool does that, but perhaps it should.  Looking at the
pgpool web site, it seems it does not use 2PC (see replication_strict):
	
	http://pgpool.projects.postgresql.org/
	
	replication_mode
	
	    set this true if you are going to use replication functionality.
	    Default is false.
	
	replication_strict
	
	    If true, pgpool will wait for the completion of the master query
	    before sending a query to the secondary server. This is the safest and
	    default operating mode for pgpool. Default is true.
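The replication_strict ordering quoted above can be pictured with a small Python sketch: in strict mode the secondary is not sent the query until the master has finished it, while in non-strict mode both servers receive it at once. TimedServer and send_query are hypothetical names; the sketch models only the ordering in the quoted description, not pgpool's actual implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class TimedServer:
    """Hypothetical server that logs when each query starts and finishes."""

    def __init__(self, name, events, delay=0.01):
        self.name = name
        self.events = events  # shared list; list.append is atomic in CPython
        self.delay = delay    # simulated query execution time

    def execute(self, sql):
        self.events.append((self.name, "start", sql))
        time.sleep(self.delay)
        self.events.append((self.name, "end", sql))
        return f"{self.name}: ok"


def send_query(master, secondary, sql, strict=True):
    if strict:
        # The secondary never starts a query the master has not finished.
        first = master.execute(sql)
        second = secondary.execute(sql)
        return [first, second]
    # Non-strict: send to both at once, so the servers work in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(s.execute, sql) for s in (master, secondary)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    events = []
    master = TimedServer("master", events)
    secondary = TimedServer("secondary", events)
    send_query(master, secondary, "UPDATE t SET x = 1", strict=True)
    print([(name, phase) for name, phase, _ in events])
    # [('master', 'start'), ('master', 'end'),
    #  ('secondary', 'start'), ('secondary', 'end')]
```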

The HA docs merely say that 2PC might be a good way to keep the servers
consistent.
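As a rough illustration of what using 2PC for this would involve, here is a Python sketch of middleware driving PostgreSQL's PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED so that all servers commit or abort together. The PreparingServer class and two_phase_commit function are hypothetical; the servers only record the SQL they would receive, and fail_prepare simulates a server that cannot prepare.

```python
class PreparingServer:
    """Hypothetical server that records the SQL it receives.

    fail_prepare=True makes PREPARE TRANSACTION fail, as it might after a
    constraint violation or a crash on that server.
    """

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.log = []

    def execute(self, sql):
        if self.fail_prepare and sql.startswith("PREPARE TRANSACTION"):
            raise RuntimeError(f"{self.name}: prepare failed")
        self.log.append(sql)


def two_phase_commit(servers, statements, gid):
    """Apply statements on every server; commit everywhere or nowhere."""
    prepared = []
    for s in servers:
        try:
            s.execute("BEGIN")
            for stmt in statements:
                s.execute(stmt)
            # Phase 1: the server promises it will be able to commit.
            s.execute(f"PREPARE TRANSACTION '{gid}'")
        except RuntimeError:
            # Make sure this server's own transaction is aborted...
            s.execute("ROLLBACK")
            # ...and undo the servers that had already prepared.
            for p in prepared:
                p.execute(f"ROLLBACK PREPARED '{gid}'")
            return False
        prepared.append(s)
    # Phase 2: every server prepared successfully, so commit them all.
    for s in prepared:
        s.execute(f"COMMIT PREPARED '{gid}'")
    return True


if __name__ == "__main__":
    ok = [PreparingServer("a"), PreparingServer("b")]
    print(two_phase_commit(ok, ["INSERT INTO t VALUES (1)"], "tx1"))   # True
    bad = [PreparingServer("a"), PreparingServer("b", fail_prepare=True)]
    print(two_phase_commit(bad, ["INSERT INTO t VALUES (1)"], "tx2"))  # False
    print(bad[0].log[-1])  # ROLLBACK PREPARED 'tx2'
```

A real coordinator would also need to record its commit decision durably, so that transactions left prepared by a crash between the two phases can be resolved later, and each server must allow prepared transactions (max_prepared_transactions greater than zero).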







Thread overview: 117+ messages
2006-10-24 03:39 Replication documentation addition Bruce Momjian <[email protected]>
2006-10-24 03:55 ` Bruce Momjian <[email protected]>
2006-10-24 18:37 ` Josh Berkus <[email protected]>
2006-10-24 19:23   ` Markus Schiltknecht <[email protected]>
2006-10-24 19:34     ` Joshua D. Drake <[email protected]>
2006-10-24 22:05       ` Simon Riggs <[email protected]>
2006-10-24 22:13         ` Joshua D. Drake <[email protected]>
2006-10-24 22:20           ` Simon Riggs <[email protected]>
2006-10-24 22:33             ` Joshua D. Drake <[email protected]>
2006-10-24 23:03               ` Simon Riggs <[email protected]>
2006-10-24 23:24                 ` Joshua D. Drake <[email protected]>
2006-10-24 23:58               ` Jim C. Nasby <[email protected]>
2006-10-25 00:13                 ` Joshua D. Drake <[email protected]>
2006-10-25 00:38                   ` Jeff Frost <[email protected]>
2006-10-25 03:02       ` Bruce Momjian <[email protected]>
2006-10-25 02:56     ` Bruce Momjian <[email protected]>
2006-10-24 23:50 ` Jim C. Nasby <[email protected]>
2006-10-24 04:20 Replication documentation addition Bruce Momjian <[email protected]>
2006-10-24 08:26 ` Markus Schiltknecht <[email protected]>
2006-10-25 02:53   ` Bruce Momjian <[email protected]>
2006-10-24 13:29 ` Hannu Krosing <[email protected]>
2006-10-24 13:54   ` Markus Schiltknecht <[email protected]>
2006-10-25 00:16   ` Bruce Momjian <[email protected]>
2006-10-25 02:55   ` Bruce Momjian <[email protected]>
2006-10-25 03:05     ` Josh Berkus <[email protected]>
2006-10-25 03:08       ` Joshua D. Drake <[email protected]>
2006-10-25 03:48         ` Bruce Momjian <[email protected]>
2006-10-25 04:10           ` Steve Atkins <[email protected]>
2006-10-25 04:20             ` Bruce Momjian <[email protected]>
2006-10-25 04:27               ` Steve Atkins <[email protected]>
2006-10-25 06:51                 ` Cesar Suga <[email protected]>
2006-10-25 14:16                   ` Markus Schaber <[email protected]>
2006-10-25 14:20                   ` Joshua D. Drake <[email protected]>
2006-10-25 14:31                     ` Magnus Hagander <[email protected]>
2006-10-25 14:35                       ` Joshua D. Drake <[email protected]>
2006-10-25 14:58                       ` Tom Lane <[email protected]>
2006-10-25 15:35                         ` Joshua D. Drake <[email protected]>
2006-10-25 15:44                         ` Bruce Momjian <[email protected]>
2006-10-25 16:00                           ` Joshua D. Drake <[email protected]>
2006-10-25 16:02                             ` Bruce Momjian <[email protected]>
2006-10-25 20:34                               ` Dawid Kuroczko <[email protected]>
2006-10-25 20:42                                 ` Bruce Momjian <[email protected]>
2006-10-25 23:49                                   ` Jim C. Nasby <[email protected]>
2006-10-26 00:42                                     ` Bruce Momjian <[email protected]>
2006-10-26 15:55                                       ` Jim C. Nasby <[email protected]>
2006-10-26 15:59                                         ` Bruce Momjian <[email protected]>
2006-10-26 16:19                                           ` Joshua D. Drake <[email protected]>
2006-10-26 16:21                                             ` Bruce Momjian <[email protected]>
2006-10-26 18:27                                           ` Jim C. Nasby <[email protected]>
2006-10-26 19:41                                             ` Bruce Momjian <[email protected]>
2006-10-26 02:08                     ` Cesar Suga <[email protected]>
2006-10-26 17:35                       ` Richard Troy <[email protected]>
2006-10-25 10:52               ` Shane Ambler <[email protected]>
2006-10-25 13:52                 ` Jim C. Nasby <[email protected]>
2006-10-25 14:13               ` Joshua D. Drake <[email protected]>
2006-10-25 14:21                 ` Bruce Momjian <[email protected]>
2006-10-25 14:30                   ` Joshua D. Drake <[email protected]>
2006-10-25 14:08           ` Joshua D. Drake <[email protected]>
2006-10-25 09:38     ` Markus Schiltknecht <[email protected]>
2006-10-25 13:57       ` Jim C. Nasby <[email protected]>
2006-10-25 15:41         ` Bruce Momjian <[email protected]>
2006-10-25 15:43         ` Markus Schiltknecht <[email protected]>
2006-10-25 15:40       ` Bruce Momjian <[email protected]>
2006-10-25 16:20       ` David Fetter <[email protected]>
2006-10-25 16:28         ` Bruce Momjian <[email protected]>
2006-10-25 16:24     ` Richard Troy <[email protected]>
2006-10-25 18:40       ` Richard Troy <[email protected]>
2006-10-25 19:31         ` Bruce Momjian <[email protected]>
2006-10-27 19:57           ` Richard Troy <[email protected]>
2006-10-24 22:14 ` Simon Riggs <[email protected]>
2006-10-25 02:54   ` Bruce Momjian <[email protected]>
2006-10-25 18:33 ` Alexey Klyukin <[email protected]>
2006-10-25 18:41   ` Bruce Momjian <[email protected]>
2006-10-25 18:59     ` Josh Berkus <[email protected]>
2006-10-25 19:32       ` Bruce Momjian <[email protected]>
2006-10-25 20:36         ` Josh Berkus <[email protected]>
2006-10-25 20:37           ` Bruce Momjian <[email protected]>
2006-10-25 20:59             ` Josh Berkus <[email protected]>
2006-10-25 21:46               ` Bruce Momjian <[email protected]>
2006-10-26 14:45                 ` Andrew Sullivan <[email protected]>
2006-10-26 19:06                   ` Robert Treat <[email protected]>
2006-10-26 22:26                     ` Andrew Sullivan <[email protected]>
2006-10-26 17:07           ` Richard Troy <[email protected]>
2006-10-25 00:56 Re: Replication documentation addition Luke Lonergan <[email protected]>
2006-10-25 02:57 ` Bruce Momjian <[email protected]>
2006-10-25 07:37   ` Hannu Krosing <[email protected]>
2006-10-25 14:28     ` Bruce Momjian <[email protected]>
2006-10-25 10:36   ` Magnus Hagander <[email protected]>
2006-10-25 16:49     ` Casey Duncan <[email protected]>
2006-10-26 15:53 Re: [HACKERS] Replication documentation addition Bruce Momjian <[email protected]>
2006-10-30 17:23 ` Chris Browne <[email protected]>
2006-11-14 21:42   ` Bruce Momjian <[email protected]>
2006-11-14 22:02     ` Jeff Frost <[email protected]>
2006-11-14 22:04       ` Bruce Momjian <[email protected]>
2006-11-14 22:18         ` Jeff Frost <[email protected]>
2006-11-14 22:32           ` Bruce Momjian <[email protected]>
2006-11-14 22:40             ` Jeff Frost <[email protected]>
2006-11-14 22:43               ` Bruce Momjian <[email protected]>
2006-11-14 22:50                 ` Jeff Frost <[email protected]>
2006-11-14 22:54                   ` Bruce Momjian <[email protected]>
2006-11-14 23:37                     ` Jeff Frost <[email protected]>
2006-11-15 00:11                       ` Bruce Momjian <[email protected]>
2006-11-15 00:13                         ` Jeff Frost <[email protected]>
2006-11-15 01:10                           ` Bruce Momjian <[email protected]>
2006-11-17 01:48                         ` Jim Nasby <[email protected]>
2006-11-18 23:29                           ` Josh Berkus <[email protected]>
2006-11-20 09:43                             ` Markus Schiltknecht <[email protected]>
2006-11-20 15:03                             ` Bruce Momjian <[email protected]>
2006-11-20 15:05                               ` Bruce Momjian <[email protected]>
2006-11-20 15:07                                 ` Bruce Momjian <[email protected]>
2006-11-20 15:35                                   ` Markus Schiltknecht <[email protected]>
2006-11-20 22:10                                     ` Bruce Momjian <[email protected]>
2006-11-21 08:32                                       ` Markus Schiltknecht <[email protected]>
2006-11-21 18:30                                         ` Bruce Momjian <[email protected]>
2006-11-20 15:01                           ` Bruce Momjian <[email protected]>
2006-11-15 09:57                 ` Markus Schiltknecht <[email protected]>
2006-11-15 19:45                 ` Peter Eisentraut <[email protected]>
