Replication Docs


* Replication Docs
@ 2006-11-22 09:02 Markus Schiltknecht <[email protected]>
  2006-11-22 17:36 ` Re: Replication Docs Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Markus Schiltknecht @ 2006-11-22 09:02 UTC
  To: pgsql-docs; +Cc: [email protected]

Hello Bruce,

I was trying to put together all comments to specific sections, thus the 
new thread. Hope that helps.

*** Synchronous Multi-Master Replication ***

Bruce Momjian wrote:
 > OK, new title is "Synchronous Multi-Master Replication", and the next
 > heading is "Asynchronous Multi-Master Replication".

Good, I really like that one. :-)

 >> Why not simply call it "Multi Master Replication"? That implies
 >> clustering, doesn't it?
 >
 > Well, not really because of the async multi-master that is the next
 > item.

Yes, it's fine that way. I was just unsure whether you wanted sync and
async in one paragraph or not. The proposal "Multi Master Replication"
would only fit if we described both in one paragraph. I prefer
describing both in more detail, as you have done now.

 >> BTW, I'm slowly beginning to accept that you don't want to mix
 >> "Statement-Based Replication Middleware" with "Multi Master
 >> Replication". ;-)
 >
 > OK, are they mixed now?

No, they're not. They're split, which I think is what you want. What
I've been uncomfortable with is that split into "Statement-Based
Replication Middleware" and "Synchronous Multi-Master Replication". I've
been arguing that the first describes one possible implementation of the
second, while other implementations are not described (2PC, SHMEM,
Postgres-R, etc.).

I was trying to say that I'm beginning to accept that split, because
pgpool in particular really seems to put a lot of those burdens on the
user. I've been trying to use some humor, but that mainly seems to
confuse people. My English might not be good enough for humor yet.

However, where do you now fit Sequoia in? It uses "statement-based
replication", but AFAIK it is much more clever than pgpool and handles
non-deterministic functions. And the Sequoia people probably won't be
happy about not being called "Multi-Master Replication".
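
To make the point about non-deterministic functions concrete, here is a
minimal Python sketch. It is not taken from pgpool or Sequoia: the toy
statement format ("INSERT <expr>") and the names Replica,
broadcast_naive and broadcast_rewritten are assumptions for
illustration only. Forwarding the statement text unchanged lets each
server evaluate random() on its own, so the copies diverge; evaluating
it once in the middleware and forwarding a literal keeps them identical.

```python
import random
import re


class Replica:
    """Hypothetical stand-in for one database server."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # each server has its own random state
        self.rows = []

    def execute(self, statement):
        # Toy statement format: "INSERT <expr>", where <expr> is either
        # a literal or the non-deterministic call random().
        value = statement.split(" ", 1)[1]
        if value == "random()":
            value = repr(self.rng.random())  # evaluated locally, per server
        self.rows.append(value)


def broadcast_naive(replicas, statement):
    """Forward the statement text unchanged to every replica."""
    for replica in replicas:
        replica.execute(statement)


def broadcast_rewritten(replicas, statement, rng):
    """Evaluate random() once in the middleware, then forward a literal."""
    rewritten = re.sub(r"random\(\)", lambda m: repr(rng.random()), statement)
    for replica in replicas:
        replica.execute(rewritten)


naive = [Replica(seed=1), Replica(seed=2)]
broadcast_naive(naive, "INSERT random()")
naive_diverged = naive[0].rows != naive[1].rows

safe = [Replica(seed=1), Replica(seed=2)]
broadcast_rewritten(safe, "INSERT random()", random.Random(0))
safe_consistent = safe[0].rows == safe[1].rows
```

Whether any particular middleware rewrites statements this way is not
something this thread establishes; the sketch only shows why the
problem exists and one possible way around it.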

Bruce Momjian wrote:
 > I just saw it [the slides about PGCluster-II].  It does seem more like
 > Oracle RAC than any other method.

Yes. I think it's not production ready, yet, so there's no point in 
mentioning it in the documentation.

Bruce Momjian wrote:
 > I figured that shared-disk/memory only really makes sense for
 > multi-master clustering, so I mentioned it in that paragraph:
 >
 > ...<snipped the new paragraph>
 >
 > Is that enough?

I'd say so, yes. We are not going into more details for other aspects so 
that's fine.

You might not even need to mention shared memory. I don't know of any
implementation in the database world, except perhaps running PostgreSQL
on top of OpenMosix. Maybe just leave it in there; it won't hurt.

Bruce Momjian wrote:
 > One problem I have is that we have shared disk failover, but no
 > other shared case with a PostgreSQL implementation, and people don't
 > want to mention Oracle RAC, so why do we mention it if we have no
 > implementations even in the works.

Most probably you're already aware that with PGCluster-II we have such 
an implementation in the works.

*** Asynchronous Multi-Master Replication ***

 >> Again, IMHO, "Parallel Query Execution" says everything. The word
 >> 'Clustering' does not help, because it's not defined nor commonly
 >> used in any helpful way (probably besides marketing).
 >
 > OK, new title is Multi-Server Parallel Query Execution.  If I have
 > just "Parallel Query Execution", it could be multi-process parallel
 > query execution.

Yes, the new title is good.

In the text below, you are mainly describing what I call 'disconnected
operation' (does somebody have a better, more common term for that?).
But the main advantage of async replication is having no delay before
commit, which gives better performance for writing transactions.

In the case of async multi-master replication, conflicts can arise,
which have to be resolved. I think your example does not make it clear
that this applies to async multi-master replication in general, and
that those conflicts can sometimes be resolved automatically.
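
As a rough illustration of both points (no delay before commit, and
conflicts that a rule can resolve automatically), here is a minimal
Python sketch. Everything in it is hypothetical: the Server and Write
classes, the logical timestamps, and the last-writer-wins rule are
assumptions chosen for brevity, not a description of any existing
replication system.

```python
from dataclasses import dataclass, field


@dataclass
class Write:
    key: str
    value: str
    timestamp: int  # hypothetical logical commit time
    origin: str


@dataclass
class Server:
    name: str
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # writes not yet exchanged

    def commit(self, key, value, timestamp):
        # Asynchronous: commit locally without waiting for other servers.
        write = Write(key, value, timestamp, self.name)
        self.data[key] = write
        self.pending.append(write)


def last_writer_wins(a, b):
    """Example rule: newer timestamp wins, ties broken by origin name."""
    return max(a, b, key=lambda w: (w.timestamp, w.origin))


def synchronize(s1, s2, resolve=last_writer_wins):
    """Exchange pending writes and return the keys that conflicted."""
    latest_1 = {w.key: w for w in s1.pending}  # last pending write per key
    latest_2 = {w.key: w for w in s2.pending}
    conflicts = sorted(latest_1.keys() & latest_2.keys())
    for key in latest_1.keys() | latest_2.keys():
        if key in conflicts:
            winner = resolve(latest_1[key], latest_2[key])
        else:
            winner = latest_1.get(key, latest_2.get(key))
        s1.data[key] = winner
        s2.data[key] = winner
    s1.pending.clear()
    s2.pending.clear()
    return conflicts


laptop = Server("laptop")
office = Server("office")
laptop.commit("phone", "555-0100", timestamp=1)
office.commit("phone", "555-0199", timestamp=2)   # conflicts with the laptop
office.commit("email", "a@example.org", timestamp=3)
conflicts = synchronize(laptop, office)
```

Passing a different resolve function (or one that asks the user) covers
the cases that cannot be settled automatically.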

*** Multi-Master Parallel Query Execution ***

Bruce Momjian wrote:
 > Uh, multi-master replication allows for load balancing, but it doesn't
 > help a single query to run any faster.  Think of having only one query
 > running on the cluster.  Parallel execution allows a single query to
 > use more than one computer, right?

Right.

 > Uh, this confuses me.  What is missing?  You split tables across
 > multiple servers.

In "Multi-Master Parallel Query Execution" you write: "One possible way 
this could work is for the data to be split among servers". So the 
example you give involves Data Partitioning.
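
The data-partitioning variant can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the three lists
stand in for rows held by three hypothetical servers, and threads stand
in for network requests. Each server computes a partial aggregate over
its own rows, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands for the rows held by one server.
partitions = [
    [12, 7, 30],    # server A
    [5, 18],        # server B
    [22, 1, 9, 4],  # server C
]


def partial_aggregate(rows):
    """Runs "on" one server: return (sum, count) for its own rows only."""
    return sum(rows), len(rows)


def parallel_average(parts):
    """Send the partial query to every server at once, then combine."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(partial_aggregate, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


average = parallel_average(partitions)
```

The single query (an average over all rows) uses every server, which is
the property that distinguishes this from plain load balancing.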

I wanted to point out that another way to do Parallel Query Execution is
using Multi-Master Replication to have equal replicas and then query
them in parallel. I don't think there is any solution for that yet,
except that perhaps PGPool-II can do it?

*** Introduction Text on the top ***

Bruce Momjian wrote:
 > OK, updated to add "little" delay, and removed "small" from async
 > case:
 >
 >   load-balanced servers will return consistent results with little
 >   propagation delay. Asynchronous updating has a delay between the

Hm, that does not address my concerns. But after thinking about it, I
can accept the term 'consistent results' - it's clear enough what it
means. I'm probably getting into too much detail...

But now the "little" delay is certainly in the wrong place. Such
delays occur before commit, not before returning results.

Maybe revert it to "..no propagation delay". Or leave out the
"no propagation delay" completely.

Sorry for the noise here.

Regards

Markus


* Re: Replication Docs
  2006-11-22 09:02 Replication Docs Markus Schiltknecht <[email protected]>
@ 2006-11-22 17:36 ` Bruce Momjian <[email protected]>
  2006-11-22 18:03   ` Re: Replication Docs Markus Schiltknecht <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Bruce Momjian @ 2006-11-22 17:36 UTC
  To: Markus Schiltknecht <[email protected]>; +Cc: pgsql-docs

Markus Schiltknecht wrote:
> Hello Bruce,
> 
> I was trying to put together all comments to specific sections, thus the 
> new thread. Hope that helps.
> 
> *** Synchronous Multi-Master Replication ***
> 
> Bruce Momjian wrote:
>  > OK, new title is "Synchronous Multi-Master Replication", and the next
>  > heading is "Asynchronous Multi-Master Replication".
> 
> Good, I really like that one. :-)

Great (until we change it again)  ;-)

>  >> Why not simply call it "Multi Master Replication"? That implies
>  >> clustering, doesn't it?
>  >
>  > Well, not really because of the async multi-master that is the next
>  > item.
> 
> Yes, it's fine that way. I was just unsure if you want to have sync and 
> async in one paragraph or not. The proposal "Multi Master Replication" 
> would only fit if we'd describe both in one paragraph. I like to 
> describe both in more detail, as you did now.

OK, it is two separate entries now:

	http://momjian.us/main/writings/pgsql/sgml/high-availability.html

>  >> BTW, I'm slowly beginning to accept that you don't want to mix
>  >> "Statement-Based Replication Middleware" with "Multi Master
>  >> Replication". ;-)
>  >
>  > OK, are they mixed now?
> 
> No, they're not. They're split, which I think is what you want. I've 
> been uncomfortable with was that split into "Statement-Based Replication 
> Middleware" and "Synchronous Multi-Master Replication". I've been 
> arguing that the first describes one possible implementation of the 
> second, while other implementations are not described (2PC, SHMEM, 
> Postgres-R, etc...)
> 
> I was trying to say that I'm beginning to accept that split, because 
> especially pgpool really seems to put a lot of those burdens to the 
> user. I've been trying to use some humor, but that mainly seems to 
> confuse people. My english might not be good enough for humor, yet.
> 
> However, where do you now fit Sequoia in? It uses "statement-based 
> replication", but AFAIK it is much more clever than pgpool and handles 
> non-deterministic functions. And the Sequoia people probably won't get 
> excited about not calling them "Multi-Master Replication".

Uh, good point.  The title is now "Statement-Based Replication
Middleware".  That doesn't say multi-master, but it doesn't say
master/slave either.  The Sequoia PDF you sent me is very detailed:

  http://www.continuent.org/uploads/sequoia/Resources/2006-08-15Cecchet_ApacheConAsia2006.pdf

I think we are back to the issue of classification.  We have traditional
master/slave such as Slony, multi-master such as perhaps PGCluster, and
lots in between.  I am thinking pgpool and Sequoia fit in there.  I have
added Sequoia to the Statement-Based Replication Middleware section.

> Bruce Momjian wrote:
>  > I just saw it [the slides about PGCluster-II].  It does seem more like
>  > Oracle RAC than any other method.
> 
> Yes. I think it's not production ready, yet, so there's no point in 
> mentioning it in the documentation.

OK.

> Bruce Momjian wrote:
>  > I figured that shared-disk/memory only really makes sense for
>  > multi-master clustering, so I mentioned it in that paragraph:
>  >
>  > ...<snipped the new paragraph>
>  >
>  > Is that enough?
> 
> I'd say so, yes. We are not going into more details for other aspects so 
> that's fine.

OK.

> You might not even mention shared-memory. I don't know of any 
> implementation in the database world. Except perhaps using OpenMosix and 
> running PostgreSQL on top of it. Maybe just leave it in there, it won't 
> hurt.

OK, I will only mention shared disk now.

> Bruce Momjian wrote:
>  > One problem I have is that we have shared disk failover, but no
>  > other shared case with a PostgreSQL implementation, and people don't
>  > want to mention Oracle RAC, so why do we mention it if we have no
>  > implementations even in the works.
> 
> Most probably you're already aware that with PGCluster-II we have such 
> an implementation in the works.

I do now.  :-)  I think we are OK with the additional sentence about
shared disk in the Synchronous Multi-Master Replication section, right?

> *** Asynchronous Multi-Master Replication ***
> 
>  >> Again, IMHO, "Parallel Query Execution" says everything. The word
>  >> 'Clustering' does not help, because it's not defined nor commonly
>  >> used in any helpful way (probably besides marketing).
>  >
>  > OK, new title is Multi-Server Parallel Query Execution.  If I have
>  > just "Parallel Query Execution", it could be multi-process parallel
>  > query execution.
> 
> Yes, the new title is good.
> 
> In the text below, you are mainly describing what I call 'disconnected 
> operation' (somebody have a better, more common term for that?). But the 
> main advantage of async replication is having no delay before commit. 
> Thus giving better performance for writing transactions.
> 
> In case of async, multi master replication, conflicts can arise, which
> have to be resolved. I think your example does not make it clear that 
> this applies to async, multi master replication in general. And that 
> those can sometimes be resolved automatically.

OK, good point, section updated:

	  <term>Asynchronous Multi-Master Replication</term>
	  <listitem>
	
	   <para>
	    For servers that are not regularly connected, like laptops or
	    remote servers, keeping data consistent among servers is a
	    challenge.  Using asynchronous multi-master replication, each
	    server works independently, and periodically communicates with
	    the other servers to identify conflicting transactions.  The
	    conflicts can be resolved by users or conflict resolution rules.
	    rules.

> 
> 
> *** Multi-Master Parallel Query Execution ***
> 
> Bruce Momjian wrote:
>  > Uh, multi-master replication allows for load balancing, but it doesn't
>  > help a single query to run any faster.  Think of having only one query
>  > running on the cluster.  Parallel execution allows a single query to
>  > use more than one computer, right?
> 
> Right.
> 
>  > Uh, this confuses me.  What is missing?  You split tables across
>  > multiple servers.
> 
> In "Multi-Master Parallel Query Execution" you write: "One possible way 
> this could work is for the data to be split among servers". So the 
> example you give involves Data Partitioning.

OK.

> I wanted to point out that another way to do Parallel Query Execution is 
> using Multi-Master Replication to have equal replicas and then query 
> them in parallel. I don't think there is any solution for that, yet. 
> Except, perhaps PGPool-II can do it?

Uh, if the data isn't partitioned, what value is there in hitting
multiple servers for a single query?  I am confused.

> *** Introduction Text on the top ***
> 
> Bruce Momjian wrote:
>  > OK, updated to add "little" delay, and removed "small" from async
>  > case:
>  >
>  >   load-balanced servers will return consistent results with little
>  >   propagation delay. Asynchronous updating has a delay between the
> 
> Hm, that does not address my concerns. But after thinking about it, I 
> can accept the term 'consistent results' - it's clear enough what it 
> means. I'm probably thinking into too many details...

OK.

> But now, the "little delays" certainly is in the wrong place. Such 
> delays occur before commit, not before returning results.

Uh, I don't think the "little" refers to the results, only to the
propagation.

> Maybe revert it back to "..no propagation delay". Or completely leave 
> away the "no propagation delay".

OK, how is this new text?

  This guarantees that a failover will not lose any data and that
  all load-balanced servers will return consistent results no matter
  which server is queried.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




* Re: Replication Docs
  2006-11-22 09:02 Replication Docs Markus Schiltknecht <[email protected]>
  2006-11-22 17:36 ` Re: Replication Docs Bruce Momjian <[email protected]>
@ 2006-11-22 18:03   ` Markus Schiltknecht <[email protected]>
  2006-11-22 18:14     ` Re: Replication Docs Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Markus Schiltknecht @ 2006-11-22 18:03 UTC
  To: Bruce Momjian <[email protected]>; +Cc: pgsql-docs; [email protected]

Hi,

Bruce Momjian wrote:
> OK, it is two separate entries now:
> 
> 	http://momjian.us/main/writings/pgsql/sgml/high-availability.html

Yes, that's fine with me.

> Uh, good point.  The title is now "Statement-Based Replication
> Middleware".  That doesn't say multi-master, but it doesn't say
> master/slave either.  The Sequoia PDF you sent me is very detailed:
> 
>   http://www.continuent.org/uploads/sequoia/Resources/2006-08-15Cecchet_ApacheConAsia2006.pdf
> 
> I think we are back to the issue of classification.  We have traditional
> master/slave as slony, and multi-master as perhaps pgcluster, and lots
> in between.  I am thinking pgpool and sequoia fit in there.  I have
> added Sequoia to the Statement-Based Replication Middleware section.

I'll look into that shortly, but I think Emmanuel can better categorize
Sequoia; I've CCed him. I'd certainly categorize it as Multi Master
Replication (like pgpool, except that pgpool is a poor implementation).

>> Most probably you're already aware that with PGCluster-II we have such 
>> an implementation in the works.
> 
> I do now.  :-)  I think we are OK with the additional sentence about
> shared disk in the Synchronous Multi-Master Replication section, right?

Yes.

> OK, good point, section updated:
> 
> 	  <term>Asynchronous Multi-Master Replication</term>
> 	  <listitem>
> 	
> 	   <para>
> 	    For servers that are not regularly connected, like laptops or
> 	    remote servers, keeping data consistent among servers is a
> 	    challenge.  Using asynchronous multi-master replication, each
> 	    server works independently, and periodically communicates with
> 	    the other servers to identify conflicting transactions.  The
> 	    conflicts can be resolved by users or conflict resolution rules.
> 	    rules.
> 

Good, that sounds better to me.

There's only a typo at the very end:

"..conflict resolution rules. rules."

> Uh, if the data isn't partitioned, what value is there to hitting
> multiple servers, for single query?  I am confused.

Right, it only makes sense for complex queries, i.e. those with multiple
seq scans and/or joins. The executor would have to be super clever for
such things to happen. Just forget about my comment.

>> But now, the "little delays" certainly is in the wrong place. Such 
>> delays occur before commit, not before returning results.
> 
> Uh, I don't think the little appears to talk about the results but only
> the propagation.
> 
>> Maybe revert it back to "..no propagation delay". Or completely leave 
>> away the "no propagation delay".
> 
> OK, how is this new text?
> 
>   This guarantees that a failover will not lose any data and that
>   all load-balanced servers will return consistent results no matter
>   which server is queried.

I like that wording better, yes.

Regards

Markus






* Re: Replication Docs
  2006-11-22 09:02 Replication Docs Markus Schiltknecht <[email protected]>
  2006-11-22 17:36 ` Re: Replication Docs Bruce Momjian <[email protected]>
  2006-11-22 18:03   ` Re: Replication Docs Markus Schiltknecht <[email protected]>
@ 2006-11-22 18:14     ` Bruce Momjian <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Bruce Momjian @ 2006-11-22 18:14 UTC
  To: Markus Schiltknecht <[email protected]>; +Cc: pgsql-docs; [email protected]

Markus Schiltknecht wrote:
> Hi,
> 
> Bruce Momjian wrote:
> > OK, it is two separate entries now:
> > 
> > 	http://momjian.us/main/writings/pgsql/sgml/high-availability.html
> 
> Yes, that's fine with me.

Good.

> > Uh, good point.  The title is now "Statement-Based Replication
> > Middleware".  That doesn't say multi-master, but it doesn't say
> > master/slave either.  The Sequoia PDF you sent me is very detailed:
> > 
> >   http://www.continuent.org/uploads/sequoia/Resources/2006-08-15Cecchet_ApacheConAsia2006.pdf
> > 
> > I think we are back to the issue of classification.  We have traditional
> > master/slave as slony, and multi-master as perhaps pgcluster, and lots
> > in between.  I am thinking pgpool and sequoia fit in there.  I have
> > added Sequoia to the Statement-Based Replication Middleware section.
> 
> I'll look into that shortly, but I think Emmanuel can better categorize 
> sequoia, I've CCed him. I'd certainly categorize it as Multi Master 
> Replication (like pgpool, only that it's a poor implementation).

OK, let's see what they say.  Right now, middleware is a separate
section.

> Good, that sounds better for me.
> 
> There's only a typo at the very end:
> 
> "..conflict resolution rules. rules."

OK, fixed, thanks.

> > Uh, if the data isn't partitioned, what value is there to hitting
> > multiple servers, for single query?  I am confused.
> 
> Right, makes only sense for complex queries, i.e. when having multiple 
> seq scans and/or joins. The executor would have to be super clever for 
> such things to happen. Just forget about my comment.

Oh, I see, splitting I/O load even with multiple copies --- interesting,
but seems too far out for this documentation, as you suggested above.

-- 
  Bruce Momjian   [email protected]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



