From: Jürgen Purtz <[email protected]>
To: [email protected]
To: Pg Docs <[email protected]>
Cc: Erik Rijkers <[email protected]>
Cc: Fabien COELHO <[email protected]>
Cc: Corey Huinker <[email protected]>
Cc: Justin Pryzby <[email protected]>
Cc: Roger Harkavy <[email protected]>
Subject: Re: Add A Glossary
Date: Mon, 18 May 2020 18:08:01 +0200
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>

On 17.05.20 17:28, Alvaro Herrera wrote:
> On 2020-May-17, Erik Rijkers wrote:
>
>> On 2020-05-17 08:51, Alvaro Herrera wrote:
>>> I don't think that's the general understanding of those terms.  For all
>>> I know, they *are* synonyms, and there's no specific term for "the
>>> fluctuating objects" as you call them.  The instance is either running
>>> (in which case there are processes and RAM) or it isn't.
>> For what it's worth, I've also always understood 'instance' as 'a running
>> database'.  I admit it might be a left-over from my oracle years:
>>
>> https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
>>
>> There, 'instance' clearly refers to a running database.  When that database
>> is stopped, it ceases to be an instance.
> I've never understood it that way, but I'm open to having my opinion on
> it changed.  So let's discuss it and maybe gather opinions from others.
>
> I think the terms under discussion are just
>
> * cluster
> * instance
> * server
>
> We don't have "host" (I just made it a synonym for server), but perhaps
> we can add that too, if it's useful.  It would be good to be consistent
> with historical Postgres usage, such as the initdb usage of "cluster"
> etc.
>
> Perhaps we should not only define what our use of each term is, but also
> explain how each term is used outside PostgreSQL and highlight the
> differences.  (This would be particularly useful for "cluster" ISTM.)

In fact, we have reached a point where we don't have a common 
understanding of a group of terms. I'm sure that we will meet more 
situations like this in the future. Such discussions, the subsequent 
decisions, and their implementation in the docs are necessary to gain a 
solid foundation - primarily for newcomers (which is my first 
motivation), but also for more complex discussions among experts. 
Obviously, each of us will bring along their previous understanding of 
terms. But we should also be open to revising old terms sometimes.

Here are my two cents.

cluster/instance: PG (mainly) consists of a group of processes that 
jointly act on shared buffers. The processes are very closely related 
to each other and to the buffers. They exist all together or not at all. 
They use a common initialization file and are brought to life by one 
command. Everything exists solely in RAM and therefore has a fluctuating 
nature. In summary: they form a unit, and this unit needs a name of its 
own. On some pages we use the term *instance* - sometimes in extended 
forms: *database instance*, *PG instance*, *standby instance*, 
*standby server instance*, *server instance*, or *remote instance*.  For 
me, the term *instance* makes sense, and so do the extensions *standby 
instance* and *remote instance* in their context.

The next essential component is the data itself. It is organized as a 
group of databases plus some common management information (global, 
pg_wal, pg_xact, pg_tblspc, ...). The complete data must be treated as a 
whole because the management information concerns all databases. Its 
nature is different from the processes and shared buffers. Of course, 
its content changes, but it has a steady nature. It even survives a 
'power down'. There is one command to instantiate a new incarnation of 
the directory structure and all files. In summary, it's a thing of its 
own and should have its own name. 'database' is not possible because the 
whole consists of several databases plus other things. My favorite is 
*cluster*; *database cluster* is also possible.

server/host: We need a term to describe the underlying hardware, or 
the virtual machine or container, where PG is running. I suggest using 
both *server* and *host*. In computer science, both are legitimate and 
widely used. Everybody understands *client/server architecture* or 
*host* in TCP/IP configuration. We cannot change such established 
usage. I suggest using both depending on the context, but with the same 
meaning: "real hardware, a container, or a virtual machine".
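
To make the proposed distinction a bit more tangible, here is a rough 
sketch in Python of how the three terms could relate to each other. It 
is only an illustration of the suggestion above: the class names, field 
names, the path, and the host name are all made up and have nothing to 
do with PG's code or tools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Persistent data: a group of databases plus common management info."""
    data_directory: str                      # hypothetical path
    databases: List[str] = field(default_factory=list)


@dataclass
class Host:
    """Real hardware, a container, or a virtual machine (a.k.a. server)."""
    name: str                                # hypothetical host name


@dataclass
class Instance:
    """Processes plus shared buffers; lives only in RAM while running."""
    cluster: Cluster
    host: Host


def start(cluster: Cluster, host: Host) -> Instance:
    # One command brings the whole unit of processes and buffers to life.
    return Instance(cluster=cluster, host=host)


def power_down(instance: Instance) -> Cluster:
    # The instance is gone afterwards; the cluster it acted on remains.
    return instance.cluster


cluster = Cluster("/srv/pg/data", ["postgres", "template0", "template1"])
instance = start(cluster, Host("db1"))
survivor = power_down(instance)
```

The point of the sketch is only that an instance refers to a cluster 
and a host but does not own the data: after a 'power down' the same 
cluster object is still there.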

-- 

Jürgen Purtz

(PS: I added the docs mailing list)



