Subject: Re: Add A Glossary
From: Jürgen Purtz
To: pgsql-hackers@postgresql.org, Pg Docs
Cc: Erik Rijkers, Fabien COELHO, Corey Huinker, Justin Pryzby, Roger Harkavy
Date: Mon, 18 May 2020 18:08:01 +0200
In-Reply-To: <20200517152851.GA31376@alvherre.pgsql>

On 17.05.20 17:28, Alvaro Herrera wrote:
> On 2020-May-17, Erik Rijkers wrote:
>
>> On 2020-05-17 08:51, Alvaro Herrera wrote:
>>> I don't think that's the general understanding of those terms.  For all
>>> I know, they *are* synonyms, and there's no specific term for "the
>>> fluctuating objects" as you call them.  The instance is either running
>>> (in which case there are processes and RAM) or it isn't.
>> For what it's worth, I've also always understood 'instance' as 'a running
>> database'.  I admit it might be a left-over from my oracle years:
>>
>> https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
>>
>> There, 'instance' clearly refers to a running database.  When that database
>> is stopped, it ceases to be an instance.
> I've never understood it that way, but I'm open to having my opinion on
> it changed.  So let's discuss it and maybe gather opinions from others.
>
> I think the terms under discussion are just
>
> * cluster
> * instance
> * server
>
> We don't have "host" (I just made it a synonym for server), but perhaps
> we can add that too, if it's useful.  It would be good to be consistent
> with historical Postgres usage, such as the initdb usage of "cluster"
> etc.
>
> Perhaps we should not only define what our use of each term is, but also
> explain how each term is used outside PostgreSQL and highlight the
> differences.  (This would be particularly useful for "cluster" ISTM.)

In fact, we have reached a point where we don't share a common understanding of a group of terms. I'm sure we will run into more situations like this in the future. Such discussions, the decisions that follow, and their implementation in the docs are necessary to build a solid foundation, primarily for newcomers (which is my main motivation) but also for more complex discussions among experts. Obviously, each of us brings his or her previous understanding of the terms. But we should also be open to revising old terms from time to time.

Here are my two cents.

cluster/instance: PG (mainly) consists of a group of processes that act jointly on shared buffers. The processes are very closely related to each other and to the buffers. They exist all together or not at all. They use a common initialization file and are started by a single command. Everything exists solely in RAM and therefore has a fluctuating nature. In summary: they form a unit, and this unit needs a name of its own. On some pages we use the term *instance*, sometimes in extended forms: *database instance*, *PG instance*, *standby instance*, *standby server instance*, *server instance*, or *remote instance*. To me, the term *instance* makes sense, and so do the extensions *standby instance* and *remote instance* in their respective contexts.
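To make the instance side of this distinction concrete, here is a minimal Python sketch of how one might tell whether an instance is currently running on a given data directory. It assumes that a running server keeps a file named postmaster.pid in its data directory and that the first line of that file holds the postmaster's process ID. The function names are hypothetical, and a crash can leave a stale file behind, so a returned PID is a hint rather than proof.

```python
import os
from typing import Optional


def parse_postmaster_pid(contents: str) -> Optional[int]:
    """Extract the postmaster PID from the text of a postmaster.pid file.

    Assumption: the first line of the file holds the PID of the
    postmaster process that started the instance.
    """
    lines = contents.splitlines()
    first = lines[0].strip() if lines else ""
    return int(first) if first.isdigit() else None


def running_instance_pid(data_dir: str) -> Optional[int]:
    """Return the PID of the instance using data_dir, or None if none is recorded.

    A missing postmaster.pid means the on-disk data may well exist, but no
    instance is running on it. A present file is only a hint: a crash can
    leave a stale file behind.
    """
    try:
        with open(os.path.join(data_dir, "postmaster.pid"), encoding="utf-8") as f:
            return parse_postmaster_pid(f.read())
    except FileNotFoundError:
        # No pid file (or no such directory): no running instance recorded.
        return None
```

The split mirrors the argument above: the files on disk persist, while the instance, represented here only by its recorded PID, exists only while the processes run.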

The next essential component is the data itself. It is organized as a group of databases plus some common management information (global, pg_wal, pg_xact, pg_tblspc, ...). The data must be treated as a whole, because the management information concerns all databases. Its nature differs from that of the processes and shared buffers: its content changes, of course, but the data itself is persistent. It even survives a power-down. A single command creates the whole directory structure and all its files. In summary, it is a thing of its own and should have its own name. 'database' is not possible, because it consists of several databases plus other things. My favorite is *cluster*; *database cluster* is also possible.
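As an illustration of the cluster as one persistent directory structure, the following hypothetical Python helpers check a directory listing for the cluster-wide entries named above, plus base (which holds one subdirectory per database) and the PG_VERSION file. The entry list assumes PostgreSQL 10 or later (where pg_wal and pg_xact replaced pg_xlog and pg_clog) and is a heuristic, not a complete description of the data directory layout.

```python
from typing import Iterable, List

# Top-level entries expected in a database cluster's data directory.
# Assumption: PostgreSQL 10 or later; this list is illustrative, not exhaustive.
CLUSTER_WIDE_ENTRIES = ("PG_VERSION", "base", "global", "pg_wal", "pg_xact", "pg_tblspc")


def missing_cluster_entries(entries: Iterable[str]) -> List[str]:
    """Return the expected cluster-wide entries absent from a directory listing."""
    present = set(entries)
    return [name for name in CLUSTER_WIDE_ENTRIES if name not in present]


def looks_like_cluster(entries: Iterable[str]) -> bool:
    """Heuristic: treat a directory as a database cluster if no expected entry is missing."""
    return not missing_cluster_entries(entries)
```

In use, one would pass something like `os.listdir(path)` for a hypothetical data directory `path`. The check works on the directory as a whole rather than on any single database, matching the point that the management information concerns all databases.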

server/host: We need a term for the underlying hardware, or the virtual machine or container, on which PG runs. I suggest using both *server* and *host*. In computer science, both are legitimate and widely used: everybody understands *client/server architecture*, or *host* in a TCP/IP configuration. We cannot change such established usage, of course. I suggest using either one depending on the context, but with the same meaning: "real hardware, a container, or a virtual machine".

--

Jürgen Purtz

(PS: I added the docs mailing list)

