Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.92) (envelope-from ) id 1jFQF7-0003f1-1o for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Mar 2020 22:35:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.89) (envelope-from ) id 1jFQF5-0006Nd-LG for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Mar 2020 22:35:51 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1jFQBx-0001iO-LM for pgsql-hackers@lists.postgresql.org; Fri, 20 Mar 2020 22:32:37 +0000 Received: from mout.kundenserver.de ([212.227.126.134]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1jFQBt-0005uZ-Uw for pgsql-hackers@postgresql.org; Fri, 20 Mar 2020 22:32:36 +0000 Received: from [192.168.178.43] ([77.190.92.158]) by mrelayeu.kundenserver.de (mreue010 [212.227.15.129]) with ESMTPSA (Nemesis) id 1MfHUx-1jmrSL09MZ-00gnPy; Fri, 20 Mar 2020 23:32:26 +0100 From: =?UTF-8?Q?J=c3=bcrgen_Purtz?= Subject: Re: Add A Glossary To: Justin Pryzby , Alvaro Herrera Cc: Corey Huinker , Roger Harkavy , pgsql-hackers@postgresql.org, Fabien COELHO , Michael Paquier References: <20200320001122.GA19602@alvherre.pgsql> <20200320195841.GA13662@telsasoft.com> Message-ID: <64b49f34-eaf6-6de2-e951-0eb8d4afabc2@purtz.de> Date: Fri, 20 Mar 2020 23:32:25 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1 MIME-Version: 1.0 In-Reply-To: <20200320195841.GA13662@telsasoft.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Provags-ID: V03:K1:Fz311eVbHx7eGaR6M0jVuiULfxBwCqEmFVZs/7n3D4Oa0OVdaNk lx5nwBSykTDiXdcejwmAdTB4puswpuYJeiDFylnjcCh224Cp7YcHSlATERInGOFCfrs2YTJ fSMhdfgdmwc+uk6sK4voxseJLmqHq84iXaYSPMtCWnMiGxtC0HglqLU8d3gD85Aa8VUFrFM 92+Nve+++DgadWEl6kR/w== X-Spam-Flag: NO X-UI-Out-Filterresults: notjunk:1;V03:K0:ZKfJqITmkoM=:viOR88o/MkYvhpJfhaxbe/ wqxu878AM9CT4P/AuJOY0uG/FT+GFiDJp+46kQQBzmsiKrvzoIJd3jPEYzInv+/T20k8Mqbgq 0RnJPriMKliEu/LyPqGInV0mRjHxSFvnkDGGh8PZ2BqCEnjzutYpFdind6o/aZBkE4UAi7Bks vQtEr0aihYSOvIDtvPfk44+EndqiAqaQzor8zfhb70lJRN+P+VoDTL5HtILud3fDBz64Brd/b eBjKmeRs0eO/DS1sINaAXEslRfS7fZRLwjH/AbIFtgTk8j+K+HSaQPo8D7tCVBWYH/gLi4ivv PmPdfFLvpMbGxx/He3aGIkloeJ5uEBF2yq+H8uwKSwWN4xH2rp0TyeRzjk8rcRX+UtCTE8TTA 8FIwgQDVoRUDyP02HwCjxAjIhR99cj0s8+vRwvLIK12nSyHSFA/gzvGPOS1ldKgXCBuT6ebor csE6J/ZgdyUoVYGJaXEVdhWmq3URcS2a208/qV9yH7nqBMiTK6XT+TeIAGSlCpvLHekwJW+lv N4IAK1kRz5hVS3k6AjUqum0ToD3o9iToY4ZcZoIisKQqYjMM+zBMARgyOgQ6h5oOGfIDHap26 ymMrKRkUkezIkRVNfTZOCEFODC/F9t61Pv7G/2vJZY2C46oxeTzocNvjlyb072PtX6dKwZRXM 8NkPq659EdogFmIEwLHRAeSfSNILo4CBS6Z4E2NBaxR9cejIm+NHTnwbk6AO3fNaa30+lYT6k RoGmuLcuZCpqT1mmmmuSSxlEGAE3tXIfnvy1jyeGmkzkSZhe9rTnqpCCPpHoxlzOZ61pWf1fb aS8N2lu2rUwqokIwN+iDiPqOSHzrmqBwHuHBmzTlO8QZDcr8qUFd2XJ8u2PD35FZikT8qnn List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk On 20.03.20 20:58, Justin Pryzby wrote: > On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote: >> + Aggregate >> + >> + >> + To combine a collection of data values into a single value, whose >> + value may not be of the same type as the original values. >> + Aggregate Functions >> + combine multiple Rows that share a common set >> + of values into one Row, which means that the >> + only data visible in the values in common, and the aggregates of the > IS the values in common ? > (or, "is the shared values") > >> + Analytic >> + >> + >> + A Function whose computed value can reference >> + values found in nearby Rows of the same >> + Result Set. >> + Archiver > Can you change that to archiver process ? I prefer the short term without the addition of 'process' - concerning 'Archiver' as well as the other cases. But I'm not an native English speaker. >> + Atomic > .. >> + >> + In reference to an operation: An event that cannot be completed in >> + part: it must either entirely succeed or entirely fail. A series of > Can you say: "an action which is not allowed to partially succed and then fail, > ..." > >> + Autovacuum > Say autovacuum process ? > >> + >> + >> + Processes that remove outdated MVCC > I would say "A set of processes that remove..." > >> + Records of the Heap and > I'm not sure, can you say "tuples" ? This concerns the upcomming MVCC terms. We need a linguistic distinction between the different versions of 'records' or 'tuples'. In my understanding the term 'tuple' is nearer to a logical construct (relational algebra) and a 'record' some concrete implementation on disc. Therefor I prefer 'record' in this context. >> + Backend Process >> + >> + >> + Processes of an Instance which act on behalf of > Say DATABASE instance -1: The term 'database' is used inflationary. We shall restrict it to a few cases. >> + Backend Server >> + >> + >> + See Instance. > same > >> + Background Worker >> + >> + >> + Individual processes within an Instance, which > same > >> + run system- or user-supplied code. Typical use cases are processes >> + which handle parts of an SQL query to take >> + advantage of parallel execution on servers with multiple >> + CPUs. > I would say "A typical use case is" +1 >> + Background Writer > Add "process" ? > >> + >> + >> + Writes continuously dirty pages from Shared > Say "Continuously writes" +1 >> + Memory to the file system. It starts periodically, but > Hm, maybe "wakes up periodically" +1 >> + Cast >> + >> + >> + A conversion of a Datum from its current data >> + type to another data type. > maybe just say > A conversion of a Datum another data type. > >> + Catalog >> + >> + >> + The SQL standard uses this standalone term to >> + indicate what is called a Database in >> + PostgreSQL's terminology. > Maybe remove "standalone" ? > >> + Checkpointer > Process > >> + A process that writes dirty pages and WAL >> + Records to the file system and creates a special > Does the chckpointer actually write WAL ? YES, not only WAL Writer. >> + checkpoint record. This process is initiated when predefined >> + conditions are met, such as a specified amount of time has passed, or >> + a certain volume of records have been collected. > collected or written? > > I would say: >> + A checkpoint is usually initiated by >> + a specified amount of time having passed, or >> + a certain volume of records having been written. +-0 >> + Checkpoint >> + >> + >> + A Checkpoint is a point in time > Extra space > >> + >> + Connection >> + >> + >> + A TCP/IP or socket line for inter-process > I don't know if I've ever heard the phase "socket line" > I guess you mean a unix socket. +1 >> + Constraint >> + >> + >> + A concept of restricting the values of data allowed within a >> + Table. > Just say: "A restriction on the values..."? > >> + Data Area > Remove this ? I've never heard this phrase before. grep on *.sgml delivers 4 occurrences. >> + >> + >> + The base directory on the filesystem of a >> + Server that contains all data files and >> + subdirectories associated with a Cluster with >> + the exception of tablespaces. The environment variable > Should add an entry for "tablespace". +1 >> + Datum >> + >> + >> + The internal representation of a SQL data type. > I'm not sure if should use "a SQL" or "an SQL", but not both. grep | wc delivers 106 occurrences for "an SQL" and 63 for "a SQL". It depends on how people pronounce the term SQL: "an esquel" or "a sequel". >> + Delete >> + >> + >> + A SQL command whose purpose is to remove > just say "which removes" +1 >> + >> + File Segment >> + >> + >> + If a heap or index file grows in size over 1 GB, it will be split > 1GB is the default "segment size", which you should define. ??? >> + >> + Foreign Data Wrapper >> + >> + >> + A means of representing data that is not contained in the local >> + Database as if were in local >> + Table(s). > I'd say: > > + A means of representing data as a Table(s) even though > + it is not contained in the local Database > > >> + >> + Foreign Key >> + >> + >> + A type of Constraint defined on one or more >> + Columns in a Table which >> + requires the value in those Columns to uniquely >> + identify a Row in the specified >> + Table. > An FK doesn't require the values in its table to be unique, right ? > I'd say something like: "..which enforces that the values in those Columns are > also present in an(other) table." > Reference Referential Integrity? > >> + Function >> + >> + >> + Any pre-defined transformation of data. Many >> + Functions are already defined within >> + PostgreSQL itself, but can also be >> + user-defined. > I would remove "pre-", since you mentioned that it can be user-defined. > >> + Global SQL Object >> + >> + >> + >> + Not all SQL Objects belong to a certain >> + Schema. Some belong to the complete >> + Database, or even to the complete >> + Cluster. These are referred to as >> + Global SQL Objects. Collations and Extensions >> + such as Foreign Data Wrappers reside at the >> + Database level; Database >> + names, Roles, >> + Tablespaces, Replication >> + origins, and subscriptions for logical >> + Replication at the >> + Cluster level. > I think "complete" is the wrong world. > I would say: > "An object which is not specific to a given database, but instead shared across > the entire Cluster". This phrase seems to be too simple. We must differentiate between the different levels: schema, database, cluster. Possibly someone finds a better phrase. >> + >> + Grant >> + >> + >> + A SQL command that is used to enable > I'd say "allow" > >> + >> + Heap >> + >> + >> + Contains the original values of Row attributes > I'm not sure what "original" means here ? Yes, this may be misleading. I want to express, that values are stored in the heap (the 'original') and possibly repeated as a key in an index. >> + (i.e. the data). The Heap is realized within >> + Database files and mirrored in >> + Shared Memory. > I wouldn't say mirrored, and probably just remove at least the part after "and". +-0 >> + >> + Host >> + >> + >> + See Server. > Or client. Or proxy at some layer or other intermediate thing. Maybe just > remove this. Sometimes the term "host" is used in a different meaning. Therefor we shall have this glossary entry for clarification that it shall be used only in the sense of a "server". >> + >> + Index >> + >> + >> + A Relation that contains data derived from a >> + Table (or Relation such >> + as a Materialized View). It's internal > Its > >> + structure supports very fast retrieval of and access to the original >> + data. >> + Instance >> + >> + > ... >> + >> + Many Instances can run on the same server as >> + long as they use different IP ports and manage > I would say "as long as their TCP/IP ports or sockets don't conflict, and manage..." +1 >> + Join >> + >> + >> + A technique used with SELECT statements for >> + correlating data in one or more Relations. > I would refer to this as a SQL keyword allowing to combine data from multiple > relations. > >> + Lock >> + >> + >> + A mechanism for one process temporarily preventing data from being >> + manipulated by any other process. > I'd say: > > + A mechanism by which a process protects simultaneous access to a resource > + by other processes. > > (I said "protects" since shared locks don't prevent all access, and it's easier > than explaining "unsafe access"). > > >> + >> + Log File >> + >> + >> + LOG files contain readable >> + text lines about serious and non-serious events, e.g.: use of wrong >> + password, long-running queries, ... . > Serious and non-serious? ok, can be removed: 'events' only. >> + Log Writer > process > >> + >> + >> + If activated and parameterized, the > I don't know what parameterized means here ok, unnecessary term. (There are parameters for the Log Writer process in the config file.) >> + Log Writer process >> + writes information about database events into the current >> + Log file. When reaching certain time- or >> + volume-dependent criterias, he creates a new > I think criteria is the plural.. +1 >> + Log Record > Can we remove this ? > Couple releases ago, "pg_xlog" was renamed to pg_wal. > I'd prefer to avoid defining something called "Log Record" about WAL that's > right next to text logs. "... that's right next to text logs."  This is the problem, which shall be clarified. The rename of the directory does not affect the records which are written into the WAL files or are used for replication. The term "log record" is used in the documentation as well as in error messages. >> + Logged >> + >> + >> + A Table is considered >> + Logged if changes to it are sent to the >> + WAL Log. By default, all regular >> + Tables are Logged. A >> + Table can be speficied as unlogged either at >> + creation time or via the ALTER TABLE command. The >> + primary use of unlogged Tables is for storing >> + transient work data that must be shared across processes, but with a >> + final result stored in logged Tables. >> + Temporary Tables are always unlogged. >> + >> + >> + > Maybe it's be better to define "unlogged", since 1) logged is the default; and > 2) it's right next to text logs. > >> + Master >> + >> + >> + When two or more Databases are linked via >> + Replication, the Server >> + that is considered the authoritative source of information is called >> + the Master. > I think it'd actually be the <> which is authoritative, in case they're > running on the same <> In this phase of the glossary we shall avoid the discussion about master/slave vs. primary/secondary. Some weeks ago we have seen many contributions without a clear result. In one of the next phases of the glossary we shall discuss all terms concerning replication separately. >> + >> + Materialized >> + >> + >> + The act of storing information rather than just the means of accessing > remove "means of" ? > >> + the information. This term is used in Materialized >> + Views meaning that the data derived from the >> + View is actually stored on disk separate from > separately > >> + the sources of that data. When the term >> + Materialized is used in speaking about >> + mulit-step queries, it means that the data of a given step is stored > multi > >> + (in memory, but that storage may spill over onto disk). >> + >> + >> + >> + >> + >> + Materialized View >> + >> + >> + A Relation that is defined in the same way that >> + a View is, but it stores data in the same way > change "it stores" to stores > >> + >> + Partition >> + >> + >> + >> + a) A Table that can be queried independently by >> + its own name, but can also be queried via another > just say "on its own" or "directly" > >> + Table, a partitionend > partitioned > also, put it in parens, like "via another table (a partitioned table)..." > >> + Table, which is a collection of > Say "set" here since you later talk about "subsets" and sets. > >> + >> + Primary Key >> + >> + >> + A special case of Unique Index defined on a >> + Table or other Relation >> + that also guarantees that all of the Attributes >> + within the Primary Key do not have >> + Null values. As the name implies, there can be >> + only one Primary Key per >> + Table, though it is possible to have multiple >> + Unique Indexes that also have no >> + Null-capable Attributes. > I would say "multiple >>unique indexes<< on >>attributes<< defined as not > nullable. > >> + Procedure >> + >> + >> + A defined set of instructions for manipulating data within a >> + Database. Procedure can > "procedures" or "a procedure" > >> + Record >> + >> + >> + See Tupple. > Tupple is back. And again below. > >> + A single Row of a Table >> + or other Relation. > I think it's commonly used to mean "an instance of a row" (in an MVCC sense), > but maybe that's too much detail for here. > >> + Referential Integrity >> + >> + >> + The means of restricting data in one Relation > A means > >> + >> + Relation >> + >> + >> + The generic term for all objects in a Database > "A generic term for any object in a >>database<< that has a name and..." > >> + >> + Result Set >> + >> + >> + A data structure transmitted from a Server to >> + client program upon the completion of a SQL >> + command, usually a SELECT but it can be an >> + INSERT, UPDATE, or >> + DELETE command if the RETURNING >> + clause is specified. > I'd remove everything in that sentence after "usually". > >> + Revoke >> + >> + >> + A command to reduce access to a named set of > s/reduce/prevent/ ? > >> + Row >> + >> + >> + See Tupple. > tuple > >> + >> + Savepoint >> + >> + >> + A special mark (such as a timestamp) inside a >> + Transaction. Data modifications after this >> + point in time may be rolled back to the time of the savepoint. > I don't think "timestamp" is a useful or accurate analogy for this. +1 >> + Schema >> + >> + >> + A schema is a namespace for >> + SQL objects, which all reside in the same >> + database. Each SQL >> + object must reside in exactly one >> + Schema. >> + >> + >> + In general, the names of SQL objects in the >> + schema are unique - even across different types of objects. The lone >> + exception is the case of Unique >> + Constraints, in which case there >> + must be a Unique Index >> + with the same name and Schema as the >> + Constraint. There is no restriction on having >> + a name used in multiple Schemas. > I think there's some confusion. Constraints are not objects, right ? > > But, constraints do have an exception (not just unique constraints, though): > the constraint is only unique on its table, not in its database/schema. > > "pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree (conrelid, contypid, conname) CLUSTER Yes, you are right. But give me some time for a better suggestion. >> + Select >> + >> + >> + The command used to query a Database. Normally, >> + SELECTs are not expected to modify the >> + Database in any way, but it is possible that >> + Functions invoked within the query could have >> + side-effects that do modify data. > I think there should be references to the sql-* pages for this and others. > >> + >> + Serializable >> + >> + >> + Transactions defined as SERIALIZABLE are unable to >> + see changes made within other transactions. In effect, for the >> + initializing session the entire Database >> + appears to be frozen duration such a >> + Transaction. > Do you mean "for the duration of the >>Transaction<<" > >> + >> + Session >> + >> + >> + A Connection to the Database. >> + >> + >> + A description of the commands that were issued in the life cycle of a >> + particular Connection to the >> + Database. > I'm not sure what this means. > >> + Sequence >> + >> + >> + >> + An Database object which represents the > A not An > >> + mathematical concept of a numerical integral sequence. It can be >> + thought of as a Table with exactly one >> + Row and one Column. The >> + value stored is known as the current value. A >> + Sequence has a defined direction (almost always >> + increasing) and an interval step (usually 1). Whenever the >> + NEXTVAL pseudo-column of a >> + Sequence is accessed, the current value is moved >> + in the defined direction by the defined interval step, and that value > say "given interval step" > >> + Shared Memory >> + >> + >> + RAM which is used by the processes common to an >> + Instance. It mirrors parts of >> + Database files, provides an area for >> + WAL Records, > Do we use shared_buffers for WAL ? Yes, my information is that WAL records are part of the shared_buffers. >> + >> + Table >> + >> + >> + A collection of Tuples (also known as >> + Rows or Records) having >> + a common data structure (the same number of >> + Attributess, in the same order, having the same > Attributes has two esses. > >> + name and type per position). A Table is the > I don't think you need to say here that the columns of a table all have the > same type and order. In my opinion this is an essential information. >> + Temporary Tables >> + >> + >> + Tables that exist either for the lifetime of a >> + Session or a >> + Transaction, as defined at creation time. The > I would say "as specified at the time of its creation". > >> + Transaction >> + >> + >> + A combination of one or more commands that must act as a single > Remove "one or more" > >> + Trigger >> + >> + >> + A Function which can be defined to execute >> + whenever a certain operation (INSERT, >> + UPDATE, or DELTE) is applied to >> + that Relation. A Trigger > s/that/a/ > >> + >> + Unique >> + >> + >> + The condition of having no matching values in the same > s/matching/duplicate/ > >> + Relation. Most often used in the concept of > s/concept/context/ > >> + >> + Update >> + >> + >> + A command used to modify Rows that already > or 'may already' > >> + WAL File > ... >> + >> + The sequence of WAL Records in combination with >> + the sequence of WAL Files represents the > Remove "in combination with the sequence of >WAL Files<" > >> + >> + WAL Log > Can you just say WAL or "write-ahead log". Sometimes the term "WAL log" is used in the documentation. But the preferred term is "WAL file". This glossary entry does nothing but points to the preferred term, which indicates that he shall be avoided in the future. >> + >> + >> + A WAL Record contains either new or changed >> + Heap or Index data or >> + information about a COMMIT, >> + ROLLBACK, SAVEPOINT, or >> + Checkpointer operation. WAL records use a >> + non-printabe binary format. > non-printable +1 > Or just remove it. > Or just remove the sentence. > >> + WAL Writer > process > >> + >> + Window Function >> + >> + >> + A type of Function similar to an >> + Aggregate in that can derive its value from a > in that IT > >> + set of Rows in a Result >> + Set, but still retaining the original source data. Kind regards, Jürgen