Subject: Greenmask 0.2.0 - 0.2.5 Releases
From: "Greenmask.io via PostgreSQL Announce"
To: PostgreSQL Announce
Reply-To: info@greenmask.io
Date: Thu, 12 Dec 2024 12:21:14 +0000

## PostgreSQL database anonymization and synthetic data generation tool

These releases mark major milestones, significantly expanding Greenmask's functionality and transforming it into a simple, extensible, and reliable solution for database security, data anonymization, and everyday operations. Our goal is to build a core system that serves as the foundation for comprehensive dynamic staging environments and robust data security.

These updates introduce new features such as database subsetting, pgzip support, restoration in topological order, and refactored transformers, greatly enhancing Greenmask's flexibility to meet diverse business needs. They also include numerous fixes and improvements.

## Greenmask Overview

Greenmask is an open-source utility for logical database dumps, anonymization, synthetic data generation, and restoration. It is stateless and does not require any changes to your database schema. It is designed to be fast, reliable, highly customizable, and backward-compatible with existing PostgreSQL utilities.

It is well suited for:

* **Backup and Restoration**: Streamline daily tasks like logical backups, table restoration after truncation, or replacing pg_dump and pg_restore with ease.
* **Anonymization and Data Masking**: Simplify staging environment setup and analytical tasks by anonymizing and transforming backups, ensuring consistent, secure data for faster ...

Greenmask on [GitHub](https://github.com/GreenmaskIO/greenmask)

## Notable changes

* PostgreSQL 17 support - the ported library was revised to support PostgreSQL 17.
* [Database Subset](https://docs.greenmask.io/v0.2.5/database_subset/) - a new feature that lets you define a subset of the database and so scale down the dump size ([#110](https://github.com/GreenmaskIO/greenmask/issues/110)). It is robust, serves multiple purposes, and is especially useful for testing and development environments. It supports:
    * References with [NULL values](https://docs.greenmask.io/v0.2.5/database_subset#references-with-null-values) - Greenmask generates a LEFT JOIN query for FK references that contain NULL values so that those rows are included in the subset.
    * [Virtual references](https://docs.greenmask.io/v0.2.5/database_subset#virtual-references) (virtual foreign keys) - create a logical FK in Greenmask that is used in the subset dependency graph. A virtual reference can be defined for a column or an expression, allowing you to get the value from JSON and similar.
    * [Circular references](https://docs.greenmask.io/v0.2.5/database_subset#circular-reference) - Greenmask automatically resolves circular dependencies in the subset by generating a recursive query. The query includes integrity checks on the subset, ensuring that the data gathered from circular dependencies is consistent.
    * Fully covered with documentation, including [troubleshooting](https://docs.greenmask.io/v0.2.5/database_subset#troubleshooting) and [examples](https://docs.greenmask.io/v0.2.5/database_subset#example-dump-a-subset-of-the-database).
    * FKs and PKs that have more than one column (or expression).
    * **Multi-cycle resolution in one strongly connected component (SCC)** - Greenmask generates a recursive query for the SCC whether it contains a single cycle or multiple cycles, making the subset system universal for any database schema.
    * **Polymorphic relationships** - you can define a [virtual reference for a table with polymorphic references](https://docs.greenmask.io/v0.2.5/database_subset/#polymorphic-references) using the `polymorphic_exprs` attribute and use Greenmask to generate a subset for such tables.
* [Transformation conditions](https://docs.greenmask.io/v0.2.5/built_in_transformers/transformation_condition/) - execute a defined transformation only if a specified condition is met ([#133](https://github.com/GreenmaskIO/greenmask/pull/133)).
* [Transformation inheritance](https://docs.greenmask.io/v0.2.5/built_in_transformers/transformation_inheritance/) - transformation inheritance for partitioned tables and tables with foreign keys: define once and apply to all (#229).
* **pgzip** support for faster [compression](https://docs.greenmask.io/v0.2.5/commands/dump#pgzip-compression) and [decompression](https://docs.greenmask.io/v0.2.5/commands//restore#pgzip-decompression) - setting `--pgzip` can speed up the dump and restoration processes through parallel compression. In some tests, dump and restore operations were up to 5x faster.
* [Restoration in topological order](https://docs.greenmask.io/v0.2.5/commands/restore/#restoration-in-topological-order) - when this flag is set, dependent tables are not restored until the tables they depend on have been restored. This is useful when you want to be notified of errors as early as possible without waiting for the entire table to be restored.
* [Insert format](https://docs.greenmask.io/v0.2.5/commands/restore#inserts-and-error-handling) restoration - for a flexible restoration process, Greenmask now supports data restoration in the `INSERT` format. It generates the insert statements from the `COPY` records in the dump. You do not need to re-dump your data to use this feature; it is enabled in the `restore` command. New features related to the `INSERT` format:
    * Generate `INSERT` statements with the `ON CONFLICT DO NOTHING` clause if the `--on-conflict-do-nothing` flag is set.
    * **[Error exclusion list](https://docs.greenmask.io/v0.2.5/configuration/#restoration-error-exclusion)** in the config to skip certain errors and continue inserting subsequent rows from the dump.
    * Use case: **incremental dump and restoration** of logical data. For example, if you have a database and want to insert data periodically from another source, this can be combined with the database subset and transformations to bring the target database up to date.
* [Restore data batching](https://docs.greenmask.io/v0.2.5/commands/restore#restore-data-batching) ([#173](https://github.com/GreenmaskIO/greenmask/pull/174)) - by default, the COPY protocol returns the error only on transaction commit. To override this behavior, use the `--batch-size` flag to specify the number of rows to insert in a single batch during the COPY command. This is useful when you want to control the transaction size and how often data is committed.
* [Introduced](https://github.com/GreenmaskIO/greenmask/pull/162) the `keep_null` parameter for the `RandomPerson` transformer.
* [Introduced dynamic parameters in the transformers](https://docs.greenmask.io/v0.2.5/built_in_transformers/dynamic_parameters/)
    * Most transformers now support dynamic parameters where applicable.
    * Dynamic parameters are strictly enforced. If you need to cast values to another type, Greenmask provides templates and predefined cast functions accessible via `cast_to`. These functions cover frequent operations such as `UnixTimestampToDate` and `IntToBool`.
* The transformation logic has been significantly refactored, making transformers more customizable and flexible than before.
* [Introduced transformation engines](https://docs.greenmask.io/v0.2.5/built_in_transformers/transformation_engines)
    * `random` - generates transformer values based on pseudo-random algorithms.
    * `hash` - generates transformer values using hash functions. Currently, it uses `sha3` hash functions, which are secure but slow. In the stable release, there will be an option to choose between `sha3` and `SipHash`.
* [Introduced static parameters value template](https://docs.greenmask.io/v0.2.5/built_in_transformers/parameters_templating)
* [Dumps retention management](https://docs.greenmask.io/v0.2.5/commands/delete) - introduced retention parameters ([#201](https://github.com/GreenmaskIO/greenmask/pull/201)) for the `delete` command, along with two new statuses: failed and in progress. A dump is considered failed if it lacks a "done" heartbeat or if its last heartbeat is more than 30 minutes old. The `delete` command now supports the following retention parameters:
    * `--dry-run`: Runs the deletion operation in test mode with verbose output, without actually deleting anything.
    * `--before-date 2024-08-27T23:50:54+00:00`: Deletes dumps older than the specified date. The date must be provided in RFC3339Nano format, for example: `2021-01-01T00:00:00Z`.
    * `--retain-recent 10`: Retains the N most recent dumps, where N is specified by the user.
    * `--retain-for 1w2d3h4m5s6ms7us8ns`: Retains dumps for the specified duration. The format supports weeks (w), days (d), hours (h), minutes (m), seconds (s), milliseconds (ms), microseconds (us), and nanoseconds (ns).
    * `--prune-failed`: Prunes (removes) all dumps that have failed.
    * `--prune-unsafe`: Prunes dumps with "unknown-or-failed" statuses. This option only works in conjunction with `--prune-failed`.

#### Releases list:

* [v0.2.0](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.0)
* [v0.2.1](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.1)
* [v0.2.2](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.2)
* [v0.2.3](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.3)
* [v0.2.4](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.4)
* [v0.2.5](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.5)

## Links

Feel free to reach out to us if you have any questions or need assistance:

* [Greenmask repository](https://github.com/GreenmaskIO/greenmask)
* [Documentation](https://docs.greenmask.io/)
* [Greenmask Roadmap](https://github.com/orgs/GreenmaskIO/projects/6)
* [Discord](https://discord.gg/tAJegUKSTB)
* [Telegram](https://t.me/greenmask_community)
* [Email](mailto:support@greenmask.io)
* [Twitter](https://twitter.com/GreenmaskIO)
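
As an illustration of the `--retain-for` duration format listed under dumps retention management (for example `1w2d3h4m5s6ms7us8ns`), here is a minimal Python sketch of how such a string could be parsed and turned into a cutoff time. It is not Greenmask's implementation. The names `parse_retention` and `cutoff` are hypothetical and exist only for this example. The rule that units may appear in any order and may repeat is also an assumption, since the announcement only lists the supported suffixes.

```python
import re
from datetime import datetime, timedelta, timezone

# Unit suffixes as listed in the announcement, mapped to nanoseconds.
_UNIT_NS = {
    "w": 7 * 24 * 3600 * 10**9,
    "d": 24 * 3600 * 10**9,
    "h": 3600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}

# Two-letter units come first so that "ms" is not read as "m" followed by "s".
_TOKEN = re.compile(r"(\d+)(ms|us|ns|w|d|h|m|s)")


def parse_retention(text: str) -> int:
    """Hypothetical helper: return the duration in nanoseconds.

    Raises ValueError on empty or malformed input.
    """
    pos = 0
    total = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration at offset {pos}: {text!r}")
        total += int(match.group(1)) * _UNIT_NS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total


def cutoff(now: datetime, retain_for: str) -> datetime:
    """Hypothetical helper: dumps created before this instant are outside the window."""
    # timedelta has microsecond resolution, so sub-microsecond parts are dropped.
    return now - timedelta(microseconds=parse_retention(retain_for) // 1000)


if __name__ == "__main__":
    now = datetime(2024, 12, 12, 12, 0, tzinfo=timezone.utc)
    print(parse_retention("1w2d3h4m5s6ms7us8ns"))
    print(cutoff(now, "1w").isoformat())
```

With these definitions, `parse_retention("1w2d3h4m5s6ms7us8ns")` returns `788645006007008` nanoseconds, and `cutoff(now, "1w")` is exactly seven days before `now`.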
This email was sent to you from Greenmask.io. It was delivered on their behalf by the PostgreSQL project. Any questions about the content of the message should be sent to Greenmask.io.
