public inbox for [email protected]  
[PATCH] Fix recognizing 0x11A7 as a Hangul T syllable in Unicode normalization
4+ messages / 2 participants

* [PATCH] Fix recognizing 0x11A7 as a Hangul T syllable in Unicode normalization
@ 2026-06-01 18:38  Diego Frias <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Diego Frias @ 2026-06-01 18:38 UTC (permalink / raw)
  To: [email protected]

Hi hackers

I was browsing PostgreSQL’s Unicode normalization code and found an issue where the composition algorithm recognizes 0x11A7 as a T syllable and combines it with a preceding LV syllable. Per the Unicode specification:

TBase is set to one less than the beginning of the range of trailing consonants, which starts at U+11A8. TCount is set to one more than the number of trailing consonants relevant to the decomposition algorithm: (11C2₁₆ - 11A8₁₆ + 1) + 1.

In short, TCount actually counts 1 more than the number of T syllables; this is so s % TCount == 0 implies that s has no T syllable (because the 0th place represents the absence of a T syllable), where s is the s-index of a precomposed Hangul character. Anyway, since PostgreSQL recognizes 0x11A7 as a T syllable, the composition algorithm yields a nonsense character when 0x11A7 is put in the T position. See https://github.com/unicode-rs/unicode-normalization/blob/576ae0b1407dd14854876c93f1a348df0c19dffe/sr... for a comment on this bug in Rust’s unicode-rs, and https://github.com/JuliaStrings/utf8proc/commit/0260ba56c81e5ef6f06c0804034a36284bcb8710 for a similar contribution I made to JuliaStrings/utf8proc a few months ago.
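
To make the arithmetic above concrete, here is a minimal C sketch of the index math. The constants follow the Unicode core spec's Hangul syllable arithmetic; the helper names (is_hangul_syllable, hangul_t_index, is_trailing_consonant) are hypothetical and chosen for illustration only, they are not taken from PostgreSQL's unicode_norm.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants from the Unicode core spec's Hangul syllable arithmetic. */
#define SBASE   0xAC00u
#define TBASE   0x11A7u     /* one below U+11A8, the first trailing consonant */
#define LCOUNT  19u
#define VCOUNT  21u
#define TCOUNT  ((0x11C2u - 0x11A8u + 1u) + 1u)    /* 27 consonants + 1 = 28 */
#define SCOUNT  (LCOUNT * VCOUNT * TCOUNT)         /* 11172 */

/* Hypothetical helper: is cp a precomposed Hangul syllable? */
static bool
is_hangul_syllable(uint32_t cp)
{
    return cp >= SBASE && cp < SBASE + SCOUNT;
}

/*
 * Hypothetical helper: t-index of a precomposed syllable.  Index 0 means
 * "no trailing consonant", which is why TCOUNT counts one extra slot.
 * The caller is expected to pass a precomposed Hangul syllable.
 */
static uint32_t
hangul_t_index(uint32_t syllable)
{
    return (syllable - SBASE) % TCOUNT;
}

/* Hypothetical helper: true only for the 27 real trailing consonants. */
static bool
is_trailing_consonant(uint32_t cp)
{
    return cp > TBASE && cp < TBASE + TCOUNT;
}
```

Under these definitions, U+AC00 has t-index 0 (an LV syllable), U+AC01 has t-index 1, and U+11A7 is rejected as a trailing consonant while U+11A8 through U+11C2 are accepted.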

Let me know if this patch needs anything else. I can write a test for this, but it looks like the current testing setup in src/common/norm_test.c only runs the Unicode test suite and isn’t built for writing custom tests. If that is something of interest, though, I’m happy to add that to this patch.

Best,
Diego



Attachments:

  [application/octet-stream] v1-0001-Fix-recognizing-0x11A7-as-a-Hangul-T-syllable-in-Uni.patch (1.4K, 2-v1-0001-Fix-recognizing-0x11A7-as-a-Hangul-T-syllable-in-Uni.patch)
  download | inline diff:
From 37d7ba5193a8de6bd31a38a7d93a37b66db1dd9d Mon Sep 17 00:00:00 2001
From: Diego Frias <[email protected]>
Date: Mon, 1 Jun 2026 11:32:41 -0700
Subject: [PATCH] Fix recognizing 0x11A7 as a Hangul T syllable in Unicode
 normalization

0x11A7 is not a valid Hangul T syllable despite being equal to T_BASE.
This is because, per the Unicode spec:

  TBase is set to one less than the beginning of the range of trailing
  consonants, which starts at U+11A8. TCount is set to one more than the
  number of trailing consonants relevant to the decomposition algorithm:
  (11C2₁₆ - 11A8₁₆ + 1) + 1.

So the first valid Hangul T syllable is 0x11A8. Also see
https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G59434
for where the spec describes the usage of 0x11A8, not 0x11A7, during
composition.
---
 src/common/unicode_norm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/common/unicode_norm.c b/src/common/unicode_norm.c
index cf84f202414..0534ae34640 100644
--- a/src/common/unicode_norm.c
+++ b/src/common/unicode_norm.c
@@ -236,7 +236,7 @@ recompose_code(uint32 start, uint32 code, uint32 *result)
 	/* Check if two current characters are LV and T */
 	else if (start >= SBASE && start < (SBASE + SCOUNT) &&
 			 ((start - SBASE) % TCOUNT) == 0 &&
-			 code >= TBASE && code < (TBASE + TCOUNT))
+			 code > TBASE && code < (TBASE + TCOUNT))
 	{
 		/* make syllable of form LVT */
 		uint32		tindex = code - TBASE;
-- 
2.39.5 (Apple Git-154)




* Re: [PATCH] Fix recognizing 0x11A7 as a Hangul T syllable in Unicode normalization
@ 2026-06-04 04:07  Michael Paquier <[email protected]>
  parent: Diego Frias <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Michael Paquier @ 2026-06-04 04:07 UTC (permalink / raw)
  To: Diego Frias <[email protected]>; +Cc: [email protected]

On Mon, Jun 01, 2026 at 11:38:32AM -0700, Diego Frias wrote:
> In short, TCount actually counts 1 more than the number of T
> syllables; this is so s % TCount == 0 implies that s has no T
> syllable (because the 0th place represents the absence of a T
> syllable), where s is the s-index of a precomposed Hangul
> character. Anyway, since PostgreSQL recognizes 0x11A7 as a T
> syllable, the composition algorithm yields a nonsense character when
> 0x11A7 is put in the T position.

Oops.  Yes, including TBASE in the recomposition is incorrect.  I found
your quote here (TBase is set to one less..):
https://unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G59688

The character gets eaten by the normalization.  Not good.
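
As a rough illustration of how the character gets eaten, here is a simplified C sketch loosely modeled on the LV + T branch of recompose_code() that the patch touches. It assumes the composed result is computed as start + tindex; the function name compose_lv_t and the include_tbase flag are hypothetical, added only to compare the old and new range checks side by side.

```c
#include <stdbool.h>
#include <stdint.h>

#define SBASE   0xAC00u
#define TBASE   0x11A7u
#define TCOUNT  28u
#define SCOUNT  11172u

/*
 * Hypothetical helper: try to compose an LV syllable with a trailing
 * consonant.  include_tbase = true mimics the old "code >= TBASE" check,
 * include_tbase = false mimics the fixed "code > TBASE" check.
 * Returns true and sets *result when the pair is composed.
 */
static bool
compose_lv_t(uint32_t start, uint32_t code, bool include_tbase,
             uint32_t *result)
{
    bool start_is_lv = start >= SBASE && start < SBASE + SCOUNT &&
                       (start - SBASE) % TCOUNT == 0;
    bool code_is_t = (include_tbase ? code >= TBASE : code > TBASE) &&
                     code < TBASE + TCOUNT;

    if (!start_is_lv || !code_is_t)
        return false;

    /* Assumed composition step: a t-index of 0 leaves start unchanged. */
    *result = start + (code - TBASE);
    return true;
}
```

With the old check, composing U+AC00 with U+11A7 succeeds with a t-index of 0, so the result is U+AC00 again and U+11A7 is consumed. With the fixed check the pair is left alone, while U+AC00 followed by U+11A8 still composes to U+AC01.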

> Let me know if this patch needs anything else. I can write a test
> for this, but it looks like the current testing setup in
> src/common/norm_test.c only runs the Unicode test suite and isn’t
> built for writing custom tests. If that is something of interest,
> though, I’m happy to add that to this patch.

We have a set of tests in src/test/regress/sql/unicode.sql that would
fit nicely with what you want to address here.  For this specific
problem, this would work:
SELECT normalize(U&'\AC00\11A7', NFC) = U&'\AC00\11A7';

How about adding more normalization check patterns while at it?  I ended
up with the attached, all things combined.  Diego, what do you
think?
--
Michael

From 0614fd3227eedffe91c31468def76400fd01d134 Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Thu, 4 Jun 2026 12:56:36 +0900
Subject: [PATCH] Fix off-by-one with NFC recomposition for Hangul U+11A7
 (TBASE)

The NFC recomposition incorrectly included TBASE as a valid T syllable,
which is incorrect based on the Unicode spec (TBASE is one below the
start of the range, range beginning at U+11A8).

This would cause the TBASE to be silently swallowed in the
normalization, leading to an incorrect result.

A couple of regression tests are added to check more patterns with
Hangul recomposition and decomposition, on top of a test to check this
issue with TBASE.

Author: Diego Frias <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 14
---
 src/common/unicode_norm.c             |  2 +-
 src/test/regress/expected/unicode.out | 78 +++++++++++++++++++++++++++
 src/test/regress/sql/unicode.sql      | 20 +++++++
 3 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/src/common/unicode_norm.c b/src/common/unicode_norm.c
index cf84f2024140..0534ae34640f 100644
--- a/src/common/unicode_norm.c
+++ b/src/common/unicode_norm.c
@@ -236,7 +236,7 @@ recompose_code(uint32 start, uint32 code, uint32 *result)
 	/* Check if two current characters are LV and T */
 	else if (start >= SBASE && start < (SBASE + SCOUNT) &&
 			 ((start - SBASE) % TCOUNT) == 0 &&
-			 code >= TBASE && code < (TBASE + TCOUNT))
+			 code > TBASE && code < (TBASE + TCOUNT))
 	{
 		/* make syllable of form LVT */
 		uint32		tindex = code - TBASE;
diff --git a/src/test/regress/expected/unicode.out b/src/test/regress/expected/unicode.out
index 1e06de226491..63e48d3a961f 100644
--- a/src/test/regress/expected/unicode.out
+++ b/src/test/regress/expected/unicode.out
@@ -105,3 +105,81 @@ ORDER BY num;
 
 SELECT is_normalized('abc', 'def');  -- run-time error
 ERROR:  invalid normalization form: def
+-- Hangul NFC recomposition tests
+-- L+V -> LV composition (first and last)
+SELECT normalize(U&'\1100\1161', NFC) = U&'\AC00' COLLATE "C" AS hangul_lv_first;
+ hangul_lv_first 
+-----------------
+ t
+(1 row)
+
+SELECT normalize(U&'\1112\1175', NFC) = U&'\D788' COLLATE "C" AS hangul_lv_last;
+ hangul_lv_last 
+----------------
+ t
+(1 row)
+
+-- LV+T -> LVT composition
+SELECT normalize(U&'\AC00\11A8', NFC) = U&'\AC01' COLLATE "C" AS hangul_lvt_first_t;
+ hangul_lvt_first_t 
+--------------------
+ t
+(1 row)
+
+SELECT normalize(U&'\AC00\11C2', NFC) = U&'\AC1B' COLLATE "C" AS hangul_lvt_last_t;
+ hangul_lvt_last_t 
+-------------------
+ t
+(1 row)
+
+SELECT normalize(U&'\D788\11A8', NFC) = U&'\D789' COLLATE "C" AS hangul_lvt_last_lv;
+ hangul_lvt_last_lv 
+--------------------
+ t
+(1 row)
+
+-- L+V+T -> LVT composition
+SELECT normalize(U&'\1100\1161\11A8', NFC) = U&'\AC01' COLLATE "C" AS hangul_full_lvt;
+ hangul_full_lvt 
+-----------------
+ t
+(1 row)
+
+SELECT normalize(U&'\1112\1175\11C2', NFC) = U&'\D7A3' COLLATE "C" AS hangul_full_lvt;
+ hangul_full_lvt 
+-----------------
+ t
+(1 row)
+
+-- TBASE invalid T syllable
+SELECT normalize(U&'\AC00\11A7', NFC) = U&'\AC00\11A7' COLLATE "C" AS hangul_tbase_not_combined;
+ hangul_tbase_not_combined 
+---------------------------
+ t
+(1 row)
+
+SELECT normalize(U&'\1100\1161\11A7', NFC) = U&'\AC00\11A7' COLLATE "C" AS hangul_lv_tbase_separate;
+ hangul_lv_tbase_separate 
+--------------------------
+ t
+(1 row)
+
+-- Hangul NFD decomposition tests
+SELECT normalize(U&'\AC00', NFD) = U&'\1100\1161' COLLATE "C" AS hangul_nfd_lv;
+ hangul_nfd_lv 
+---------------
+ t
+(1 row)
+
+SELECT normalize(U&'\AC01', NFD) = U&'\1100\1161\11A8' COLLATE "C" AS hangul_nfd_lvt;
+ hangul_nfd_lvt 
+----------------
+ t
+(1 row)
+
+SELECT normalize(U&'\D7A3', NFD) = U&'\1112\1175\11C2' COLLATE "C" AS hangul_nfd_last;
+ hangul_nfd_last 
+-----------------
+ t
+(1 row)
+
diff --git a/src/test/regress/sql/unicode.sql b/src/test/regress/sql/unicode.sql
index e50adb68ed0d..951f86a336e8 100644
--- a/src/test/regress/sql/unicode.sql
+++ b/src/test/regress/sql/unicode.sql
@@ -36,3 +36,23 @@ FROM
 ORDER BY num;
 
 SELECT is_normalized('abc', 'def');  -- run-time error
+
+-- Hangul NFC recomposition tests
+-- L+V -> LV composition (first and last)
+SELECT normalize(U&'\1100\1161', NFC) = U&'\AC00' COLLATE "C" AS hangul_lv_first;
+SELECT normalize(U&'\1112\1175', NFC) = U&'\D788' COLLATE "C" AS hangul_lv_last;
+-- LV+T -> LVT composition
+SELECT normalize(U&'\AC00\11A8', NFC) = U&'\AC01' COLLATE "C" AS hangul_lvt_first_t;
+SELECT normalize(U&'\AC00\11C2', NFC) = U&'\AC1B' COLLATE "C" AS hangul_lvt_last_t;
+SELECT normalize(U&'\D788\11A8', NFC) = U&'\D789' COLLATE "C" AS hangul_lvt_last_lv;
+-- L+V+T -> LVT composition
+SELECT normalize(U&'\1100\1161\11A8', NFC) = U&'\AC01' COLLATE "C" AS hangul_full_lvt;
+SELECT normalize(U&'\1112\1175\11C2', NFC) = U&'\D7A3' COLLATE "C" AS hangul_full_lvt;
+-- TBASE invalid T syllable
+SELECT normalize(U&'\AC00\11A7', NFC) = U&'\AC00\11A7' COLLATE "C" AS hangul_tbase_not_combined;
+SELECT normalize(U&'\1100\1161\11A7', NFC) = U&'\AC00\11A7' COLLATE "C" AS hangul_lv_tbase_separate;
+
+-- Hangul NFD decomposition tests
+SELECT normalize(U&'\AC00', NFD) = U&'\1100\1161' COLLATE "C" AS hangul_nfd_lv;
+SELECT normalize(U&'\AC01', NFD) = U&'\1100\1161\11A8' COLLATE "C" AS hangul_nfd_lvt;
+SELECT normalize(U&'\D7A3', NFD) = U&'\1112\1175\11C2' COLLATE "C" AS hangul_nfd_last;
-- 
2.54.0



Attachments:

  [text/plain] 0001-Fix-off-by-one-with-NFC-recomposition-for-Hangul-U-1.patch (5.3K, 2-0001-Fix-off-by-one-with-NFC-recomposition-for-Hangul-U-1.patch)



* Re: [PATCH] Fix recognizing 0x11A7 as a Hangul T syllable in Unicode normalization
@ 2026-06-04 16:32  Diego Frias <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Diego Frias @ 2026-06-04 16:32 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: [email protected]

Looks great! Thanks for letting me know where the tests live. I’ll
try to get these tests in the official Unicode test suite, too. Should
help future implementors.

Thanks,
Diego

> On Jun 3, 2026, at 9:07 PM, Michael Paquier <[email protected]> wrote:
> 
> On Mon, Jun 01, 2026 at 11:38:32AM -0700, Diego Frias wrote:
>> In short, TCount actually counts 1 more than the number of T
>> syllables; this is so s % TCount == 0 implies that s has no T
>> syllable (because the 0th place represents the absence of a T
>> syllable), where s is the s-index of a precomposed Hangul
>> character. Anyway, since PostgreSQL recognizes 0x11A7 as a T
>> syllable, the composition algorithm yields a nonsense character when
>> 0x11A7 is put in the T position.
> 
> Oops.  Yes, including TBASE in the recomposition is incorrect, finding
> your quote here (TBase is set to one less..):
> https://unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G59688
> 
> The character gets eaten by the normalization.  Not good.
> 
>> Let me know if this patch needs anything else. I can write a test
>> for this, but it looks like the current testing setup in
>> src/common/norm_test.c only runs the Unicode test suite and isn’t
>> built for writing custom tests. If that is something of interest,
>> though, I’m happy to add that to this patch.
> 
> We have a set of tests in src/test/regress/sql/unicode.sql that would
> fit nicely with what you want to address here.  For this specific
> problem, this would work:
> SELECT normalize(U&'\AC00\11A7', NFC) = U&'\AC00\11A7';
> 
> How about adding more normalization check patterns while at it?  I ended
> up with the attached, all things combined.  Diego, what do you
> think?
> --
> Michael
> <0001-Fix-off-by-one-with-NFC-recomposition-for-Hangul-U-1.patch>








* Re: [PATCH] Fix recognizing 0x11A7 as a Hangul T syllable in Unicode normalization
@ 2026-06-04 22:55  Michael Paquier <[email protected]>
  parent: Diego Frias <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Michael Paquier @ 2026-06-04 22:55 UTC (permalink / raw)
  To: Diego Frias <[email protected]>; +Cc: [email protected]

On Thu, Jun 04, 2026 at 09:32:53AM -0700, Diego Frias wrote:
> Looks great! Thanks for letting me know where the tests live. I’ll
> try to get these tests in the official Unicode test suite, too. Should
> help future implementors.

Thanks.  Applied and backpatched down to v14.
--
Michael




