Received: from malur.postgresql.org ([217.196.149.56])
 by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.92) (envelope-from ) id 1miJf0-0005PD-UU
 for pgsql-www@arkaria.postgresql.org; Wed, 03 Nov 2021 17:02:51 +0000
Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
 by malur.postgresql.org with esmtp
 (Exim 4.92) (envelope-from ) id 1miJez-00008j-P9
 for pgsql-www@arkaria.postgresql.org; Wed, 03 Nov 2021 17:02:49 +0000
Received: from makus.postgresql.org ([2001:4800:3e1:1::229])
 by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.92) (envelope-from ) id 1miJez-000082-GW
 for pgsql-www@lists.postgresql.org; Wed, 03 Nov 2021 17:02:49 +0000
Received: from ploudseeker.com ([78.199.165.48])
 by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.92) (envelope-from ) id 1miJew-00030a-I3
 for pgsql-www@lists.postgresql.org; Wed, 03 Nov 2021 17:02:48 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=cmatte.me;
 s=myselector; t=1635958963;
 bh=8NbVgTAhesQydgDRsVqNIPFMkG8UmBG5+9vjCHT1KXY=;
 h=Date:To:From:Subject;
 b=akrVULSdfV9MaTmkHtxu19kdCg8Uo1wQGFhgZgwFaj4quTodOTYddB9cvFA56rYgJ
 gEz4USTG9s/ad3/HVWC4mUDpRED83PT2IBOZgBNE4W4FPQNQOzKywreNlxPENL1tOy
 frLpoXLuWZxXyZC7Lc8bKwCeCeAPo1dyXL8oC4+D2W/BSe/SZLwERjmS4LGZivt8CT
 rBH9BE6kdu2JM504RRp0SF/tU5egsfJ1t3zIFEzld8dfnrzFaekN3byOME6wQYxhcQ
 NQQrVlGLCKGKVTJzOf+atmmtbUVKcIXdDmVPWBQzbK6LPxiM7BiXrSIyPdSZuDdIMl
 kClUQZr5ofATg==
Content-Type: multipart/mixed; boundary="------------EpU0Y0anA3JBgW6eIivpHCbL"
Message-ID:
Date: Wed, 3 Nov 2021 18:02:42 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.1
To: pgsql-www@lists.postgresql.org
Content-Language: en-US
From: =?UTF-8?Q?C=c3=a9lestin_Matte?=
Subject: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
List-Id:
List-Help:
List-Subscribe:
List-Post:
List-Owner:
List-Archive:
Archived-At:
Precedence: bulk

This is a multi-part message in MIME format.
--------------EpU0Y0anA3JBgW6eIivpHCbL
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hello,

As surprising as it may seem, Message-ID is actually not a mandatory email
field [1]. While most MTAs do add this field, some do not, and a message
without it causes load_message.py to crash.

To handle this, when the field is missing, this patch:
- attempts to find a "Resent-Message-ID" header and use it as the Message-ID
  (a case I encountered when trying to import an old mbox)
- generates a new Message-ID if neither header exists, following (a simpler
  version of) [2].

[1] https://www.rfc-editor.org/rfc/rfc2822#section-3.6.4
[2] https://datatracker.ietf.org/doc/html/draft-ietf-usefor-message-id-00#section-3

Cheers,
-- 
Célestin Matte
--------------EpU0Y0anA3JBgW6eIivpHCbL
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-parser-handler-messages-in-which-Message-ID-is-missi.patch"
Content-Disposition: attachment;
 filename*0="0001-parser-handler-messages-in-which-Message-ID-is-missi.pa";
 filename*1="tch"
Content-Transfer-Encoding: base64

RnJvbSAzNmU2YTZiNjdjN2Y2NGY1MjQ3NzBhODUyNTg3ZjhkYjA3MjYwNGE0IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiA9P1VURi04P3E/Qz1DMz1BOWxlc3Rpbj0yME1hdHRl Pz0gPGdpdGxhYkBjbWF0dGUubWU+CkRhdGU6IFdlZCwgMyBOb3YgMjAyMSAxNzowOTowMSAr MDEwMApTdWJqZWN0OiBbUEFUQ0hdIHBhcnNlcjogaGFuZGxlIG1lc3NhZ2VzIGluIHdoaWNo IE1lc3NhZ2UtSUQgaXMgbWlzc2luZwoKTWVzc2FnZS1JRCBpcyBub3QgbWFuZGF0b3J5IGlu IGVtYWlscy4gV2hlbiBzdWNoIGEgbWVzc2FnZSBpcyBpbXBvcnRlZCwKYXR0ZW1wdCB0byB1 c2UgUmVzZW50LU1lc3NhZ2UtSUQgaW5zdGVhZCBpZiBpdCBleGlzdHMsIG9yIGdlbmVyYXRl IGEgbmV3Cm9uZS4KLS0tCiBsb2FkZXIvbGliL3BhcnNlci5weSB8IDI0ICsrKysrKysrKysr KysrKysrKysrKystLQogMSBmaWxlIGNoYW5nZWQsIDIyIGluc2VydGlvbnMoKyksIDIgZGVs ZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvbG9hZGVyL2xpYi9wYXJzZXIucHkgYi9sb2FkZXIv bGliL3BhcnNlci5weQppbmRleCAxNzFmMTk3Li4yMWUxZTQ4IDEwMDY0NAotLS0gYS9sb2Fk ZXIvbGliL3BhcnNlci5weQorKysgYi9sb2FkZXIvbGliL3BhcnNlci5weQpAQCAtMSw2ICsx
LDcgQEAKIGltcG9ydCByZQogaW1wb3J0IGRhdGV0aW1lCiBpbXBvcnQgZGF0ZXV0aWwucGFy c2VyCitpbXBvcnQgcmFuZG9tCiAKIGZyb20gZW1haWwucGFyc2VyIGltcG9ydCBCeXRlc1Bh cnNlcgogZnJvbSBlbWFpbC5oZWFkZXIgaW1wb3J0IGRlY29kZV9oZWFkZXIsIEhlYWRlcgpA QCAtMjgsMTMgKzI5LDEzIEBAIGNsYXNzIEFyY2hpdmVzUGFyc2VyKG9iamVjdCk6CiAgICAg ICAgICMgTG9vayBmb3IgYSBzcGVjaWZpYyBtZXNzYWdlaWQuIFRoaXMgbWVhbnMgd2UgbWln aHQgcGFyc2UgaXQgdHdpY2UsCiAgICAgICAgICMgYnV0IHNvIGJlIGl0LiBBbnkgZXhjZXB0 aW9uIG1lYW5zIHdlIGtub3cgaXQncyBub3QgdGhpcyBvbmUuLi4KICAgICAgICAgdHJ5Ogot ICAgICAgICAgICAgaWYgc2VsZi5jbGVhbl9tZXNzYWdlaWQoc2VsZi5kZWNvZGVfbWltZV9o ZWFkZXIoc2VsZi5nZXRfbWFuZGF0b3J5KCdNZXNzYWdlLUlEJykpKSA9PSBtc2dpZDoKKyAg ICAgICAgICAgIGlmIHNlbGYuY2xlYW5fbWVzc2FnZWlkKHNlbGYuZGVjb2RlX21pbWVfaGVh ZGVyKHNlbGYuZ2V0X29yX2dlbmVyYXRlX21lc3NhZ2VpZCgpKSkgPT0gbXNnaWQ6CiAgICAg ICAgICAgICAgICAgcmV0dXJuIFRydWUKICAgICAgICAgZXhjZXB0IEV4Y2VwdGlvbjoKICAg ICAgICAgICAgIHJldHVybiBGYWxzZQogCiAgICAgZGVmIGFuYWx5emUoc2VsZiwgZGF0ZV9v dmVycmlkZT1Ob25lKToKLSAgICAgICAgc2VsZi5tc2dpZCA9IHNlbGYuY2xlYW5fbWVzc2Fn ZWlkKHNlbGYuZGVjb2RlX21pbWVfaGVhZGVyKHNlbGYuZ2V0X21hbmRhdG9yeSgnTWVzc2Fn ZS1JRCcpKSkKKyAgICAgICAgc2VsZi5tc2dpZCA9IHNlbGYuY2xlYW5fbWVzc2FnZWlkKHNl bGYuZGVjb2RlX21pbWVfaGVhZGVyKHNlbGYuZ2V0X29yX2dlbmVyYXRlX21lc3NhZ2VpZCgp KSkKICAgICAgICAgc2VsZi5fZnJvbSA9IHNlbGYuZGVjb2RlX21pbWVfaGVhZGVyKHNlbGYu Z2V0X21hbmRhdG9yeSgnRnJvbScpLCBUcnVlKQogICAgICAgICBzZWxmLnRvID0gc2VsZi5k ZWNvZGVfbWltZV9oZWFkZXIoc2VsZi5nZXRfb3B0aW9uYWwoJ1RvJyksIFRydWUpCiAgICAg ICAgIHNlbGYuY2MgPSBzZWxmLmRlY29kZV9taW1lX2hlYWRlcihzZWxmLmdldF9vcHRpb25h bCgnQ0MnKSwgVHJ1ZSkKQEAgLTU0Nyw2ICs1NDgsMjUgQEAgY2xhc3MgQXJjaGl2ZXNQYXJz ZXIob2JqZWN0KToKICAgICAgICAgZXhjZXB0IFZhbHVlRXJyb3IgYXMgdmU6CiAgICAgICAg ICAgICByYWlzZSBJZ25vcmFibGVFeGNlcHRpb24oIkZhaWxlZCB0byBkZWNvZGUgaGVhZGVy IHZhbHVlICclcyc6ICVzIiAlIChoZHIsIHZlKSkKIAorICAgIGRlZiBnZXRfb3JfZ2VuZXJh dGVfbWVzc2FnZWlkKHNlbGYpOgorICAgICAgICB4ID0gc2VsZi5tc2dbIk1lc3NhZ2UtSUQi XQorICAgICAgICBpZiB4IGlzIE5vbmU6CisgICAgICAgICAgICAjIElmIE1lc3NhZ2UtSUQg 
aXMgbWVzc2FnZSwgdHJ5IHVzaW5nIFJlc2VudC1NZXNzYWdlLUlEIGluc3RlYWQKKyAgICAg ICAgICAgIHggPSBzZWxmLm1zZ1siUmVzZW50LU1lc3NhZ2UtSUQiXQorICAgICAgICBpZiB4 IGlzIE5vbmU6CisgICAgICAgICAgICAjIElmIFJlc2VudC1NZXNzYWdlLUlEIGlzIG1pc3Np bmcgdG9vLCBmb3JnZSBhIG5ldyBNZXNzYWdlLUlECisgICAgICAgICAgICAjIGZvbGxvd2lu ZyBhIHNpbXBsZXIgdmVyc2lvbiBvZgorICAgICAgICAgICAgIyBodHRwczovL2RhdGF0cmFj a2VyLmlldGYub3JnL2RvYy9odG1sL2RyYWZ0LWlldGYtdXNlZm9yLW1lc3NhZ2UtaWQtMDAj c2VjdGlvbi0zCisgICAgICAgICAgICBkYXRlX3BhcnQgPSByZS5zdWIoJ1teQS1aMC05XScs ICcnLCBzdHIoc2VsZi5mb3JnaXZpbmdfZGF0ZV9kZWNvZGUoc2VsZi5kZWNvZGVfbWltZV9o ZWFkZXIoc2VsZi5nZXRfbWFuZGF0b3J5KCdEYXRlJykpKSkpCisgICAgICAgICAgICByYW5k b21fcGFydCA9IHJhbmRvbS5nZXRyYW5kYml0cyg2NCkKKyAgICAgICAgICAgIGZyb21fZnFk biA9IHNlbGYuZGVjb2RlX21pbWVfaGVhZGVyKHNlbGYuZ2V0X21hbmRhdG9yeSgnRnJvbScp LCBUcnVlKS5zcGxpdCgnQCcpCisgICAgICAgICAgICBpZiBsZW4oZnJvbV9mcWRuKSA+IDE6 CisgICAgICAgICAgICAgICAgZnFkbiA9IGZyb21fZnFkblsxXQorICAgICAgICAgICAgZWxz ZToKKyAgICAgICAgICAgICAgICBmcWRuID0gIiIKKyAgICAgICAgICAgIHggPSAiPCIgKyBz dHIoZGF0ZV9wYXJ0KSArICIuIiArIHN0cihyYW5kb21fcGFydCkgKyAiQCIgKyBmcWRuICsg Ij4iCisgICAgICAgIHJldHVybiB4CisKICAgICBkZWYgZ2V0X21hbmRhdG9yeShzZWxmLCBm aWVsZG5hbWUpOgogICAgICAgICB0cnk6CiAgICAgICAgICAgICB4ID0gc2VsZi5tc2dbZmll bGRuYW1lXQotLSAKMi4zMy4xCgo= --------------EpU0Y0anA3JBgW6eIivpHCbL--
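
The fallback order described in the message body (Message-ID first, then Resent-Message-ID, then a generated value) can be sketched as a standalone function. This is only a minimal illustration under stated assumptions, not the attached patch: the function name mirrors the one in the patch, but here it takes a standard-library `email.message.Message` argument rather than being a parser method, it uses `email.utils` helpers instead of the archive parser's own decoding routines, and it assumes the Date and From headers are present and parseable.

```python
import random
import re
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def get_or_generate_messageid(msg):
    """Return a Message-ID for msg, generating one as a last resort.

    Hypothetical standalone helper: assumes Date and From are present.
    """
    # Prefer the real Message-ID, then fall back to Resent-Message-ID.
    for header in ("Message-ID", "Resent-Message-ID"):
        value = msg[header]
        if value is not None:
            return value

    # Neither header exists: build "<date.random@domain>".
    date_part = parsedate_to_datetime(msg["Date"]).strftime("%Y%m%d%H%M%S")
    random_part = random.getrandbits(64)

    # Take the domain from the From address; leave it empty if there is none.
    _, address = parseaddr(msg["From"])
    domain = address.rpartition("@")[2] if "@" in address else ""

    return "<%s.%d@%s>" % (date_part, random_part, domain)


if __name__ == "__main__":
    raw = (
        "From: Alice <alice@example.org>\n"
        "Date: Wed, 3 Nov 2021 18:02:42 +0100\n"
        "\n"
        "body\n"
    )
    print(get_or_generate_messageid(message_from_string(raw)))
```

Because the generated value contains random bits, importing the same message twice yields two different IDs; a caller that needs stable IDs across re-imports would have to derive that part deterministically, for example from a hash of the message.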