From: Célestin Matte <[email protected]>
To: [email protected]
Subject: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
Date: Wed, 3 Nov 2021 18:02:42 +0100
Message-ID: <[email protected]>
Hello,
As surprising as it may seem, Message-ID is actually not a mandatory email field [1]. While most MTAs do add this field, some might not, and this will cause load_message.py to crash.
As a solution to this, when this field is missing, this patch:
- attempts to find a "Resent-Message-ID" header and use it as the Message-ID (a case I encountered when trying to import an old mbox)
- generates a new Message-ID if none exists, following (a simpler version of) [2].
[1] https://www.rfc-editor.org/rfc/rfc2822#section-3.6.4
[2] https://datatracker.ietf.org/doc/html/draft-ietf-usefor-message-id-00#section-3
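For readers who want to try the fallback order outside of pgarchives, here is a minimal standalone sketch of the two steps listed above. It is not the code from the patch: it works on a plain email.message.Message from the standard library rather than on ArchivesParser, and the use of parseaddr(), secrets.randbits() and the "localhost" fallback domain are assumptions made for illustration only.

```python
import re
import secrets
from email import message_from_bytes
from email.utils import parseaddr


def get_or_generate_messageid(msg):
    """Return Message-ID, else Resent-Message-ID, else a generated ID."""
    for header in ("Message-ID", "Resent-Message-ID"):
        value = msg[header]
        if value is not None:
            return value.strip()
    # Neither header exists: build "<date.random@domain>" from the message.
    date_part = re.sub(r"[^A-Za-z0-9]", "", msg.get("Date", ""))
    random_part = secrets.randbits(64)
    # parseaddr() handles "Name <user@host>" as well as a bare address.
    _, addr = parseaddr(msg.get("From", ""))
    # "localhost" is a placeholder chosen for this sketch only.
    domain = addr.rpartition("@")[2] if "@" in addr else "localhost"
    return "<%s.%d@%s>" % (date_part, random_part, domain)


raw = (b"From: Alice <alice@example.org>\n"
       b"Date: Wed, 3 Nov 2021 18:02:42 +0100\n"
       b"\n"
       b"body\n")
print(get_or_generate_messageid(message_from_bytes(raw)))
```

The generated value differs on every call because of the random part, so the example prints a different ID each time it is run.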
Cheers,
--
Célestin Matte
Attachments:
[text/x-patch] 0001-parser-handler-messages-in-which-Message-ID-is-missi.patch (3.0K)
From 36e6a6b67c7f64f524770a852587f8db072604a4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9lestin=20Matte?= <[email protected]>
Date: Wed, 3 Nov 2021 17:09:01 +0100
Subject: [PATCH] parser: handle messages in which Message-ID is missing
Message-ID is not mandatory in emails. When such a message is imported,
attempt to use Resent-Message-ID instead if it exists, or generate a new
one.
---
loader/lib/parser.py | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/loader/lib/parser.py b/loader/lib/parser.py
index 171f197..21e1e48 100644
--- a/loader/lib/parser.py
+++ b/loader/lib/parser.py
@@ -1,6 +1,7 @@
import re
import datetime
import dateutil.parser
+import random
from email.parser import BytesParser
from email.header import decode_header, Header
@@ -28,13 +29,13 @@ class ArchivesParser(object):
# Look for a specific messageid. This means we might parse it twice,
# but so be it. Any exception means we know it's not this one...
try:
- if self.clean_messageid(self.decode_mime_header(self.get_mandatory('Message-ID'))) == msgid:
+ if self.clean_messageid(self.decode_mime_header(self.get_or_generate_messageid())) == msgid:
return True
except Exception:
return False
def analyze(self, date_override=None):
- self.msgid = self.clean_messageid(self.decode_mime_header(self.get_mandatory('Message-ID')))
+ self.msgid = self.clean_messageid(self.decode_mime_header(self.get_or_generate_messageid()))
self._from = self.decode_mime_header(self.get_mandatory('From'), True)
self.to = self.decode_mime_header(self.get_optional('To'), True)
self.cc = self.decode_mime_header(self.get_optional('CC'), True)
@@ -547,6 +548,25 @@ class ArchivesParser(object):
except ValueError as ve:
raise IgnorableException("Failed to decode header value '%s': %s" % (hdr, ve))
+ def get_or_generate_messageid(self):
+ x = self.msg["Message-ID"]
+ if x is None:
+ # If Message-ID is missing, try using Resent-Message-ID instead
+ x = self.msg["Resent-Message-ID"]
+ if x is None:
+ # If Resent-Message-ID is missing too, forge a new Message-ID
+ # following a simpler version of
+ # https://datatracker.ietf.org/doc/html/draft-ietf-usefor-message-id-00#section-3
+ date_part = re.sub('[^A-Z0-9]', '', str(self.forgiving_date_decode(self.decode_mime_header(self.get_mandatory('Date')))))
+ random_part = random.getrandbits(64)
+ from_fqdn = self.decode_mime_header(self.get_mandatory('From'), True).split('@')
+ if len(from_fqdn) > 1:
+ fqdn = from_fqdn[1]
+ else:
+ fqdn = ""
+ x = "<" + str(date_part) + "." + str(random_part) + "@" + fqdn + ">"
+ return x
+
def get_mandatory(self, fieldname):
try:
x = self.msg[fieldname]
--
2.33.1
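One property of the diff above is worth noting: the generated ID contains random bits, and get_or_generate_messageid() is called from both the message-ID lookup and analyze(), so two parses of the same message without a Message-ID produce two different IDs. The sketch below shows one possible deterministic alternative, which is not part of the patch. The function name stable_messageid, the "generated." prefix, the choice of SHA-256 over the raw message bytes and the "localhost" fallback are all assumptions made for illustration.

```python
import hashlib
from email import message_from_bytes
from email.utils import parseaddr


def stable_messageid(raw_bytes):
    """Derive a repeatable Message-ID from the raw bytes of a message."""
    msg = message_from_bytes(raw_bytes)
    _, addr = parseaddr(msg.get("From", ""))
    # "localhost" is a placeholder for messages without a usable From domain.
    domain = addr.rpartition("@")[2] if "@" in addr else "localhost"
    # Hashing the whole message makes the ID identical on every re-import.
    digest = hashlib.sha256(raw_bytes).hexdigest()[:32]
    return "<generated.%s@%s>" % (digest, domain)


raw = b"From: Alice <alice@example.org>\nSubject: hi\n\nbody\n"
# The same input always yields the same ID.
print(stable_messageid(raw) == stable_messageid(raw))
```

The trade-off is that two byte-identical messages would share one ID, which may or may not be acceptable depending on how the archive treats duplicates.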