I recently got a rather large mail archive to search through, in PST format (obviously), and some of the mailboxes contained e-mails that had “rtf-body.rtf” as their body text. Yeah, Microsoft really likes Linux.
Anyway, I took a deep breath and wrote some sloppy Python, so I can now fix those e-mails server-side (on a Dovecot IMAP server). This is just a small note to self, but should you happen to be in the same situation, feel free to use the script below:
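#!/usr/bin/python3
# Rewrite Maildir messages whose first MIME part is application/rtf so that
# the part becomes text/html, using the external unrtf utility.
# Usage: rtftohtmlmail.py FILE...   (files are modified in place!)
import subprocess
import sys
from email.parser import BytesParser
from email.policy import default

# Never refold headers taken from the source message when writing it back.
plcy = default.clone(refold_source='none')

for fname in sys.argv[1:]:
    print(fname)
    try:
        with open(fname, 'rb') as mail:
            msg = BytesParser(policy=plcy).parse(mail)
    except OSError:
        print("Cannot open")
        continue

    # totaal[0] is the message itself, totaal[1] its first MIME part (if any).
    totaal = list(msg.walk())
    if len(totaal) < 2 or totaal[1].get_content_type() != 'application/rtf':
        continue

    print("convert")
    # Pipe the decoded RTF bytes into unrtf on stdin; unrtf emits HTML by default.
    result = subprocess.run(['/usr/bin/unrtf'], input=totaal[1].get_content(),
                            capture_output=True)
    if result.returncode != 0 or not result.stdout:
        print("unrtf failed, skipping")
        continue
    # Replace the part in place; the rest of the message tree stays intact.
    totaal[1].set_content(result.stdout, maintype='text', subtype='html')

    # Serialize first, so a failure here cannot leave a truncated file behind.
    data = msg.as_bytes()
    try:
        with open(fname, 'wb') as mail:
            mail.write(data)
    except OSError:
        print("Error writing")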
As said: it’s a bit messy. It just reads the e-mail as a file, checks whether the first MIME part is application/rtf and, if so, runs its content through the unrtf utility to turn it into HTML. Fun fact: the Python email module pushes the right buttons automatically. Changing the content of totaal[1] in place keeps the rest of the message intact, so writing the message back at the end produces the e-mail with the fixed content.
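To see that in-place replacement on a throwaway message, here is a minimal sketch: it builds a two-part message whose first part is application/rtf, swaps that part for HTML with set_content, and leaves a PDF attachment in the second part alone. fake_unrtf is a hypothetical stand-in for the real unrtf call, so the sketch runs without unrtf installed; the addresses and the attachment are made up.

```python
from email.message import EmailMessage
from email.parser import BytesParser
from email.policy import default

# Same policy tweak as the script: don't refold headers taken from the source.
plcy = default.clone(refold_source='none')


def fake_unrtf(rtf: bytes) -> bytes:
    """Hypothetical stand-in for unrtf: wraps the raw RTF in a tiny HTML page."""
    return b"<html><body><pre>" + rtf + b"</pre></body></html>"


def build_sample() -> bytes:
    """Build a multipart/mixed message whose first part is application/rtf."""
    msg = EmailMessage()
    msg["Subject"] = "Test"
    msg["From"] = "a@example.com"
    msg["To"] = "b@example.com"
    msg.set_content(b"{\\rtf1 Hello}", maintype="application", subtype="rtf")
    # add_attachment turns the message into multipart/mixed; the RTF becomes part 1.
    msg.add_attachment(b"PDFDATA", maintype="application", subtype="pdf",
                       filename="report.pdf")
    return msg.as_bytes()


def convert_first_rtf_part(raw: bytes) -> bytes:
    """Replace an application/rtf first part with text/html, keeping the rest."""
    msg = BytesParser(policy=plcy).parsebytes(raw)
    parts = list(msg.walk())  # parts[0] is the message, parts[1] its first part
    if len(parts) > 1 and parts[1].get_content_type() == "application/rtf":
        html = fake_unrtf(parts[1].get_content())
        # walk() yields the live part objects, so this edits the parsed tree.
        parts[1].set_content(html, maintype="text", subtype="html")
    return msg.as_bytes()
```

Parsing the output of convert_first_rtf_part(build_sample()) again should show a text/html first part, followed by the untouched report.pdf attachment and the original Subject header.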
Running it is simple: on the server, go to /var/mail/username/.Some.mail.archive/cur and run ~/rtftohtmlmail.py *
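Before letting it loose, a dry run that only lists the files that would be touched can be reassuring. A minimal sketch, assuming you run it on a Maildir cur directory like the one above; find_rtf_mails and first_part_is_rtf are hypothetical helper names, and they apply the same first-part check as the script without writing anything.

```python
from email.parser import BytesParser
from email.policy import default
from pathlib import Path
from typing import List


def first_part_is_rtf(path: Path) -> bool:
    """True if the first MIME part of the message stored in `path` is application/rtf."""
    with path.open("rb") as fh:
        msg = BytesParser(policy=default).parse(fh)
    parts = list(msg.walk())  # parts[0] is the message itself
    return len(parts) > 1 and parts[1].get_content_type() == "application/rtf"


def find_rtf_mails(directory: str) -> List[str]:
    """Names of the files in `directory` that the conversion would rewrite (read-only)."""
    return [p.name for p in sorted(Path(directory).iterdir())
            if p.is_file() and first_part_is_rtf(p)]
```

From inside the cur directory, print("\n".join(find_rtf_mails("."))) shows the candidates.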
Please note that there’s not much error checking, so please, please, please make a copy of your mailbox first!
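If you would rather make that copy from Python as well, a minimal sketch with shutil.copytree could look like this. backup_maildir and its timestamped naming scheme are my own assumptions; dest_root should point somewhere outside /var/mail (your home directory, for instance), since a dot-directory copied next to the original folder may show up as an extra IMAP folder in Dovecot's Maildir++ layout.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_maildir(folder: str, dest_root: str) -> Path:
    """Copy a Maildir folder into dest_root under a timestamped name; return the copy's path."""
    src = Path(folder).resolve()  # resolve so that "." still yields a usable name
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"{src.name}.backup-{stamp}"
    # copytree creates missing parent directories, and raises FileExistsError
    # instead of overwriting an existing copy.
    shutil.copytree(src, dest)
    return dest
```

For example, backup_maildir("/var/mail/username/.Some.mail.archive", "/root/mail-backups") copies the whole folder, including cur, new and tmp.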