rtf-body.rtf?

I recently received a rather large mail archive to search through, in PST format (obviously). Some of the mailboxes contained e-mails that had “rtf-body.rtf” in place of the body text. Yeah, Microsoft really likes Linux.

Anyway, I took a deep breath and wrote some sloppy Python. I can now fix those e-mails server side (on a Dovecot IMAP server). This is just a small note to self, but should you happen to be in the same situation, please use the script below:

#!/usr/bin/python3
import subprocess
import sys
from email.parser import BytesParser
from email.policy import default

# Do not refold headers when the message is written back out.
plcy = default.clone(refold_source='none')
for fname in sys.argv[1:]:
  print(fname)
  try:
    with open(fname, 'rb') as mail:
      msg = BytesParser(policy=plcy).parse(mail)
  except OSError:
    print("Not found")
    continue
  # totaal[0] is the message itself, totaal[1] the first MIME part (if any).
  totaal = list(msg.walk())
  if len(totaal) > 1 and totaal[1].get_content_type() == 'application/rtf':
    print("convert")
    # Feed the RTF bytes to unrtf on stdin and collect its output from stdout.
    html = subprocess.run(['/usr/bin/unrtf'], input=totaal[1].get_content(), capture_output=True).stdout
    # Replace the RTF part in place with a text/html part.
    totaal[1].set_content(html, maintype='text', subtype='html')
    try:
      with open(fname, 'w') as mail:
        print(totaal[0], file=mail)
    except OSError:
      print("Error writing")
      continue

As said: it’s a bit messy. It just reads the e-mail as a file, checks whether the first MIME part is application/rtf and, if so, pushes the content through the unrtf utility to turn it into HTML. Fun fact: the Python email module pushes the right buttons automatically, i.e. replacing the content of totaal[1] with something else keeps the rest of the message intact, so the print statement at the end writes out the e-mail with fixed content.
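To see the "replace one part, keep the rest" behaviour without touching real mail or needing unrtf, here is a minimal sketch. The helper names build_sample and replace_rtf_part are made up for this illustration, and the HTML is a hard-coded stand-in for whatever unrtf would produce.

```python
from email.message import EmailMessage
from email.parser import BytesParser
from email.policy import default


def build_sample() -> bytes:
    """Build a throwaway message whose first MIME part is application/rtf."""
    msg = EmailMessage()
    msg['Subject'] = 'demo'
    # Adding an attachment to an empty message turns it into multipart/mixed
    # with the RTF as its only part.
    msg.add_attachment(b'{\\rtf1 Hello}', maintype='application',
                       subtype='rtf', filename='rtf-body.rtf')
    return msg.as_bytes()


def replace_rtf_part(raw: bytes, html: bytes) -> bytes:
    """Swap the first part for text/html if it is application/rtf."""
    plcy = default.clone(refold_source='none')
    msg = BytesParser(policy=plcy).parsebytes(raw)
    parts = list(msg.walk())
    if len(parts) > 1 and parts[1].get_content_type() == 'application/rtf':
        # Stand-in for the unrtf output; only this part is modified.
        parts[1].set_content(html, maintype='text', subtype='html')
    return msg.as_bytes()


fixed = replace_rtf_part(build_sample(), b'<html><body>Hello</body></html>')

# Parse the result again to check that the message is still well-formed.
reparsed = BytesParser(policy=default).parsebytes(fixed)
part = list(reparsed.walk())[1]
print(reparsed['Subject'], part.get_content_type())
```

The top-level headers are left alone; only the part that was passed to set_content gets a new content type and payload.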

Running it is simple: on the server, go to /var/mail/username/.Some.mail.archive/cur and run ~/rtftohtmlmail.py *

Please note that there’s not much error checking. So please please please make a copy of your mailbox first!
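One way to make that copy, as a rough sketch: the function name backup_maildir and the backup location are assumptions for this example, not part of the script. The copy goes outside the mail directory and drops the leading dot, so the IMAP server should not pick it up as another folder.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_maildir(maildir, backup_root):
    """Copy a mail folder to a timestamped directory under backup_root."""
    src = Path(maildir)
    stamp = time.strftime('%Y%m%d-%H%M%S')
    dst = Path(backup_root) / f"{src.name.lstrip('.')}-{stamp}"
    # copytree creates missing parent directories and preserves file times.
    shutil.copytree(src, dst)
    return dst


# Demo on a throwaway directory tree, not on real mail.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "mail" / ".Some.mail.archive"
    (folder / "cur").mkdir(parents=True)
    (folder / "cur" / "msg1").write_bytes(b"Subject: demo\r\n\r\nbody\r\n")
    copy = backup_maildir(folder, Path(tmp) / "backups")
    copied_ok = (copy / "cur" / "msg1").read_bytes() == b"Subject: demo\r\n\r\nbody\r\n"
    print(copy.name, copied_ok)
```

On the server this would be called with the real folder path and a backup directory of your choosing before running the conversion script.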
