PowerPoint Files Changing in Transit

I think this is a story worth telling.

At my current customer's site, we use MediaWiki for internal knowledge management, the same software that runs Wikipedia. I set up the installation and have added quite a few special features to it. It's a very nice software package to work with.

A few days ago, the head of our group tried to upload a PowerPoint file to the wiki, but it wouldn't work. MediaWiki complained that although the file had the extension .ppt, it didn't look like a PowerPoint file (wrong MIME type: application/zip instead of application/vnd.ms-powerpoint). I asked him to send me the file by e-mail. I tried to upload it myself, and for me it worked just fine. We switched off the MIME check in MediaWiki, and then he could upload the file as well. But the file that arrived on the server was a different one: about 4 kB shorter than the one I had uploaded.
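Upload filters like this one typically decide a file's type from its first few bytes (its "magic number") rather than trusting the extension. The sketch below only illustrates that idea; it is not MediaWiki's actual detection code, and the names `sniff_mime`, `extension_matches` and `EXPECTED_MIME` are hypothetical. It relies on two well-known signatures: legacy binary Office files (.ppt) are OLE2 compound documents starting with the bytes D0 CF 11 E0 A1 B1 1A E1, while ZIP archives (including .pptx files) start with `PK\x03\x04`.

```python
import os

# Well-known file signatures ("magic numbers").
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary Office files (.ppt, .doc, .xls)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP archives, including .pptx containers

# Hypothetical mapping from extension to the MIME type we expect to detect.
EXPECTED_MIME = {".ppt": "application/vnd.ms-powerpoint"}


def sniff_mime(data: bytes) -> str:
    """Guess a MIME type from the leading bytes only (illustrative sketch)."""
    if data.startswith(OLE2_MAGIC):
        # The OLE2 header alone does not say which Office program wrote the
        # file; a real detector would inspect the streams inside. This
        # sketch simply assumes PowerPoint.
        return "application/vnd.ms-powerpoint"
    if data.startswith(ZIP_MAGIC):
        return "application/zip"
    return "application/octet-stream"


def extension_matches(filename: str, data: bytes) -> bool:
    """True if the sniffed type agrees with what the extension promises."""
    ext = os.path.splitext(filename)[1].lower()
    return EXPECTED_MIME.get(ext) == sniff_mime(data)
```

Reading the first 8 bytes of a file in binary mode and passing them to `extension_matches` is enough for this check; a file named .ppt whose content starts with the ZIP signature is rejected, which matches the kind of complaint we saw.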

We were baffled. We suspected a bug in his web browser, since he used an older version of Firefox than I did. Or maybe it was an incompatibility between the browser and the Apache web server, a problem during compression negotiation, or a problem with PHP, which ultimately handles the upload and writes the file to disk on the server. Searching the Internet for problems in this area yielded no results, however. It looked like a complete mystery.

To make a long story short: after two days of research, we found out what had happened. The PowerPoint file that my customer tried to upload was indeed not recognized by MediaWiki's MIME detection. The upload itself was not the problem; it worked flawlessly. But when he sent me the file by e-mail (from Outlook), a different file arrived in my inbox. Outlook had silently re-encoded it: it was now 4 kB larger, and a byte-by-byte comparison revealed that several blocks inside the file had changed, not just been added or dropped. MediaWiki considered the resulting file a valid PowerPoint file; its MIME type was correct, and I could upload it without problems.
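A plain position-by-position comparison (in the style of `cmp`) cannot easily tell changed bytes from inserted ones, because everything after the first insertion appears shifted. An alignment-based diff makes the distinction between blocks that changed and blocks that were merely added or dropped. The following is a minimal sketch using Python's standard `difflib.SequenceMatcher`; the helper names `changed_regions` and `compare_files` are hypothetical, and this is not necessarily how the comparison in this story was done. `SequenceMatcher` can be slow on multi-megabyte inputs, so treat it as an illustration rather than a tool.

```python
import difflib


def changed_regions(a: bytes, b: bytes):
    """List the regions where two byte strings differ.

    Each entry is (tag, a_start, a_end, b_start, b_end), where tag is
    "replace" (bytes changed in place), "delete" (present only in a)
    or "insert" (present only in b).
    """
    # autojunk=False: don't let the heuristic ignore frequent byte values,
    # which are common in binary files such as Office documents.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def compare_files(path_a: str, path_b: str) -> None:
    """Print both file sizes and one line per differing region."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    print(f"sizes: {len(a)} vs {len(b)} bytes")
    for tag, i1, i2, j1, j2 in changed_regions(a, b):
        print(f"{tag:8} a[{i1}:{i2}] -> b[{j1}:{j2}]")
```

For example, `compare_files("original.ppt", "from_outlook.ppt")` (hypothetical file names) would report "replace" regions for blocks that were rewritten, as opposed to pure "insert" or "delete" regions.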

What I want to know is this: How on earth does an e-mail program dare to change a file that I’m sending as an attachment?

Don't get me wrong: all software has bugs, sometimes embarrassing ones. Software that I have written has them too. But to me, this behaviour of Outlook/PowerPoint reveals a system philosophy that I find totally unacceptable. An e-mail program is simply not supposed to do that; it breaks a fundamental assumption that the user has about the software he's working with.

I am once more glad that whenever I buy a new computer, my first action is to wipe every single Microsoft bit from the hard disk. I’m a happy Ubuntu user. No, it’s not perfect, and sometimes not as polished as Microsoft’s or Apple’s software (although getting ever closer to it). But it is the kind of software that an IT professional can work with, without his sense of logic being insulted.

For reference, these are the software versions involved:

  • MediaWiki 1.15.0
  • Microsoft Office Outlook 2003 (11.8313.8221) SP3
  • The PowerPoint files were originally created in Office 2007 (.pptx) and then saved in Office 2000 format (I'm not quite sure whether that is a meaningful version identifier).