
Wish for browsers: Adopt MHTML format

This is a request to the communities behind all the open source browsers: Please adopt the MHTML format (or even better, the Mozilla Archive Format) and make it a native part of the browsers.

Use cases

  1. Every time users want to send content that doesn't fit into an email, they have to choose between the .doc, .docx and .pdf formats. That requires additional software to be installed on the recipient's computer. This is unnecessary, because browsers already do a fantastic job of rendering content. Why should rendering be outsourced to other software simply because browsers lack a common document format?
    • Think product help documentation, resumes, small galleries of photos, and so on.
    • PDF is a fixed-layout format, which makes it good for printing, whereas HTML/MHT is a reflowable, presentation-level format, which makes it good for on-screen viewing while still maintaining full fidelity.
  2. There is simply no good “File Save As” solution. A single-file format would make it easy to store pages offline so that they are always available, e.g., reference pages such as the Markdown text formatting syntax.
  3. Print to PDF is abysmal because most websites don’t have appropriate print stylesheets. Currently I’m using the Aviary “To Image” bookmarklet to save pages while preserving decent presentation. However, saving a document as an image means that I cannot search its text. If only the browser had a proper “Save As” solution, we would have the best of both worlds.
  4. The future is full of small-screen devices: netbooks, Chrome OS, CrunchPad, iPhone, Android, etc. Do you see PDF readers or office suites on all of these devices? Unlikely. But what they all already have are web browsers. So why not have a browser-native document format that works across all these platforms?

Format Possibilities

The MHTML format is already supported by IE and Opera. Firefox can handle it through the UnMHT add-on and also has alternatives such as the Mozilla Archive Format (MAFF). Safari does not support MHTML and instead has its own .webarchive format.

Each browser supports its own file format, which clearly demonstrates that there is a use case for storing documents as single files. The open question is whether browser vendors can agree on a common format. Only a common format would be genuinely useful, because the sender would not need to make assumptions about the recipient's platform or installed software.

What I’m hoping for is that browser vendors bring the vision of the MAFF and KDE WAR file formats to life.
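To make the MHTML idea concrete: as defined in RFC 2557, an MHTML file is essentially a MIME `multipart/related` message in which the HTML page is one part and each resource (image, stylesheet, etc.) is another part, linked back to the page via a `Content-Location` header (one of the linking mechanisms the RFC defines). The following is a minimal sketch using only Python's standard `email` package. The function name `build_mhtml`, the example.com URLs, and the restriction to PNG images are assumptions made for illustration; real browser output includes more headers and handles CSS, scripts and other resource types.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from urllib.parse import urljoin


def build_mhtml(html: str, images: dict[str, bytes], page_url: str) -> str:
    """Bundle an HTML page and its PNG images into one MHTML document.

    Hypothetical helper for illustration: it assumes every resource is a PNG
    referenced from the HTML by a URL relative to page_url.
    """
    # The root is a multipart/related container; its "type" parameter names
    # the media type of the root (first) part, here the HTML page.
    root = MIMEMultipart("related", type="text/html")
    root["Subject"] = "Saved web page"

    # First part: the HTML itself, labelled with the URL it was saved from.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    root.attach(page)

    # One part per image. Content-Location holds the absolute URL that the
    # HTML's relative reference resolves to, which is how a reader matches
    # <img src="logo.png"> to the bundled bytes instead of the network.
    for name, data in images.items():
        part = MIMEImage(data, _subtype="png")  # base64-encoded by default
        part["Content-Location"] = urljoin(page_url, name)
        root.attach(part)

    return root.as_string()
```

Writing the returned string to a file with an `.mht` extension yields a single self-contained file, roughly the structure that IE and Opera produce when saving a page as MHTML.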

Extensibility

  • PDF is designed as a final-form, essentially read-only format. The new file format could support highlighting and annotation features such as those in the ScrapBook add-on.
    • Use case: The highlighting feature means that I can save an online article, mark the parts I think are relevant and important, and send the annotated file to a friend by email.
  • If the new file format has a container structure (zip, tarball, etc.), then it can include images, videos and other multimedia, just as office suite formats do. Continuing that line of thought, could all the browsers adopt one of the office suite file format standards? What if every browser had “Save as DOCX” and “Open DOCX” options? DOCX is appropriate because it is an ISO standard (ISO/IEC 29500) and is interoperable with the most popular office suite out there.
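To illustrate the container idea, here is a minimal Python sketch using the standard `zipfile` and `json` modules. The file layout (`index.html`, a `resources/` folder, `annotations.json`) and the function names `pack_page` and `unpack_page` are hypothetical, not taken from MAFF, DOCX or any other real format. The point is only that a single zip file can carry the page, its multimedia, and the highlighting/annotation data from the use case above side by side.

```python
import io
import json
import zipfile


def pack_page(html: str, resources: dict[str, bytes], highlights: list[dict]) -> bytes:
    """Pack a page, its resources and reader annotations into one zip file.

    The layout used here is a hypothetical convention for illustration only.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.html", html)  # str entries are stored as UTF-8
        for name, data in resources.items():
            zf.writestr(f"resources/{name}", data)
        # Highlights live beside the page rather than inside it, so the
        # original HTML stays untouched and annotations can be added or
        # removed independently.
        zf.writestr("annotations.json", json.dumps(highlights))
    return buf.getvalue()


def unpack_page(blob: bytes) -> tuple[str, dict[str, bytes], list[dict]]:
    """Reverse of pack_page: return the HTML, resources and annotations."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        html = zf.read("index.html").decode("utf-8")
        resources = {
            name.removeprefix("resources/"): zf.read(name)
            for name in zf.namelist()
            if name.startswith("resources/")
        }
        highlights = json.loads(zf.read("annotations.json"))
    return html, resources, highlights
```

A highlight might, for example, be stored as `{"start": 120, "end": 164, "note": "key point"}`, i.e. character offsets into the page text plus a comment (again a hypothetical scheme); a browser would apply these when rendering `index.html`.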

Summary

The wish is for a “Save as MHTML” feature that bundles a web page into a single file, which can be stored, transmitted, and viewed later in any web browser. This will also be useful for the small-screen devices of the future, which have browsers but not necessarily dedicated document readers or office suites. If a container format is used instead of MHTML, then features such as highlighting, commenting, multimedia, etc. can be added.

I hope this sparks a discussion about whether this idea has potential and could be something useful, or is completely unnecessary.

Update 1: Thanks to “Rik|work” on irc.freenode.net#webkit, I learned about two open bugs in the WebKit bug database that address exactly this: Bug 7168 – Support reading of MHTML (multipart/related) web archives, and Bug 7211 – Support save as “Web page, complete” in Firefox format. As pointed out in the comments on the latter bug, Chromium/Google Chrome already supports this! So it is not as outlandish an idea as it seems :)

Update 2: Thanks to “Mardeg” on irc.mozilla.org#firefox, I learned about this proposal from Alexander Limi called “Making browsers faster: Resource Packages”.

Update 3: Thanks again to “Mardeg” for pointing out these proposals filed against Firefox: Bug 18764 – Full rfc2557 MHTML multipart/related support in browser (filed in 1999!) and Bug 40873 – Save as rfc 2557 MHTML; complete webpage in one file (filed in 2000!).

Update 4: Continuing the discussion with “Mardeg”, it seems there is already a format that can serve this purpose: SVG. It is supported in all modern browsers except IE, and Google is working on svgweb, a JavaScript library that any website can use to let IE render SVG via Flash Player behind the scenes. Very interesting! If only IE natively supported SVG, and browsers and word processors offered a “Save as SVG” option, this pain point would simply go away.

Update 5 (Oct 19, 2009): It looks like MHT is indeed not an obscure file format: Zoho Notebook offers “Export to MHT” and “Export to HTML” as its two export options for notebooks and pages.