Swaroop C H


Wish for browsers - Adopt MHTML format

15 Sep 2009

This is a request to the communities behind all the open source browsers: Please adopt the MHTML format (or even better, the Mozilla Archive Format) and make it a native part of the browsers.

Use cases

  1. Every time a user wants to send content that doesn't fit into an email, the user has to choose between the .doc, .docx and .pdf formats. This implies additional software that needs to be installed on the recipient's computer. This is unnecessary because browsers already do a fantastic job of rendering content. Why should that job be outsourced to other software simply because browsers don't have a common document format?
    • Think product help documentation, resumes, small galleries of photos, and so on.
    • PDF fixes the layout of every page, which makes it good for printing, whereas HTML/MHT describes the presentation and leaves the layout to the browser, which makes it good for on-screen viewing while still maintaining full fidelity.
  2. There is simply no good "File Save As" solution. A proper one would be especially useful for storing pages offline so that the user always has access to them, e.g., the Markdown text formatting syntax page, and so on.
  3. Print to PDF is abysmal because most websites don't have appropriate print stylesheets. Currently I'm using the Aviary "To Image" bookmarklet to save pages while preserving a decent presentation. However, saving the document as an image means that I cannot search for text. If only the browser had a proper "Save As" solution, then we would have the best of both worlds.
  4. The future is full of small-screen devices: netbooks, Chrome OS, CrunchPad, iPhone, Android, etc. Do you see PDF readers or office suites on all of these devices? Unlikely. But what they do already have are web browsers. So why not have a browser-native document format that works across all these platforms?

Format Possibilities

The MHTML format is already adopted by IE and Opera. Firefox has the UnMHT addon and also has alternatives such as the Mozilla Archive Format. Safari does not support MHTML but instead has its own .webarchive format.
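To make the format a little more concrete, here is a minimal sketch of what bundling a page into MHTML might look like. MHTML (RFC 2557) is essentially a multipart/related MIME message in which each part carries a Content-Location header naming the URL it was saved from. The sketch uses Python's standard email package; the function names build_mhtml and list_parts and the example.com URLs are hypothetical, and files saved by real browsers carry additional headers that are not shown here.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url, html, images):
    """Bundle an HTML page and its images into one multipart/related message.

    `images` maps each image URL (as referenced by the page) to a tuple of
    (raw bytes, image subtype such as "png"). All names here are illustrative.
    """
    # The "type" parameter tells readers that the root part is HTML.
    archive = MIMEMultipart("related", type="text/html")

    # Root part: the page itself, labelled with the URL it came from.
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url
    archive.attach(root)

    # One part per resource, each labelled with its original URL so that
    # references inside the HTML can be resolved against the archive.
    for url, (data, subtype) in images.items():
        part = MIMEImage(data, _subtype=subtype)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()


def list_parts(mhtml_bytes):
    """Return (content type, Content-Location) for every part in the archive."""
    message = message_from_bytes(mhtml_bytes)
    return [(p.get_content_type(), p["Content-Location"])
            for p in message.get_payload()]
```

Writing the returned bytes to a file with an .mht extension gives a single file that holds the page and its images together, and list_parts can be used to check which resources ended up inside it.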

Each browser supports its own file format, clearly demonstrating that there is a use case for storing documents in single files. The gap is whether browser vendors can agree to adopt a common format. A common format would actually be useful because the sender would not need to make assumptions about the recipient's platform or installed software.

What I'm hoping for is the browser vendors to bring the vision of the MAFF file format and KDE WAR file format to life.

Extensibility


Summary

The wish is that the "Save as MHTML" feature will bundle the webpage into a single file, which can be stored, transmitted, and viewed later using any web browser. This will also be useful for the small-screen devices of the future, which have browsers but do not necessarily have dedicated format readers and office suites. If a container format is used instead of MHTML, then features such as highlighting, commenting, multimedia, etc. can be added.

I hope this sparks a discussion about whether this idea has potential and could be something useful, or is completely unnecessary.

Update 1: Thanks to "Rik|work" on irc.freenode.net#webkit, I got to know about two open bugs in the WebKit bug database that discuss exactly this: Bug 7168 - Support reading of MHTML (multipart/related) web archives and Bug 7211 - Support save as "Web page, complete" in Firefox format. As pointed out in the comments on the latter bug, Chromium/Google Chrome already supports this! So it is not as outlandish an idea as it seems :)

Update 2: Thanks to "Mardeg" on irc.mozilla.org#firefox, I got to know about this proposal from Alexander Limi called Making browsers faster: Resource Packages.

Update 3: Thanks again to "Mardeg" for pointing out these filed proposals in Firefox - Bug 18764 - Full rfc2557 MHTML multipart/related support in browser (filed in 1999!) and Bug 40873 - Save as rfc 2557 MHTML; complete webpage in one file (filed in 2000!).

Update 4: Continuing the discussion with "Mardeg", it seems there is already a format that can serve this purpose: SVG. It is supported in all modern browsers, and Google is working on svgweb, a JavaScript library that any website can use to let IE render SVG using Flash Player behind the scenes. Very interesting! If only IE natively supported SVG, and browsers and word processors had a "Save as SVG" option, this pain point would just go away.

Update 5 (Oct 19, 2009): Looks like MHT is indeed not an obscure file format. Zoho Notebook has "Export to MHT" and "Export to HTML" as the two export options for notebooks and pages.

Comments

Devdas Bhagat says:

Actually, I prefer a different set of document formats depending on presentation requirements.

If the document needs to be editable, I use ODF or plain text.
If I want read-only, view exactly as I created it, PDF.
Read-only, without formatting restrictions, use HTML with CSS.

How do you enable these behaviours in a single application, without distinct file types?

Swaroop says:

@Devdas As I was stressing in the use cases, I am referring to a read-only format, analogous to PDF but readable by all browsers without the need for a separate viewer. It is meant mainly as an interchange format and not for full editing.

The problem with ODF and PDF is that these are document formats that require separate software on the recipient's side, whereas they could easily be replaced with something the browser can render.

Hari K T says:

Yes, you are right.

I hope the open-source, GNU/GPL and community can change the world. ;)

Swaroop says:

@Hari :)

Devdas Bhagat says:

So why do you want everything to be in the browser? Why not in emacs/vim/notepad?

Swaroop says:

@Devdas I don't understand your question. All I'm wishing for is PDF to be replaced by MHTML/MAFF. Is something wrong with that?

gildas says:

I prefer to promote a standard and simple way to embed the external resources of an HTML page. This standard way uses the data URI scheme. An extension for Chrome is available here:
https://chrome.google.com/extensions/detail/mpiodijhokgodhhofbcjdecpffjipkle.
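
The data URI approach mentioned in this comment can be illustrated with a small sketch: each external resource is base64-encoded and written directly into the attribute that used to point at it, so the page becomes a single self-contained HTML file. The function name inline_images and the shape of the resources mapping are hypothetical, the regular expression only handles double-quoted src attributes, and this is not a description of how the linked extension works.

```python
import base64
import re


def inline_images(html, resources):
    """Replace src="URL" attributes with data: URIs.

    `resources` maps a URL to a tuple of (MIME type, raw bytes).
    URLs that are not in the mapping are left untouched.
    """
    def to_data_uri(match):
        url = match.group(2)
        if url not in resources:
            return match.group(0)  # nothing to inline, keep the attribute
        mime_type, data = resources[url]
        encoded = base64.b64encode(data).decode("ascii")
        return f"{match.group(1)}data:{mime_type};base64,{encoded}{match.group(3)}"

    # Simplification: only double-quoted src attributes are considered.
    return re.sub(r'(src=")([^"]*)(")', to_data_uri, html)
```

Compared with MHTML, the result is still plain HTML that any browser can open, at the cost of larger files because base64 inflates binary data by roughly a third.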

Feedback

There's no comment box, but please do email me or tweet me your thoughts and criticisms, and I will publish the relevant ones here.