GreyWyvern.com

PHF: Portable Hypertext Format

Save an entire webpage in a single file like a PDF, only HTML

PHF_Get version 0.3.2 (beta)

This operation may take a minute or so for large or complicated pages. Please be patient while the script works, and do not press the PHF button more than once.

What's Happening?

Have you ever downloaded a webpage only to find that it becomes a shadow of its former self when viewed later? For a downloaded webpage to work properly, all of its associated content, such as images, stylesheets and JavaScript files, must be downloaded as well, or you end up with an unstyled mess of text. What if you could download a complete webpage, with all of its associated content, in a single file?

Many people know that in the Opera, Mozilla/Firefox, Safari and Konqueror browsers, images can be embedded directly into HTML pages and stylesheets using the data: URI scheme. Yet this ability extends to virtually any type of file, from stylesheets to external JavaScript.
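To make the idea concrete, here is a minimal sketch in Python of how a file's bytes can be turned into a data: URI. It is an illustration only, not code from the PHF script (which is written in PHP); the function name `to_data_uri` is made up for this example.

```python
import base64


def to_data_uri(content: bytes, mime_type: str) -> str:
    """Return a base64-encoded data: URI for the given bytes.

    `to_data_uri` is a hypothetical helper for illustration; it is not
    part of the PHF source.
    """
    # Base64 keeps arbitrary binary data safe inside an HTML attribute.
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# A tiny stylesheet embedded as a URI that could go in <link href="...">.
stylesheet = b"body { color: red; }"
print(to_data_uri(stylesheet, "text/css"))
```

The same function works for images or scripts; only the MIME type changes.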

This tool downloads the webpage you specify and grabs all associated content, such as images, stylesheets and JavaScript files. Each external resource is encoded as a data: URI, which is then substituted for the original URI. The result is an HTML document which is completely self-contained; all styling, scripting and image data is included in a single file, like a PDF. The one catch is that it won't work in MSIE; you must open the completed file in a newer browser.
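The substitution step described above might look roughly like the following Python sketch. It is not the PHF script itself: the names `embed_resources` and `fetch` are hypothetical, the regular expression only handles quoted `src` attributes on `img` and `script` tags and `href` on `link` tags, and a real tool would also need to look inside stylesheets for further URIs.

```python
import base64
import re
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Matches <img src="...">, <script src="..."> and <link href="...">.
# Deliberately simplistic; a real implementation would use an HTML parser.
RESOURCE_RE = re.compile(
    r"""(?P<prefix><(?:img|script)\b[^>]*?\bsrc=|<link\b[^>]*?\bhref=)"""
    r"""(?P<q>["'])(?P<url>.*?)(?P=q)""",
    re.IGNORECASE,
)

# fetch(url) returns (content, mime_type), or None if the file is skipped.
Fetcher = Callable[[str], Optional[Tuple[bytes, str]]]


def embed_resources(html: str, base_url: str, fetch: Fetcher) -> str:
    """Replace external resource URIs in `html` with data: URIs."""

    def replace(match: "re.Match[str]") -> str:
        url = match.group("url")
        if url.lower().startswith("data:"):
            return match.group(0)  # already embedded
        fetched = fetch(urljoin(base_url, url))  # resolve relative URIs
        if fetched is None:
            return match.group(0)  # leave the original URI in place
        content, mime_type = fetched
        payload = base64.b64encode(content).decode("ascii")
        q = match.group("q")
        return f"{match.group('prefix')}{q}data:{mime_type};base64,{payload}{q}"

    return RESOURCE_RE.sub(replace, html)


# Usage with a stand-in for the network: a dict keyed by absolute URL.
fake_web = {"http://example.com/logo.png": (b"hi", "image/png")}
page = '<img src="logo.png"><a href="next.html">next</a>'
print(embed_resources(page, "http://example.com/", fake_web.get))
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; ordinary links such as `<a href>` are left alone because they point at other pages rather than at content of this one.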

I call the resulting document a PHF because you can save it, move it and email it, just like a PDF. No folders or linked files are necessary; the HTML page you download should look just like the original. Try it for yourself using the form above.

Important Notes

This tool essentially downloads all the files associated with the supplied webpage URI. It costs me bandwidth to grab it, CPU time to process it, and more bandwidth to serve it. If you have your own webspace with PHP 4.3.0+, you can do me a favour and host your own copy using the source code!

  1. The script includes external JavaScript files as-is; it will not parse them for URIs.
  2. I am developing this script right here, using the very file referenced by the form, so this service may be unavailable from time to time.
  3. This copy of the script is configured so that any single retrieved file can be at most 200 kB; larger files are ignored. The total returned PHF may still exceed 200 kB.
  4. Please report bugs using my contact form.
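The per-file size limit from note 3 could be enforced along these lines. This is a sketch under assumptions, not the script's actual logic: `read_capped` and `MAX_FILE_BYTES` are made-up names, "200 kB" is read here as 200 * 1024 bytes, and the stream is assumed to return a full read unless it reaches the end of the data.

```python
import io
from typing import BinaryIO, Optional

# Assumption: "200 kB" interpreted as 200 * 1024 bytes.
MAX_FILE_BYTES = 200 * 1024


def read_capped(stream: BinaryIO, max_bytes: int = MAX_FILE_BYTES) -> Optional[bytes]:
    """Read a file, returning None if it is larger than `max_bytes`."""
    # Ask for one byte more than allowed: getting it proves the file is
    # too large without having to download the rest.
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        return None  # oversized files are ignored
    return data


# Usage with in-memory streams standing in for downloaded files.
small = read_capped(io.BytesIO(b"x" * 1024))
large = read_capped(io.BytesIO(b"x" * (MAX_FILE_BYTES + 1)))
print(small is not None, large is None)
```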

Version 0.3.2 (beta)

Get the PHP source for this tool! This script is licensed under the BSD licence.

Opera Users

Opera users can also use this button, which sends the URI of the page they are currently viewing directly to this tool. Just remember that it will only work on pages which this script can access; e.g. pages from your local computer or network, or pages behind SSL or HTTP authentication, will not work.

Also, Opera Community user profiT has created a UserScript version of this tool which doesn't require a remote server! View the forum thread for more info and to get the tools you need to add the function to your Opera browser.

Copyright © 2017 Brian Huisman AKA GreyWyvern