
Offline First

Ajax Revolution

Offline First

We can’t keep building apps with the desktop mindset of permanent, fast connectivity, where a temporary disconnection or slow service is regarded as a problem and communicated as an error

Offline First: Hoodie

bit.ly/1qPSL9z

Offline First

  • we'll cover
    • taking your application offline
    • persisting data in the browser
    • are we online?
    • what else can I do?
    • what comes next?

“Someone who has surpassed the levels of jerk and asshole, however not yet reached f***er or motherf****er. Not to be confused with douche.”
Urban Dictionary

Making a cache

As I'm sure you know, browsers cache HTML, CSS, JavaScript files, images and other resources of the sites you visit, to speed up the subsequent loading of pages. However, you never know when the browser might discard cached files, and so this is not a reliable way for sites to work offline. But what if we could tell the browser what to cache? Well, with HTML5 application caches (also known as "appcaches") we can do just that. Let's look at how.

with the appcache

  • web apps still work when the user is not connected
  • sites and apps can improve performance by caching resources

how do we make the appcache work?

  • 1. create an appcache manifest file
  • 2. link to this from HTML documents
  • 3. Beware of the genie's rules

The heart of the technique is to create an appcache manifest, a simple text file, which tells the browser what to cache (and also what not to). The resources are then cached in an "application cache", or "appcache", which is distinct from the cache a browser uses for its own purposes. The anatomy of an appcache manifest is straightforward, but there are a few subtleties.

A simple appcache manifest

CACHE MANIFEST
CACHE:
#images
/images/image1.png
#pages
/pages/page1.html
/pages/page2.html
#CSS
/style/style.css
NETWORK:
signup.html
FALLBACK:
/ /offline.html

The CACHE section

  • explicitly list the resources we want cached.
  • Use either a URL relative to the .appcache file, or an absolute URL.
CACHE:
#images
/images/image1.png
http://somedomain.com/images/image2.png
http://anotherdomain.com/stuff/image3.png

The NETWORK Section

  • Particularly with dynamically generated content (from APIs etc), some resources we won't ever want cached
  • Specify these resources in the NETWORK section
  • these resources will always be fetched, and never cached

Structure of a NETWORK: section

  • Can specify resources individually with an absolute or relative URL
  • Can also specify a group of resources using a partial URL
  • Any resources which have URLs beginning with this pattern are never cached
  • There's also a special wildcard, *. This specifies that any resources not explicitly cached in a CACHE: section should not be cached.

the NETWORK: section

NETWORK:
signup.html
payments/pay.html
/payments/
*

NETWORK:
*

the FALLBACK: section

  • specifies resources to be used for non-cached resources when the user is offline
  • it's much less widely used, details in my book
/images/ /images/missing.png

In the FALLBACK section, we specify pairs of resources: the resources to be replaced, and the resource to replace them. The first item in each pair can be a URL or a prefix-match pattern. The second must identify the specific resource used in place of anything matching the first. There's no wildcard for the FALLBACK section.

A simple appcache manifest

CACHE MANIFEST
# version 1.0
CACHE:
#images
/images/image1.png
/images/image2.png
NETWORK:
*
FALLBACK:
/images/ /images/missing.png

Using the appcache manifest

So now we've created our appcache manifest, we need to associate it with our HTML documents. We do this by adding the manifest attribute to the html element of a document, where the value of this attribute is the URL of the appcache file.

The current recommendation is that the appcache file have the extension .appcache. So, if our manifest is located at the root of our site, we'd link to it like so:

<html manifest='manifest.appcache'>

There are also suggestions that some browsers may require the HTML5 doctype before they will use the appcache, so make sure you use it:

<!DOCTYPE html>
  • to use the manifest file with an HTML document, we add a manifest attribute to its html element
    • <html manifest='manifest.appcache'>

Caching Algorithm

  • browser fetches and begins parsing a page
  • sees there's a manifest for the page
  • fetches the manifest file
  • having parsed the page and fetched its resources, caches any of them that are listed in the manifest's CACHE: section
  • then fetches and caches any resources listed in CACHE: that the page itself didn't use
  • lastly fetches and caches FALLBACK: resources

Genie's Rules

App caching can be very powerful, allowing apps to work while the user is offline, and it can increase site performance, but there are some definite gotchas it pays to be aware of. Here are a few well worth knowing about.

Persistence

Effectively, appcaches don't expire, and you can't override them with HTTP headers. This has particular implications when developing with appcaches, which we'll look at in a moment.

  • Once a resource is cached, the browser will continue to use this cached version, effectively forever, even if you change that resource on the server.
  • To ensure the browser updates the cache, you must change the .appcache file.
  • The browser will then re-cache any changed resources (and only those that have changed)
  • But, the user must visit the page twice for new cache to take effect!
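The browser only re-checks cached resources when the manifest itself changes. A common habit is therefore to keep a version comment in the manifest and bump it on every deploy. The sketch below uses a hypothetical buildManifest helper (it is not manifestR) that writes such a comment along with the CACHE:, NETWORK: and FALLBACK: sections.

```javascript
// Hypothetical helper (not manifestR) that assembles an appcache manifest
// from lists of URLs. Changing `version` changes the manifest's bytes, which
// is what prompts the browser to re-check the resources it lists.
function buildManifest({ version, cache = [], network = [], fallback = [] }) {
  // "CACHE MANIFEST" must be the very first line; comments may follow
  const lines = ['CACHE MANIFEST', `# version ${version}`];
  if (cache.length) {
    lines.push('CACHE:', ...cache);
  }
  if (network.length) {
    lines.push('NETWORK:', ...network);
  }
  if (fallback.length) {
    // each fallback entry is a [pattern, replacement] pair on one line
    lines.push('FALLBACK:', ...fallback.map(([pattern, replacement]) => `${pattern} ${replacement}`));
  }
  return lines.join('\n') + '\n';
}

const manifest = buildManifest({
  version: '1.0',
  cache: ['/images/image1.png', '/images/image2.png'],
  network: ['*'],
  fallback: [['/images/', '/images/missing.png']],
});
console.log(manifest);
```

Bumping version from '1.0' to '1.1' changes only the comment line. That is still enough to make the manifest byte-for-byte different.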

What the?!?

  • the user must reload twice for new cache to take effect
  • first visit
    • browser sees there's a manifest, creates the appcache
  • next visit
    • browser has an appcache, renders the page using cached resources
    • browser then checks to see whether the manifest has changed
    • if so, it then updates the appcache, downloading the changed resources
    • but because it has already rendered the page, it doesn't re-render it

Forcing a cache update

  • it is possible to listen on window.applicationCache for an updateready event, which fires when a new cache has been downloaded
  • then, rather than simply refresh the page, it's recommended you prompt the user to do so
  • I cover this in detail in my book
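Here is a minimal sketch of that prompt, assuming a browser that still implements AppCache. It follows the common pattern: check status against UPDATEREADY, call swapCache(), then ask before reloading. The names confirmReload and reload are hypothetical callbacks, passed in so the logic isn't tied to window.confirm.

```javascript
// `appCache` is expected to be window.applicationCache; `confirmReload` and
// `reload` are hypothetical callbacks supplied by the page.
function watchForCacheUpdate(appCache, confirmReload, reload) {
  appCache.addEventListener('updateready', () => {
    if (appCache.status !== appCache.UPDATEREADY) return;
    // swapCache() makes the new cache current for later loads, but the page
    // that has already been rendered still shows the old resources
    appCache.swapCache();
    // prompt rather than reloading out from under the user
    if (confirmReload()) {
      reload();
    }
  });
}

// In a browser this might be wired up as:
// watchForCacheUpdate(window.applicationCache,
//   () => window.confirm('A new version is available. Reload now?'),
//   () => window.location.reload());
```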

stylesheets & other resources

You'd be forgiven for thinking that any images in a style sheet that has been cached will be included in the appcache, but that's not so. Images your style sheet refers to must be explicitly referenced in the CACHE section of the manifest as well.

Similarly, style sheets that are imported using @import, and resources included via JavaScript must also be explicitly cached.

To help build an appcache manifest, I've developed manifestR, which we'll look at in detail in a moment. It will generate an appcache manifest for you, which includes all the scripts, style sheets, including those @imported, images, linked pages at the same site, and any images linked to in stylesheets.

There's one exception to the rule that only explicitly listed resources are cached, and it is important to understand. Any HTML document that has a manifest attribute will be cached, even if it is not listed in the manifest. This can cause all kinds of headaches while developing, which we cover shortly.

  • resources must be explicitly cached
    • images in style sheets and script files won't be cached (unless explicitly listed in a CACHE: section)
    • pay attention to @imported CSS, modular JavaScript

Master Entries

To improve the performance of a site, you might be tempted to preload the entire site, by adding all the pages, images etc in it to the appcache manifest. And, in certain circumstances this might be desirable. It will however place considerable demands on your server, and use more bandwidth, as the first time a person visits your site, they will download more resources than they otherwise might have. Luckily, appcaches have an additional feature that can help here.

You'll recall that HTML documents with a manifest attribute do not need to be included in the manifest file to be cached. The benefit of this is that rather than explicitly listing all the pages at your site in a manifest for them to be cached, each time someone visits a page that links to a manifest, it will then be cached.

If the primary motivation for using an appcache is to ensure your site (or, more likely, app) works offline, you'll likely want to explicitly list its pages. If your primary motivation is increased performance, then let pages cache lazily as the user visits them, but explicitly cache scripts, CSS, and perhaps commonly used images.

  • any page with a manifest attribute in the HTML element will be cached in the associated cache
  • even if not included in the manifest
  • even if included in NETWORK:
  • AppCache is consequently better suited to applications than to rapidly changing content sites

From the Spec

The mixed-content model does not work well with the application cache feature: since the content is cached, it would result in the user always seeing the stale data from the previous time the cache was updated.

Cache failure

An important, but subtle gotcha with appcaching is that if even one of the resources you include in your cache manifest is not available, then no resources will be cached. So, it is really important to ensure that any resource listed in your appcache manifest is available online. There's a tool we discuss in a moment, the Cache Manifest Validator, to help ensure all those resources are online.

  • if even one of the resources you include in your cache manifest is not available, then no resources will be cached.
  • Worse, if there's already a cache, then it won't be updated. The browser continues to use the out of date cache.
  • listen for error events on the window.applicationCache, and send these to error servers
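One way to do that is sketched below. It assumes that `report` is your own hypothetical function for sending details to an error-collecting server.

```javascript
// `appCache` would be window.applicationCache; `report` is a hypothetical
// function supplied by the application.
function reportCacheErrors(appCache, report) {
  appCache.addEventListener('error', (event) => {
    report({
      type: event.type,
      status: appCache.status, // numeric ApplicationCache status when the error fired
      time: new Date().toISOString(),
    });
  });
}

// In a browser, e.g. (the '/log/appcache' endpoint is hypothetical):
// reportCacheErrors(window.applicationCache, (details) => {
//   fetch('/log/appcache', { method: 'POST', body: JSON.stringify(details) });
// });
```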

Missing (not missing) Resources

  • Appcache's most un-intuitive feature
    • if a resource is not listed in a CACHE section of a manifest
    • and it's not listed in a NETWORK: section (either explicitly, via a partial match, or via the wildcard)
    • then the browser treats that resource as being unavailable!
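The rule can be modelled in a few lines. This is a deliberately simplified sketch, not the browser's actual algorithm. It ignores FALLBACK: entries and master entries, compares plain path strings, and treats a NETWORK: entry as a prefix match, as described above. The manifest object shape is an assumption.

```javascript
// Simplified model of how a page using an appcache resolves a resource URL.
// `manifest` is a plain object shaped like { cache: [...], network: [...] }.
function resolveResource(url, manifest) {
  if (manifest.cache.includes(url)) {
    return 'served from appcache';
  }
  // NETWORK: entries match by prefix; '*' matches everything
  const onNetworkList = manifest.network.some(
    (entry) => entry === '*' || url.startsWith(entry)
  );
  if (onNetworkList) {
    return 'fetched from network';
  }
  // Neither cached nor listed in NETWORK: the browser treats it as
  // unavailable, even when the user is online
  return 'unavailable';
}

const manifest = { cache: ['/style/style.css'], network: ['/api/'] };
resolveResource('/style/style.css', manifest); // 'served from appcache'
resolveResource('/api/user', manifest);        // 'fetched from network'
resolveResource('/images/logo.png', manifest); // 'unavailable'
```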

what about browser support?

It must also be noted that the general consensus is that appcaching is currently far from perfect across all browsers which support it. The specification is still in draft, but it should also be noted that most browsers have supported at least some appcaching for quite some time.

According to an amalgam of When can I use, Dive into HTML5 and other online resources:

  • Safari since version 4
  • Chrome since version 5
  • Mobile Safari since iOS 2.1
  • Firefox since version 3.5
  • Opera since version 11
  • Android since version 2.1
  • Internet Explorer since 10

Appcache and JavaScript/DOM

  • Appcache is largely declarative–we can only specify whether a resource should or shouldn't be cached using a manifest.
    • There is some DOM access to the appcache.
    • We can't alter the manifest via JavaScript and the DOM
    • We can detect whether a cache needs updating, or the download failed
    • we don't have time to go beyond this, but my book covers this!

Application Cache

  • Does work
  • is not a douche
  • but it is a pain in the $#%^
  • has its quirks
  • avoid using it for performance based caching
  • is amazing when used appropriately, particularly for offlining applications

Web Storage overview

  • Until HTML5, the only ways to persist a user's data between sessions were
    • to store it on the server
    • to use cookies on the browser
  • Both present significant security challenges
  • Both are a lot of work

Until recently, the only ways to persist a user's data between visits to your site have been to store it on the server, or use cookies on the browser. Both present significant security challenges.

Cookies are designed to carry information between the browser and a server, and they persist between sessions. They're typically used to identify a user on return visits and to store details about that user, which can include user names and passwords. Cookies are sent between the browser and server as plain, unencrypted text. So, unless the application encrypts cookie contents, they can quite trivially be picked up, particularly on public wifi networks, when used over standard HTTP (rather than encrypted HTTPS).

Storing all client data on the server creates usability issues as well, as users need to log in each time they use that site.

Web Storage overview

  • As web applications become increasingly sophisticated, we need ways to keep data around in the browser
  • Web Storage, made up of two closely related technologies, allows us to develop "offline first", minimizing or eliminating our need for networks and servers
  • sessionStorage and localStorage

I'm sure more than once in your life, you've filled in a form, only to have the browser crash, or to accidentally close a window, or go back, or otherwise lose the contents of a form you've filled in painstakingly.

As web applications become increasingly sophisticated, developers need ways to keep data around on the browser (particularly if we want our applications to work whether the user is on or offline.)

Two closely related but slightly different features of HTML5 exist to help keep track of information solely in the browser. They enable far more structured data than cookies, are much easier to use, and the information stored can only be transmitted to a server explicitly by the application.

sessionStorage stores data during a session, and is removed once a session is finished. localStorage is almost identical, but the data stored persists indefinitely, until removed by the application. Let's start with sessionStorage, keeping in mind that we use localStorage almost identically.

Web Storage overview

  • both sessionStorage and localStorage are simple 'key, value' databases
  • sessionStorage is available for a single session, and when that ends, the browser clears the database
  • localStorage persists across multiple visits to a domain, and is available to any page at that domain

What is a session?

  • The key feature of sessionStorage is that data only persists for a session
  • In essence, a session is the combination of a particular window or tab, and a fully qualified domain

What good is sessionStorage?

  • maintain sensitive information during a transaction
  • create a multipage form or process without the need to store data on the server until it completes
  • moves the heavy lifting for protecting sensitive data from application developers to the browser

So, what good is sessionStorage? Well, one very useful application would be to maintain sensitive information during a transaction, signup, sign in and so on, which will be purged as soon as the user closes a window or tab. It can be used to create a multipage form or application, where the information in each page can persist, and be sent at once. It also moves the heavy lifting for protecting sensitive data from application developers (for example encrypting cookie data) to the browser.
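Here is a sketch of the multipage-form case. Each step merges its fields into a single JSON entry in sessionStorage, and the last step reads and removes that entry before everything is sent to the server. The key name signupDraft and the helper names are hypothetical. The storage object is passed in, so the same code works with window.sessionStorage or with a stand-in for testing.

```javascript
// Hypothetical key under which the in-progress form is kept.
const DRAFT_KEY = 'signupDraft';

// Merge one step's fields into the draft. `storage` would normally be
// window.sessionStorage, so the draft disappears when the tab closes.
function saveStep(storage, stepName, fields) {
  const draft = JSON.parse(storage.getItem(DRAFT_KEY) || '{}');
  draft[stepName] = fields;
  storage.setItem(DRAFT_KEY, JSON.stringify(draft));
}

// Read every step saved so far.
function collectDraft(storage) {
  return JSON.parse(storage.getItem(DRAFT_KEY) || '{}');
}

// On the final step: take the whole draft and remove it from storage
// instead of waiting for the session to end.
function finishForm(storage) {
  const draft = collectDraft(storage);
  storage.removeItem(DRAFT_KEY);
  return draft;
}
```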

what good is localStorage?

  • easily maintain subtle application state, user preferences
  • drastically reduces the need for server side storage
  • drastically reduces the need for communication with the server
  • potentially improving performance and security

using sessionStorage & localStorage

  • sessionStorage & localStorage are properties of the window object
  • We use them to store "key, value pairs"
  • each pair is a piece of information, identified by a unique identifier, called a key.
  • Both the key and the value are strings.
  • We use the setItem method to store data
// get the values of the inputs with id="username" and id="userage"
var username = document.querySelector('#username').value;
var userage = document.querySelector('#userage').value;
// temporarily store the contents of the variable 'username' using the key "name"
window.sessionStorage.setItem('name', username);
// persist the contents of the variable 'userage' between sessions, using the key "age"
window.localStorage.setItem('age', userage);

reading from sessionStorage & localStorage

  • We retrieve values with the getItem method, passing a single parameter: the key we used to set the item.
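function displayDetails(){
  // getItem returns null if nothing has been stored under that key
  var username = window.sessionStorage.getItem('name');
  var userage = window.localStorage.getItem('age');
  // ...then display username and userage in the page
}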

managing sessionStorage & localStorage

  • We can remove items from Web Storage with the method removeItem(key).
  • To delete the entire localStorage, we can use localStorage.clear().
  • These also apply to sessionStorage
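clear() empties the whole storage area for the site. An application that shares an origin with other code may therefore prefer to remove only its own entries. The sketch below assumes the application prefixes its keys with a hypothetical 'myapp:', and it uses the Storage interface's length and key(index) members.

```javascript
// Remove only the keys that start with `prefix`. Works with localStorage or
// sessionStorage, since both implement the same Storage interface.
function removeKeysWithPrefix(storage, prefix) {
  const doomed = [];
  // collect first: removing while iterating by index would shift the indices
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (key !== null && key.startsWith(prefix)) {
      doomed.push(key);
    }
  }
  doomed.forEach((key) => storage.removeItem(key));
  return doomed.length; // number of entries removed
}

// e.g. removeKeysWithPrefix(window.localStorage, 'myapp:');
```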

localStorage and sessionStorage are synchronous

  • Though this will likely only have an impact when saving large amounts of data, web storage functionality is synchronous
  • This means that all JavaScript in a page stops executing while the storage function is performed.
  • In theory, this could have an effect on the responsiveness of an application to user input.
  • In practice it's unlikely to have an impact.

sessionStorage and localStorage store all data as strings

  • Whatever you store is converted to a string
  • This includes boolean values, integers, floating point numbers, dates, and so on
  • So keep this in mind if you are storing boolean preferences, arrays, or more complex types of data
  • JSON is your friend
// userDetails can be any JSON-serializable object or array
window.localStorage.setItem("details", JSON.stringify(userDetails));
var details = JSON.parse(window.localStorage.getItem("details"));
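Wrapping this in two helpers keeps each stringify/parse pair in one place and deals with missing or corrupt entries. Here is a minimal sketch; saveJSON and loadJSON are hypothetical names.

```javascript
// `storage` is localStorage or sessionStorage (anything with getItem/setItem).
function saveJSON(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

function loadJSON(storage, key, fallback) {
  const raw = storage.getItem(key);
  // getItem returns null for a missing key; JSON.parse(null) would give null,
  // so return the caller's fallback instead
  if (raw === null) return fallback;
  try {
    return JSON.parse(raw);
  } catch (err) {
    // the stored string was not valid JSON (e.g. written by older code)
    return fallback;
  }
}

// e.g. saveJSON(window.localStorage, 'prefs', { darkMode: false, fontSize: 14 });
//      var prefs = loadJSON(window.localStorage, 'prefs', {});
```

Without the JSON step, storing false gives back the string "false". That string is truthy, so a check like if (storage.getItem('flag')) would pass.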

localStorage and sessionStorage limits

  • browsers typically implement a 5MB limit on the amount of data localStorage or sessionStorage can save for a given domain.
  • If the storage needs of your application are likely to exceed 5MB, then IndexedDB provides a solution.
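When a write would exceed the limit, setItem throws rather than failing silently. The sketch below catches that case, assuming the error names listed in its comments. What to do next (prune old entries, or move to IndexedDB) is up to the application.

```javascript
// Try a write and report whether it fitted within the storage quota.
// Browsers throw a DOMException named 'QuotaExceededError'; older Firefox
// versions used 'NS_ERROR_DOM_QUOTA_REACHED', so both names are checked.
function trySetItem(storage, key, value) {
  try {
    storage.setItem(key, value);
    return true;
  } catch (err) {
    if (err && (err.name === 'QuotaExceededError' ||
                err.name === 'NS_ERROR_DOM_QUOTA_REACHED')) {
      return false; // caller can prune old data or fall back to IndexedDB
    }
    throw err; // anything else is unexpected, so let it propagate
  }
}

// e.g. if (!trySetItem(window.localStorage, 'cache', bigString)) { ... }
```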

IndexedDB

  • sophisticated client side database
  • asynchronous read/write
  • much less limited than Web Storage
  • low level, much more complicated API
  • widely supported now that iOS 8 and Safari 8 support it

Browser Support

  • Internet Explorer 8
  • Firefox 3.5
  • Safari 4
  • Chrome 4
  • Opera 10.5
  • iOS Safari 3.2
  • Android 2.1

Are we online?

  • We might want to handle user input differently depending on whether we are online or not.
    • A simple example might be to disable an upload or submit button when the user is offline, and enable it when the user is back online.
  • With HTML5 there are two main ways we can determine whether the user is online or not.
    • the navigator.onLine property
    • online and offline events

The navigator.onLine property

  • The navigator object has a boolean property, onLine
  • This is true when the user is online, and false when not.
  • When updating controls we can check this attribute, and enable or disable a button
var online = navigator.onLine;
var submit = document.querySelectorAll('button[type=submit]');
for (var i = 0; i < submit.length; i++) {
  submit[i].disabled = !online;
}

online and offline events

  • However, what would be preferable is to magically update the interface when the user goes on or offline.
  • We can do this with new offline and online events in HTML5
  • When the user goes online or offline, an event is sent, that we can listen for
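// pass updateUI itself: writing updateUI() would call it once, immediately, and register its return value instead
window.addEventListener("offline", updateUI);
window.addEventListener("online", updateUI);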
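The same events can drive a small helper that sets the initial state once and then updates it on every change. In the sketch below, watchConnectivity and its onChange callback are hypothetical names, and window and navigator are passed in rather than read globally.

```javascript
// `target` is normally window and `nav` is navigator; `onChange` receives
// true (online) or false (offline).
function watchConnectivity(target, nav, onChange) {
  const update = () => onChange(nav.onLine);
  target.addEventListener('online', update);
  target.addEventListener('offline', update);
  update(); // set the initial state, since no event fires on page load
  // return a function that stops watching
  return () => {
    target.removeEventListener('online', update);
    target.removeEventListener('offline', update);
  };
}

// In a browser, e.g.:
// watchConnectivity(window, navigator, (online) => {
//   document.querySelectorAll('button[type=submit]')
//     .forEach((button) => { button.disabled = !online; });
// });
```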

Are we online?

  • if navigator.onLine == false or when you receive an offline event: user is definitely offline
  • if navigator.onLine == true or when you receive an online event: user may be online
  • There are ways of determining the true online state with a little extra effort
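One such approach is sketched below, under stated assumptions. When navigator.onLine is true, request a small resource from your own server and treat only a successful response as being online. '/ping' is a hypothetical endpoint and the 5-second timeout is arbitrary. The response must not come from a cache, hence cache: 'no-store'.

```javascript
// Returns a Promise for true only if a request actually reaches the server.
// `fetchImpl` defaults to the global fetch and can be swapped for testing.
async function isReallyOnline(url = '/ping', timeoutMs = 5000, fetchImpl = fetch) {
  const controller = new AbortController();
  // abort the request if it takes too long
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // cache: 'no-store' asks the browser not to answer from its HTTP cache
    const response = await fetchImpl(url, { cache: 'no-store', signal: controller.signal });
    return response.ok;
  } catch (err) {
    return false; // network error or timeout (abort)
  } finally {
    clearTimeout(timer);
  }
}

// e.g. if (navigator.onLine) { isReallyOnline().then((online) => { ... }); }
```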

File API

  • allows us to access local file system
  • get details about a file (name, last modified, size)
  • read the contents of a file
  • take photos from any web page
  • this is all covered in my book
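Here is a brief sketch of the first two points, assuming a File obtained from an input type="file" element. It uses Blob's promise-based text() method, which is newer than the event-based FileReader API. The names describeFile and readTextFile, the 1MB limit, and the #picker id are all hypothetical.

```javascript
// The details the File API exposes without reading the file's contents.
function describeFile(file) {
  return {
    name: file.name,
    sizeInBytes: file.size,
    lastModified: new Date(file.lastModified), // lastModified is ms since the epoch
  };
}

// Read a file's contents as text, refusing files above a size limit.
function readTextFile(file, maxBytes = 1024 * 1024) {
  if (file.size > maxBytes) {
    return Promise.reject(new RangeError(`${file.name} is larger than ${maxBytes} bytes`));
  }
  return file.text(); // Promise resolving to the file's contents
}

// In a browser, with a hypothetical <input type="file" id="picker">:
// document.querySelector('#picker').addEventListener('change', async (event) => {
//   const file = event.target.files[0];
//   console.log(describeFile(file), await readTextFile(file));
// });
```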

Service Workers

  • replace AppCache, and allow very sophisticated offline applications, using JavaScript and the DOM
  • Essentially a proxy server installed per domain for a browser
  • write event handlers in JavaScript to intercept traffic!
  • Currently experimental support in Chrome and Firefox
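To give a flavour of the proxy idea, here's a sketch of a cache-first fetch handler using the Cache API. Cache-first is just one possible strategy. The cache name 'offline-v1' and the cacheFirst helper are hypothetical, and the event wiring only runs inside a service worker.

```javascript
// Answer from the cache if possible; otherwise go to the network and keep a copy.
// `cache` is a Cache from the Cache API; `fetchFn` performs the network request.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) return cached;
  const response = await fetchFn(request);
  // the Cache API only stores GET requests; also skip error responses
  if (request.method === 'GET' && response.ok) {
    // a response body can only be read once, so cache a clone and return the original
    await cache.put(request, response.clone());
  }
  return response;
}

// Wiring, which only runs inside a service worker's global scope:
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    event.respondWith(
      caches.open('offline-v1').then((cache) => cacheFirst(event.request, cache, fetch))
    );
  });
}
```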

The offline web

  • is a reality today
  • allows us to do completely new things with the Web
  • brings benefits for even simple Web sites
  • you should be using it

Thank you

  • @johnallsopp
  • get the draft of my offline book
  • http://bit.ly/QXAQ4w

