https://web.archive.org/web/20150427100416/http://www.webdirections.org/speakeasy/presentations/SmashingOffline/offline.html

Offline First

Ajax Revolution

Offline First

We can’t keep building apps with the desktop mindset of permanent, fast connectivity, where a temporary disconnection or slow service is regarded as a problem and communicated as an error

Offline First: Hoodie

bit.ly/1qPSL9z

Offline First

we'll cover
- taking your application offline
- persisting data in the browser
- are we online?
- what else can I do?
- what comes next?

Someone who has surpassed the levels of jerk and asshole, however not yet reached f***er or motherf****er. Not to be confuzed with douche
Urban Dictionary

Making a cache

As I'm sure you know, browsers cache HTML, CSS, JavaScript files, images and other resources of the sites you visit, to speed up the subsequent loading of pages. However, you never know when the browser might discard cached files, and so this is not a reliable way for sites to work offline. But what if we could tell the browser what to cache? Well, with HTML5 application caches (also known as "appcaches") we can do just that. Let's look at how.

browsers cache HTML, CSS, JavaScript files, images and other resources of the sites you visit, to speed up the subsequent loading of pages
However, you never know when the browser might discard cached files, and so this is not a reliable way for sites to work offline
with HTML5 application caches we can tell the browser how to cache resources

with the appcache

web apps still work when the user is not connected
sites and apps can improve performance by caching resources

how do we make the appcache work?

1. create an appcache manifest file
2. link to this from HTML documents
3. Beware of the genie's rules

The heart of the technique is to create an appcache manifest, a simple text file, which tells the browser what to cache (and also what not to). The resources are then cached in an "application cache", or "appcache", which is distinct from the cache a browser uses for its own purposes. The anatomy of an appcache manifest is straightforward, but there are a few subtleties.

A simple appcache manifest

    CACHE MANIFEST
    CACHE:
    #images
    /images/image1.png
    #pages
    /pages/page1.html
    /pages/page2.html
    #CSS
    /style/style.css
    NETWORK:
    signup.html
    FALLBACK:
    / /offline.html
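To make the anatomy of a manifest concrete, here is a minimal sketch that splits a manifest into its three sections. It only illustrates the file format and is not how any browser parses manifests. The helper name parseManifest is hypothetical, and the sketch assumes that entries appearing before any section header belong to CACHE:.

```javascript
// Hypothetical helper: split an appcache manifest into its sections.
function parseManifest(text) {
  var lines = text.split(/\r?\n/).map(function (line) { return line.trim(); });
  if (lines[0] !== 'CACHE MANIFEST') {
    throw new Error('A manifest must start with the line CACHE MANIFEST');
  }
  var sections = { CACHE: [], NETWORK: [], FALLBACK: [] };
  var current = 'CACHE'; // assumption: entries before any header are CACHE entries
  lines.slice(1).forEach(function (line) {
    if (line === '' || line.charAt(0) === '#') return; // skip blanks and comments
    var header = /^([A-Z]+):$/.exec(line);
    if (header) {
      // Section headers this sketch does not know about are ignored.
      current = sections.hasOwnProperty(header[1]) ? header[1] : null;
      return;
    }
    if (current === null) return;
    if (current === 'FALLBACK') {
      // FALLBACK entries are pairs: a URL prefix and its replacement.
      var pair = line.split(/\s+/);
      sections.FALLBACK.push({ prefix: pair[0], replacement: pair[1] });
    } else {
      sections[current].push(line);
    }
  });
  return sections;
}

var manifest = parseManifest([
  'CACHE MANIFEST',
  'CACHE:',
  '#images',
  '/images/image1.png',
  '#pages',
  '/pages/page1.html',
  '/pages/page2.html',
  '#CSS',
  '/style/style.css',
  'NETWORK:',
  'signup.html',
  'FALLBACK:',
  '/ /offline.html'
].join('\n'));
```

Here manifest.CACHE holds the four explicitly cached URLs, manifest.NETWORK holds signup.html, and manifest.FALLBACK holds a single prefix and replacement pair.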

The CACHE section

explicitly list the resources we want cached.
Use either a URL relative to the .appcache file, or an absolute URL.

    CACHE:
    #images
    /images/image1.png
    http://somedomain.com/images/image2.png
    http://anotherdomain.com/stuff/image3.png

The NETWORK Section

Particularly with dynamically generated content (from APIs etc), some resources we won't ever want cached
Specify these resources in the NETWORK section
these resources will always be fetched, and never cached

Structure of a NETWORK: section

Can specify resources individually with an absolute or relative URL
Can also specify a group of resources using a partial URL
Any resources which have URLs beginning with this pattern are never cached
There's also a special wildcard, *. This specifies that any resources not explicitly cached in a CACHE: section should not be cached.

the NETWORK: section

    NETWORK:
    signup.html
    payments/pay.html
    /payments/
    *

    NETWORK:
    *

the FALLBACK: section

specifies resources to be used for non-cached resources when the user is offline
it's much less widely used, details in my book

/images/ /images/missing.png

In the fallback section, we specify resources to be replaced, as well as the resources to replace them. The first of these pairs can be a URL or prefix match pattern. The second must specifically identify a resource to replace any matching the first pattern. There's no wildcard for the FALLBACK section

A simple appcache manifest

    CACHE MANIFEST
    # version 1.0
    CACHE:
    #images
    /images/image1.png
    /images/image2.png
    NETWORK:
    *
    FALLBACK:
    /images/ /images/missing.png

Using the appcache manifest

So now we've created our appcache manifest, we need to associate it with our HTML documents. We do this by adding the manifest attribute to the html element of a document, where the value of this attribute is the URL of the appcache file.

The current recommendation is that the appcache file have the extension .appcache. So, if our manifest is located at the root of our site, we'd link to it like so:

    <html manifest='manifest.appcache'>

There are also suggestions that using the HTML5 doctype may be required for some browsers to use app cache, so, make sure you use the doctype

<!DOCTYPE html>

to use the manifest file with an HTML document, we add a manifest attribute to HTML element
- <html manifest='manifest.appcache'>

Caching Algorithm

browser fetches and begins parsing a page
sees there's a manifest for the page
fetches the manifest file
having finished parsing the page and fetching its resources, caches any of them listed in the manifest's CACHE: section
then fetches and caches the resources in CACHE: that the page itself doesn't use
lastly fetches and caches the FALLBACK: resources

Genie's Rules

App caching can be very powerful, allowing apps to work while the user is offline, and can increase site performance, but there are some definite gotchas it pays to be aware of. Here are a few well worth knowing about.

Persistence

Effectively, appcaches don't expire. You can't override them with HTTP headers. This has particular implications for developing with appcaches. We'll look at this in a moment.

Once a resource is cached, the browser will continue to use this cached version, effectively forever, even if you change that resource on the server.
To ensure the browser updates the cache, you must change the .appcache file.
The browser will then re-cache any changed resources (and only those that have changed)
But the user must visit the page twice for the new cache to take effect!
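Since the browser only re-checks cached resources when the manifest file itself changes, one common trick is to keep a version comment in the manifest (as in the "# version 1.0" example earlier) and change it on every deploy. The sketch below shows one way a build script might do that. The function name stampManifest and the exact comment format are assumptions for illustration.

```javascript
// Hypothetical build-step helper: rewrite (or add) a "# version ..." comment
// so that the manifest file changes whenever the site is deployed.
function stampManifest(manifestText, version) {
  var stamp = '# version ' + version;
  var lines = manifestText.split('\n');
  var index = -1;
  for (var i = 0; i < lines.length; i++) {
    if (/^#\s*version\b/.test(lines[i])) {
      index = i;
      break;
    }
  }
  if (index === -1) {
    lines.splice(1, 0, stamp); // insert right after the CACHE MANIFEST line
  } else {
    lines[index] = stamp; // replace the existing version comment
  }
  return lines.join('\n');
}

var updated = stampManifest(
  'CACHE MANIFEST\n# version 1.0\nCACHE:\n/images/image1.png',
  '1.1'
);
```

Any change to the file is enough to trigger an update, so a timestamp or a content hash would work just as well as a version number.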

What the?!?

the user must reload twice for new cache to take effect

first visit
- browser sees there's a manifest, creates the appcache
next visit
- browser has an appcache, renders the page using cached resources
- browser then checks to see whether the manifest has changed
- if so, it then updates the appcache, downloading the changed resources
- but because it has already rendered the page, it doesn't re-render it

Forcing a cache update

it is possible to listen on window.applicationCache for the updateready event, which fires when a new cache has been downloaded
then, rather than simply refresh the page, it's recommended you prompt the user to do so
I cover this in detail in my book
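As a rough sketch of the approach described above, the function below listens for updateready, swaps in the new cache, and asks the user before reloading. The function name promptOnUpdate and the askUser and reload callbacks are hypothetical; the cache object is passed in as a parameter so the logic can be tried outside a browser.

```javascript
// appCache is expected to be window.applicationCache. askUser and reload are
// hypothetical callbacks supplied by the page.
function promptOnUpdate(appCache, askUser, reload) {
  appCache.addEventListener('updateready', function () {
    if (appCache.status !== appCache.UPDATEREADY) return;
    appCache.swapCache(); // switch to the newly downloaded cache
    if (askUser('A new version of this app is available. Load it now?')) {
      reload();
    }
  });
}

// In a page this might be wired up as:
// promptOnUpdate(
//   window.applicationCache,
//   window.confirm.bind(window),
//   function () { window.location.reload(); }
// );
```

Prompting rather than reloading automatically avoids throwing away whatever the user is in the middle of doing.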

stylesheets & other resources

You'd be forgiven for thinking that any images in a style sheet that has been cached will be included in the appcache, but that's not so. Images your style sheet refers to must be explicitly referenced in the CACHE section of the manifest as well.

Similarly, style sheets that are imported using @import, and resources included via JavaScript must also be explicitly cached.

To help build an appcache manifest, I've developed manifestR, which we'll look at in detail in a moment. It will generate an appcache manifest for you, which includes all the scripts, style sheets, including those @imported, images, linked pages at the same site, and any images linked to in stylesheets.

There's one exception to the rule that only explicitly listed resources are cached, and it is important to understand. Any HTML document that has a manifest attribute will be cached, even if it is not listed in the manifest. This can cause all kinds of headaches while developing, which we cover shortly.

resources must be explicitly cached
- images in style sheets and script files won't be cached (unless explicitly listed in a CACHE: section)
- pay attention to @imported CSS, modular JavaScript
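Because images referenced from style sheets are easy to miss, a small script can help list them. The following is a minimal sketch, assuming the style sheet text is already available as a string; extractCssUrls is a hypothetical helper, it only finds url(...) references (so an @import written with a bare string is not detected), and it is no substitute for a tool such as manifestR.

```javascript
// Hypothetical helper: list the url(...) references in a style sheet so they
// can be added to the CACHE: section of the manifest.
function extractCssUrls(cssText) {
  var pattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
  var urls = [];
  var match;
  while ((match = pattern.exec(cssText)) !== null) {
    var url = match[2].trim();
    // Skip inline data: URIs and avoid listing the same URL twice.
    if (url.indexOf('data:') !== 0 && urls.indexOf(url) === -1) {
      urls.push(url);
    }
  }
  return urls;
}

var css = [
  'body { background: url("/images/bg.png"); }',
  '.a { background: url(/images/a.png); }',
  ".b { background: url( '/images/bg.png' ); }",
  '@import url(print.css);'
].join('\n');

var cssUrls = extractCssUrls(css);
```

Each URL in cssUrls would then need its own line in the manifest's CACHE: section.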

Master Entries

To improve the performance of a site, you might be tempted to preload the entire site, by adding all the pages, images etc in it to the appcache manifest. And, in certain circumstances this might be desirable. It will however place considerable demands on your server, and use more bandwidth, as the first time a person visits your site, they will download more resources than they otherwise might have. Luckily, appcaches have an additional feature that can help here.

You'll recall that HTML documents with a manifest attribute do not need to be included in the manifest file to be cached. The benefit of this is that rather than explicitly listing all the pages at your site in a manifest for them to be cached, each time someone visits a page that links to a manifest, it will then be cached.

If the primary motivation for using an appcache is to ensure your site or more likely app works offline, you'll likely want to explicitly list the pages of the site. If your primary motivation is increased performance, then let pages lazily cache when the user visits them, but cache scripts, CSS, and perhaps commonly used images.

any page with a manifest attribute in the HTML element will be cached in the associated cache
even if not included in the manifest
even if included in NETWORK:
AppCache consequently is better suited to applications rather than rapidly changing content sites

From the Spec

The mixed-content model does not work well with the application cache feature: since the content is cached, it would result in the user always seeing the stale data from the previous time the cache was updated.

Cache failure

An important, but subtle gotcha with appcaching is that if even one of the resources you include in your cache manifest is not available, then no resources will be cached. So, it is really important to ensure that any resource listed in your appcache manifest is available online. There's a tool we discuss in a moment, the Cache Manifest Validator, to help ensure all those resources are online.

if even one of the resources you include in your cache manifest is not available, then no resources will be cached.
Worse, if there's already a cache, then it won't be updated. The browser continues to use the out of date cache.
listen for error events on the window.applicationCache, and send these to error servers
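A minimal sketch of that last point: the function below listens for error events and hands a small report to a callback. The name reportCacheErrors, the shape of the report, and the send callback (which might post to a logging endpoint of your own) are all assumptions for illustration.

```javascript
// appCache is expected to be window.applicationCache. send is a hypothetical
// callback that delivers the report to your own error-logging service.
function reportCacheErrors(appCache, send) {
  appCache.addEventListener('error', function () {
    send({
      type: 'appcache-error',
      status: appCache.status, // numeric applicationCache status when it failed
      time: new Date().toISOString()
    });
  });
}
```

Without something like this, a single missing resource can leave users on a stale cache with no sign on the server that anything went wrong.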

Missing (not missing) Resources

Appcache's most unintuitive feature
- if a resource is not listed in a CACHE section of a manifest
- and it's not listed in a NETWORK section either (explicitly, via a partial match, or via the wildcard)
- then the browser treats that resource as being unavailable!
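The rule is easier to see as code. The function below is a simplified model, not the full algorithm from the specification: it assumes a manifest already split into cache, network and fallback lists, it uses plain prefix matching, and it simplifies the precedence between FALLBACK and NETWORK matches. The names classifyRequest and startsWith are hypothetical.

```javascript
function startsWith(url, prefix) {
  return url.indexOf(prefix) === 0;
}

// Simplified model of how a page with an appcache treats a requested URL.
function classifyRequest(url, manifest) {
  if (manifest.cache.indexOf(url) !== -1) return 'cached';
  var hasFallback = manifest.fallback.some(function (pair) {
    return startsWith(url, pair.prefix);
  });
  if (hasFallback) return 'fallback';
  var allowed = manifest.network.some(function (entry) {
    return entry === '*' || startsWith(url, entry);
  });
  // Not in CACHE, FALLBACK or NETWORK: treated as unavailable, even online.
  return allowed ? 'network' : 'unavailable';
}

var strictManifest = {
  cache: ['/images/image1.png'],
  network: ['/api/'],
  fallback: [{ prefix: '/images/', replacement: '/images/missing.png' }]
};
```

With strictManifest, a request for /about.html is classified as 'unavailable'. Adding the * wildcard to the network list changes that to 'network', which is why so many manifests include NETWORK: *.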

what about browser support?

It must also be noted that the general consensus is that appcaching is currently far from perfect across all browsers which support it. The specification is still in draft, but it should also be noted that most browsers have supported at least some appcaching for quite some time.

According to an amalgam of When can I use, Dive into HTML5 and other online resources:

Safari since version 4
Chrome since version 5
Mobile Safari since iOS 2.1
Firefox since version 3.5
Opera since version 11
Android since version 2.1
Internet Explorer since 10

Appcache and JavaScript/DOM

Appcache is largely declarative–we can only specify whether a resource should or shouldn't be cached using a manifest.
- There is some DOM access to the appcache.
- We can't alter the manifest via JavaScript and the DOM
- We can detect whether a cache needs updating, or the download failed
- we don't have time to go beyond this, but my book covers this!

Application Cache

Does work
is not a douche
but it is a pain in the $#%^
has its quirks
avoid using it for performance based caching
is amazing when used appropriately, particularly for offlining applications

Web Storage overview

Until HTML5, the only ways to persist a user's data between sessions were
- to store it on the server
- to use cookies in the browser
Both present significant security challenges
Both are a lot of work

Until recently, the only ways to persist a user's data between visits to your site have been to store it on the server, or use cookies on the browser. Both present significant security challenges.

Cookies are designed for communication between the browser and a server that persists between sessions. They're typically used for identifying a user on return visits and storing details about that user, details which can include passwords and user names. Over standard HTTP (not encrypted HTTPS), cookies are sent between the browser and server in plain text, unencrypted. So, unless your application encrypts cookie contents, these can quite trivially be picked up, particularly on public wifi networks.

Storing all client data on the server creates usability issues as well, as users need to log in each time they use that site.

Web Storage overview

As web applications become increasingly sophisticated, we need ways to keep data around in the browser
Web Storage, a pair of closely related technologies, allows us to develop "offline first", minimizing or eliminating our need for networks and servers
sessionStorage and localStorage

I'm sure more than once in your life, you've filled in a form, only to have the browser crash, or to accidentally close a window, or go back, or otherwise lose the contents of a form you've filled in painstakingly.

As web applications become increasingly sophisticated, developers need ways to keep data around on the browser (particularly if we want our applications to work whether the user is on or offline.)

Two closely related but slightly different features of HTML5 exist to help keep track of information solely in the browser. They enable far more structured data than cookies, are much easier to use, and the information stored can only be transmitted to a server explicitly by the application.

sessionStorage stores data during a session, and is removed once a session is finished. localStorage is almost identical, but the data stored persists indefinitely, until removed by the application. Let's start with sessionStorage, keeping in mind that we use localStorage almost identically.

As web applications become increasingly sophisticated, developers need ways to keep data around on the browser
Two closely related but slightly different features of HTML5 exist to help keep track of information solely in the browser
- sessionStorage stores data during a session, and is removed once a session is finished
- localStorage is almost identical, but the data stored persists indefinitely, until removed by the application

Web Storage overview

both sessionStorage and localStorage are simple 'key, value' databases
sessionStorage is available for a single session, and when that ends, the browser clears the database
localStorage persists across multiple visits to a domain, and is available to any page at that domain

What is a session?

The key feature of sessionStorage is that data only persists for a session
In essence, a session is the combination of a particular window or tab, and a fully qualified domain

What good is sessionStorage?

maintain sensitive information during a transaction
create a multipage form or process without the need to store data on the server until it completes
moves the heavy lifting for protecting sensitive data from application developers to the browser

So, what good is sessionStorage? Well, one very useful application would be to maintain sensitive information during a transaction, signup, sign in and so on, which will be purged as soon as the user closes a window or tab. It can be used to create a multipage form or application, where the information in each page can persist, and be sent at once. It also moves the heavy lifting for protecting sensitive data from application developers (for example encrypting cookie data) to the browser

For maintaining sensitive information during a transaction, signup, sign in and so on, which will be purged as soon as the user closes a window or tab
It can be used to create a multipage form or application, where the information in each page can persist, and be sent at once
it moves the heavy lifting for protecting sensitive data from application developers (for example encrypting cookie data) to the browser
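As an illustration of the multipage form idea, the sketch below saves each step's fields under its own key and combines them when the form is finally submitted. The function names and the form-step- key prefix are hypothetical. The storage object is passed in as a parameter (in a page it would be window.sessionStorage), and the fields are serialised with JSON because Web Storage only holds strings.

```javascript
// storage is expected to be window.sessionStorage (or anything with the same
// getItem/setItem/removeItem methods). Key names are illustrative only.
function saveStep(storage, step, fields) {
  storage.setItem('form-step-' + step, JSON.stringify(fields));
}

function collectSteps(storage, stepCount) {
  var combined = {};
  for (var step = 1; step <= stepCount; step++) {
    var saved = storage.getItem('form-step-' + step);
    if (saved === null) continue; // this step has not been filled in yet
    var fields = JSON.parse(saved);
    Object.keys(fields).forEach(function (name) {
      combined[name] = fields[name];
    });
  }
  return combined;
}

function clearSteps(storage, stepCount) {
  for (var step = 1; step <= stepCount; step++) {
    storage.removeItem('form-step-' + step);
  }
}
```

A page would call saveStep as the user moves on from each step, then collectSteps on the final page to build the single payload sent to the server, and clearSteps once that succeeds.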

what good is localStorage?

easily maintain subtle application state, user preferences
drastically reduces the need for server side storage
drastically reduces the need for communication with the server
potentially improving performance and security

using sessionStorage & localStorage?

sessionStorage & localStorage are properties of the window object
We use them to store "key, value pairs"
each pair is a piece of information, identified by a unique identifier, called a key.
Both the key and the value are strings.
We use the setItem method to store data

    // get the values of the inputs with id="username" and id="userage"
    var username = document.querySelector('#username').value;
    var userage = document.querySelector('#userage').value;

    // temporarily store the contents of the variable 'username' using the key "name"
    window.sessionStorage.setItem('name', username);

    // persist the contents of the variable 'userage' between sessions using the key "age"
    window.localStorage.setItem('age', userage);

reading from sessionStorage & localStorage

We retrieve values by using the function getItem using a single parameter, the key we used to set the item.

    function displayDetails() {
      var username = window.sessionStorage.getItem('name');
      var userage = window.localStorage.getItem('age');
    }

managing sessionStorage & localStorage

We can remove items from Web Storage with the method removeItem(key).
To delete the entire localStorage, we can use localStorage.clear().
These also apply to sessionStorage

localStorage and sessionStorage are synchronous

Though this will likely only have an impact when saving large amounts of data, web storage functionality is synchronous
This means that all JavaScript in a page stops executing while the storage function is performed.
In theory, this could have an effect on the responsiveness of an application to user input.
In practice it's unlikely to have an impact.

sessionStorage and localStorage store all data as strings

Whatever you store is converted to a string
This includes boolean values, integers, floating point numbers, dates, and so on
So keep this in mind if you are storing boolean preferences, arrays, or more complex types of data
JSON is your friend

    window.localStorage.setItem("details", JSON.stringify(userDetails));
    var details = JSON.parse(window.localStorage.getItem("details"));
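In practice it can help to wrap the JSON round trip in a pair of small helpers, so that a missing key or a value that isn't valid JSON doesn't throw. This is one possible sketch: saveJSON and loadJSON are hypothetical names, and the storage object is passed in (window.localStorage or window.sessionStorage in a page).

```javascript
// storage is expected to be window.localStorage or window.sessionStorage.
function saveJSON(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

function loadJSON(storage, key, defaultValue) {
  var raw = storage.getItem(key);
  if (raw === null) return defaultValue; // nothing stored under this key
  try {
    return JSON.parse(raw);
  } catch (error) {
    return defaultValue; // stored value was not valid JSON
  }
}
```

This matters most for booleans: a preference stored directly as false comes back as the string "false", which is truthy, whereas a value saved with saveJSON comes back from loadJSON as a real boolean.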

localStorage and sessionStorage limits

browsers typically implement a 5MB limit on the amount of data localStorage or sessionStorage can save for a given domain.
If the storage needs of your application are likely to exceed 5MB, then indexedDB provides a solution to this.
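When the limit is reached, setItem throws rather than failing silently, so writes that might be large are worth guarding. A minimal sketch, assuming any exception from setItem means the data could not be stored; trySetItem is a hypothetical helper name.

```javascript
// Returns true if the value was stored, false if the browser refused it
// (most likely because the storage quota for the domain was exceeded).
function trySetItem(storage, key, value) {
  try {
    storage.setItem(key, value);
    return true;
  } catch (error) {
    return false;
  }
}
```

The caller can then decide what to do on false, for example removing older entries and retrying, or telling the user the data could not be saved locally.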

IndexedDB

sophisticated client side database
asynchronous read/write
much less limited than Web Storage
low level, much more complicated API
widely supported with iOS8 and Safari 8

Browser Support

Internet Explorer 8
Firefox 3.5
Safari 4
Chrome 4
Opera 10.5
iOS Safari 3.2
Android 2.1

Are we online?

We might want to handle user input differently depending on whether we are online or not.
- A simple example might be to disable an upload or submit button when the user is offline, and enable it when the user is back online.
With HTML5 there are two main ways we can determine whether the user is online or not.
- the navigator.onLine property
- online and offline events

The navigator.onLine property

The navigator object has a boolean property, onLine
This is true when the user is online, and false when not.
When updating controls we can check this attribute, and enable or disable a button

    var online = navigator.onLine;
    var submit = document.querySelectorAll('button[type=submit]');
    for (var i = 0; i < submit.length; i++) {
      submit[i].disabled = !online;
    }

online and offline events

However, what would be preferable is to magically update the interface when the user goes on or offline.
We can do this with new offline and online events in HTML5
When the user goes online or offline, an event is sent, that we can listen for

    // pass the function itself; writing updateUI() would call it immediately
    window.addEventListener("offline", updateUI);
    window.addEventListener("online", updateUI);

Are we online?

if navigator.onLine == false or when you receive an offline event: user is definitely offline
if navigator.onLine == true or when you receive an online event: user may be online
There are ways of determining the true online state with a little extra effort
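One way to get closer to the true online state is to request a small resource from your own server and see whether it actually arrives. The sketch below is one possible approach, not the only one. It assumes a browser that supports fetch and Promises; checkReallyOnline is a hypothetical name, the probe URL is a placeholder for a resource you control, and the fetch function is passed in as a parameter so the logic can be exercised without a network.

```javascript
// fetchFn is expected to behave like window.fetch. probeUrl is a hypothetical
// small resource on your own server, assumed to have no query string.
function checkReallyOnline(fetchFn, probeUrl, timeoutMs) {
  var timeout = new Promise(function (resolve) {
    setTimeout(function () { resolve(false); }, timeoutMs);
  });
  // The timestamp makes each probe URL unique, so it is not answered from a cache.
  var probe = fetchFn(probeUrl + '?t=' + Date.now(), { method: 'HEAD', cache: 'no-store' })
    .then(function (response) { return response.ok; })
    .catch(function () { return false; });
  // Whichever settles first wins: the probe result or the timeout.
  return Promise.race([probe, timeout]);
}

// In a page this might be used as:
// checkReallyOnline(window.fetch.bind(window), '/ping.txt', 3000)
//   .then(function (online) { /* update the interface */ });
```

If the page uses an appcache, the probe URL also needs to be covered by the NETWORK: section, otherwise the browser will treat it as unavailable.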

File API

allows us to access local file system
get details about a file (name, last modified, size)
read the contents of a file
take photos from any web page
this is all covered in my book
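As a small taste of the File API, the sketch below builds a description from the details a File object exposes. describeFile is a hypothetical helper. It relies only on the name, size and lastModified properties, so it can be tried with any object of that shape.

```javascript
// file is expected to be a File object, for example one taken from the
// files list of an <input type="file"> element.
function describeFile(file) {
  var modified = new Date(file.lastModified).toISOString();
  return file.name + ' (' + file.size + ' bytes, last modified ' + modified + ')';
}

// In a page this might be used as:
// input.addEventListener('change', function () {
//   for (var i = 0; i < input.files.length; i++) {
//     console.log(describeFile(input.files[i]));
//   }
// });
```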

Service Workers

replace AppCache, and allow very sophisticated offline applications, using JavaScript and the DOM
Essentially a proxy server installed per domain for a browser
write event handlers in JavaScript to intercept traffic!
Currently experimental support in Chrome and Firefox
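To give a flavour of what intercepting traffic looks like, here is a minimal sketch of a cache-first fetch handler. The function name installCacheFirst is hypothetical, and the three collaborators are passed in as parameters so the logic can be followed in isolation: inside a real service worker file they would be self, caches and fetch.

```javascript
// scope is expected to be the service worker global (self), cacheStorage the
// global caches object, and fetchFn the global fetch function.
function installCacheFirst(scope, cacheStorage, fetchFn) {
  scope.addEventListener('fetch', function (event) {
    event.respondWith(
      cacheStorage.match(event.request).then(function (cached) {
        // Serve the cached response if there is one, otherwise go to the network.
        return cached || fetchFn(event.request);
      })
    );
  });
}

// Inside a service worker file this might be used as:
// installCacheFirst(self, caches, fetch);
```

Unlike an appcache manifest, the decision about what to serve is ordinary JavaScript, so the same handler could just as easily try the network first and fall back to the cache.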

The offline web

is a reality today
allows us to do completely new things with the Web
brings benefits for even simple Web sites
you should be using it

Thank you

@johnallsopp
get the draft of my offline book
http://bit.ly/QXAQ4w
