Latency 101: What is latency and why is it such a big deal?

(This post was written as an addendum to a companion post, which discusses some recent research into desktop versus mobile latency.)

In web performance circles, “latency” is the time it takes for a request for a page object to travel from the browser to the host server, and for the response to travel back. The amount of latency depends largely on how far away the user is from the server.

To put this in real-world terms, say you visit a web page, and that page contains 100 objects: things like images, CSS files, and so on. Your browser has to make 100 individual requests to the site’s host server(s) in order to retrieve those objects. Each of those requests experiences at least 20-30ms of latency. (More typically, latency is in the 75-140ms range, even for sites that use a CDN.) Paid one after another, those delays add up to 2 or 3 seconds, which is pretty significant when you consider that latency is just one of the factors that can slow your pages down.
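To make that arithmetic concrete, here’s a minimal back-of-the-envelope sketch in Python. The object count and per-request latencies are the illustrative figures from above, not measurements, and it assumes the worst case where every request pays its latency cost one after another:

```python
# Back-of-the-envelope: cumulative latency if every request pays its
# round-trip cost one after another (the worst case).
OBJECTS = 100                # page objects: images, CSS files, etc.
LATENCY_RANGE_MS = (20, 30)  # best-case latency per request, in milliseconds

for per_request_ms in LATENCY_RANGE_MS:
    total_s = OBJECTS * per_request_ms / 1000
    print(f"{OBJECTS} objects x {per_request_ms}ms = {total_s:.1f}s of pure latency")

# Output:
# 100 objects x 20ms = 2.0s of pure latency
# 100 objects x 30ms = 3.0s of pure latency
```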

When you also consider that a page can have upwards of 300 or 400 objects, and that latency can reach a full second for some mobile users, you can easily see where latency becomes a major problem. If your goal is to have your entire page load in less than 3 seconds (and if that’s not your goal, it should be), then latency can kill you right out of the gate.

For obvious reasons, tackling latency is a top priority for the performance industry. There are several ways to do this:

  • Allow more requests to happen concurrently.
  • Shorten the server round trips by bringing content closer to users.
  • Reduce the number of round trips.
  • Improve the browser cache, so that it can (1) store files and serve them where relevant on subsequent pages in a visit and (2) store and serve files for repeat visits.

Browser vendors work around this problem by using multiple connections, which allows the browser to make simultaneous requests to the host server. Since 2008, most browsers have finally moved from 2 connections per domain to 6. Vendors also focus on improving the browser cache.
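To get a rough feel for why the jump from 2 to 6 connections matters, here’s a simple sketch. It assumes requests are spread evenly across connections and that each request costs one full round trip; the 100ms round-trip time is an assumption for illustration:

```python
import math

# Rough model: with C parallel connections, requests queue up behind
# each connection in batches, so total latency shrinks roughly C-fold.
def total_latency_s(objects: int, rtt_ms: float, connections: int) -> float:
    rounds = math.ceil(objects / connections)  # sequential rounds per connection
    return rounds * rtt_ms / 1000

for connections in (2, 6):
    t = total_latency_s(objects=100, rtt_ms=100, connections=connections)
    print(f"{connections} connections: ~{t:.1f}s of latency for 100 objects")

# Output:
# 2 connections: ~5.0s of latency for 100 objects
# 6 connections: ~1.7s of latency for 100 objects
```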

Google’s SPDY protocol extends what the browser can do by adding a session layer atop SSL that allows multiple concurrent streams to share a single connection.
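SPDY itself is a wire protocol, but the effect of multiplexing is easy to model. The toy simulation below (plain Python asyncio, not actual SPDY) shows how round trips overlap when many logical streams share one connection:

```python
import asyncio
import time

RTT = 0.05  # simulated 50ms round trip; illustrative only

async def stream(request_id: int) -> str:
    # Each logical stream still pays one round trip...
    await asyncio.sleep(RTT)
    return f"object {request_id}"

async def main() -> None:
    start = time.perf_counter()
    # ...but over a multiplexed connection the round trips overlap
    # instead of queuing behind one another.
    await asyncio.gather(*(stream(i) for i in range(100)))
    elapsed = time.perf_counter() - start
    print(f"100 multiplexed requests: ~{elapsed:.2f}s "
          f"(one at a time would be ~{100 * RTT:.1f}s)")

asyncio.run(main())
```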

Content delivery networks (CDNs) cache content on distributed servers across a region or around the world, thereby bringing content closer to users and reducing round-trip time. Important to note: while CDNs help with desktop performance, they don’t help mobile latency, because most of a mobile request’s delay occurs over the carrier’s last-mile network, a stretch that placing servers closer to users does nothing to shorten.
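Distance matters because even light in fiber is not instant: it covers roughly 200,000 km per second, about two-thirds of its speed in a vacuum. Here’s a rough lower-bound calculation (the distances are hypothetical, and real round trips are slower due to routing, queuing, and server time):

```python
# Propagation delay only: the physical floor on round-trip time.
FIBER_KM_PER_SECOND = 200_000  # approximate speed of light in fiber

def min_rtt_ms(distance_km: float) -> float:
    # A round trip covers the distance twice.
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1000

for label, km in [("origin server across an ocean", 8_000),
                  ("nearby CDN edge server", 200)]:
    print(f"{label} ({km} km away): at least {min_rtt_ms(km):.1f}ms per round trip")

# Output:
# origin server across an ocean (8000 km away): at least 80.0ms per round trip
# nearby CDN edge server (200 km away): at least 2.0ms per round trip
```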

Front-end optimization (FEO), whether manual or automated, alleviates latency by consolidating page objects into bundles. Fewer bundles mean fewer trips to the server, so the total latency hit is greatly reduced. For example, using Strangeloop’s Site Optimizer, a page that starts with 63 objects could see those objects consolidated into 9 resource requests. FEO also leverages the browser cache, allowing it to do a better job of storing files and serving them again where relevant, so that the browser doesn’t have to make repeat calls to the server.
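Using the 63-to-9 figures above, here’s a sketch of the savings. The 100ms round-trip time is an assumption, and for simplicity it ignores parallel connections, which shrink both numbers but not the ratio between them:

```python
RTT_MS = 100  # assumed round-trip time; illustrative only

def latency_tax_s(requests: int, rtt_ms: float = RTT_MS) -> float:
    # Total latency cost of making each request in sequence.
    return requests * rtt_ms / 1000

before, after = 63, 9  # request counts from the consolidation example
print(f"before: {latency_tax_s(before):.1f}s of latency, "
      f"after: {latency_tax_s(after):.1f}s "
      f"({before - after} round trips saved)")

# Output:
# before: 6.3s of latency, after: 0.9s (54 round trips saved)
```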

Solving latency is an ongoing, spare-no-expense effort. Just last week, it was announced that this summer marks the start of a $4.5 billion fiber-optic cable project that will connect the UK and Japan, with the sole purpose of shaving 60ms of latency off the route. The reason why:

The massive drop in latency is expected to supercharge algorithmic stock market trading, where a difference of a few milliseconds can gain (or lose) millions of dollars.
