Why the performance measurement island you trust is sinking

I want to start this post with a little story. (Self indulgent, I know. But I never do this, so humor me this one time.)

The enlightenment currently going on in our industry reminds me of an allegory told in a book called Flatland, written in 1884 by Edwin Abbott. Flatland is a two-dimensional world whose inhabitants are geometric figures. The protagonist is a square.

One day, the square is visited by a sphere from a three-dimensional world called Spaceland. When the sphere visits Flatland, however, all that is visible to Flatlanders is the part of the sphere that lies in their plane: a circle. The square is astonished that the circle is able to grow or shrink at will (by rising or sinking into the plane of Flatland) and even to disappear and reappear in a different place.

The sphere tries to explain the concept of the third dimension to the two-dimensional square, but the square, though skilled at two-dimensional geometry, doesn’t get it. He can’t understand what it means to have thickness in addition to height and width, nor can he understand that the circle has a view of the world from up above him, where “up” does not mean from the north.

In desperation, the sphere yanks the square up out of Flatland and into the third dimension so that the square can look down on his world and see it all at once. The square recalls the experience:

An unspeakable horror seized me. There was darkness; then a dizzy, sickening sensation of sight that was not like seeing; I saw space that was not space; I was myself, and not myself. When I could find voice, I shrieked aloud in agony, “Either this is madness or it is Hell.” “It is neither,” calmly replied the voice of the sphere. “It is Knowledge; it is Three Dimensions: open your eye once again and try to look steadily.” I looked, and, behold, a new world.

The square is awestruck. He prostrates himself before the sphere and becomes the sphere’s disciple. Upon his return to Flatland, he struggles to preach the Gospel of Three Dimensions to his fellow two-dimensional creatures.

What does this have to do with performance?

I’ve sugarcoated this message in the past. Now I want to come right out and say it:

There is a very good chance that the measurements you trust to tell you how fast your site is are wrong.

I’ll go one step further (and risk losing a few friends in the CDN and ADC space) and say this:

If you’re a site owner, many members of the performance industry are intentionally misleading you.

In other words, it’s in many performance vendors’ best interests to keep you living in Flatland.

Let me tell you a very common story:

I was talking with a customer who runs a very large ecommerce site. He’s been told by all his trusted performance advisors (analytics company, performance measurement company, large CDN company and/or large load balancer company) that the gold standard for measurement is a graph based on synthetic backbone tests, which looks something like this:

[Graph: synthetic backbone test results]

If you’ve ever had a similar conversation with one of your performance vendors, I’m going to bet that he or she has told you some or all of the following things:

  • This graph represents the home page performance of your site.
  • The test is a representative average of many geographical locations.
  • The test methodology is rarely disclosed, but when pressed, the vendor assures you it is the industry standard, run in real modern browsers.
  • The vendor assures you that this is the “safe island” upon which all companies measure performance.

When pushed, the vendor may also show you a waterfall that looks something like this:

[Waterfall chart from a synthetic backbone test]

I routinely encounter customers who have been led, by the very experts they trust, into believing that their site performance can be measured by tools like this. It can’t.

Three reasons why you can’t always believe the experts you trust

1. Industry benchmarks are wrong.

I’ve written about this here, so I won’t rehash it all again. Short version: Benchmarks are based on backbone tests, which only tell you how fast your site loads at major internet hubs. Because you’re skipping the “last mile” between the server and the user’s browser, you’re not seeing how your site actually performs in the real world. As I noted in my earlier post, your backbone test may tell you that your site loads in 1.3 seconds, when in the real world it actually takes anywhere from 3 to 10 seconds. That’s a huge discrepancy.
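The size of that gap follows from simple arithmetic. Here is a back-of-envelope sketch in JavaScript; every constant in it (round-trip count, page weight, line speeds) is invented for illustration, not measured from any real site:

```javascript
// Back-of-envelope model: each round trip pays the full RTT, then the
// bytes flow at line rate. Grossly simplified; all constants are invented.
function estimateLoadSeconds({ rttSeconds, bandwidthBps, roundTrips, totalBytes }) {
  return roundTrips * rttSeconds + (totalBytes * 8) / bandwidthBps;
}

// A backbone test box: ~2ms RTT, effectively unlimited bandwidth.
const backbone = estimateLoadSeconds({
  rttSeconds: 0.002,
  bandwidthBps: 1e9,
  roundTrips: 40,
  totalBytes: 1.5e6,
});

// A last-mile user: ~30ms RTT on a 3 Mbps DSL line.
const lastMile = estimateLoadSeconds({
  rttSeconds: 0.030,
  bandwidthBps: 3e6,
  roundTrips: 40,
  totalBytes: 1.5e6,
});

console.log(backbone.toFixed(2), "vs", lastMile.toFixed(2)); // sub-second vs several seconds
```

Tweak the constants however you like; the point is that latency and bandwidth dominate real-world load time, and a test that experiences neither will always report a fantasy number.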

2. Major performance vendors have convinced site owners to tie bonuses for key employees to backbone test results.

This practice is so widespread and accepted that I have spoken to employees at big companies who tell me they no longer even care what their real-world performance is, because their bonuses depend only on how their backbone test results compare to the industry benchmarks.

3. These same performance vendors continue to cling to these test results because doing so justifies their high margins.

This is the saddest part. I have spoken at length to large companies — trusted performance experts — who know that these tests are a worthless measure of real-world performance. But they continue to sell them as the gospel truth because, as I’ve been told, “This is the only safe island we have. Without a standard, no matter how untrustworthy, we could not command high margins for our products.”

Diabolical scam? Or good intentions gone wrong? A bit of both?

I want to be clear about this: I’m not criticizing the companies like Gomez that make these tests. I am, however, extremely critical of performance vendors who use these tests as a means of communicating the value of their product.

Backbone tests were originally developed to do two things:

  • Monitor uptime/downtime – Telling you if your site is up and running.
  • Spot performance trends – If you see a spike in your graph, then something’s probably wrong somewhere, especially if the spike lasts a while.

Monitoring downtime and spotting performance trends remain valid use cases for backbone tests. Measuring actual page speed, however, is not. So why do some performance vendors pretend otherwise?

In recent years, interest in front-end performance has boomed, but measurement tools lagged behind, and backbone tests filled the void. They could do so because, as long as some basic assumptions went unquestioned, their numbers felt close enough to be accepted at face value.

Why backbone tests are deeply flawed measures of real-world performance

Backbone tests are flawed performance indicators because they rely on several basic assumptions that simply are not true:

Latency
  • Why it matters: In the real world, we know that most people are, at best, 20-30ms from the closest server.
  • What SHOULD be tested: Synthetic clients should be exposed to last-mile latency that mimics real-world users.
  • What IS tested: Synthetic clients sit at the elbow of the internet and experience almost no latency. (The waterfall above shows the first asset coming down in 0.002s. Ha!)

Bandwidth
  • Why it matters: In the real world, most users have limited bandwidth (2-5 Mbps) and are on cable/DSL/ADSL lines.
  • What SHOULD be tested: Synthetic clients should be exposed to last-mile bandwidth limitations, just like real-world users.
  • What IS tested: Synthetic clients enjoy warm, cozy homes in plush tier 1 datacenters with nearly unlimited bandwidth to the world.

Stopping the clock
  • Why it matters: We need to know when to stop measuring, so we know how fast our pages are.
  • What SHOULD be tested: Actual browser events are the closest indicators of page speed. Pages are finished loading when the onload event fires, which is often well before the last resource is served. In fact, from a user’s perspective, pages are often usable even earlier than the onload event.
  • What IS tested: Pages are considered finished loading when the last resource is served. (You know all those JavaScripts everyone keeps telling you to defer? Totally immaterial if you test like this!)

Where the resources are
  • Why it matters: Pages often have resources all over the place, near and far; particularly for long-tail or low-hit-rate content.
  • What SHOULD be tested: A simulation of what a real user would experience: resources that are likely to be near the user should be near, and those likely to be far should be far.
  • What IS tested: For CDN customers, the resources are conveniently located and always available in the rack next to the test box. (Some CDNs have gamed this system and ensured that their PoPs are conveniently located next to the servers that perform these types of tests.)

Browsers
  • Why it matters: There are many browsers out there, and the modern ones are iterating quickly.
  • What SHOULD be tested: The browsers available for synthetic testing should be close to the browsers used in the real world, and they should keep up.
  • What IS tested: We still see many IE6 agents, and many of the agents that report as IE8 actually behave like IE7 agents.

JavaScript
  • Why it matters: JavaScript is a big part of the web, and its effect on the browser and page speed can be significant.
  • What SHOULD be tested: Synthetic clients should run client-side code exactly like a normal browser would and report back on the client-side processing impact and its effect on the overall speed of the page.
  • What IS tested: Many tests we see don’t run JavaScript properly.
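The "stopping the clock" point is worth making concrete. The sketch below derives user-relevant milestones from an object whose field names are borrowed from the browser's Navigation Timing API (performance.timing); the millisecond values in the example are made up:

```javascript
// Derive milestones (in ms since navigation start) from a
// Navigation-Timing-style object. In a browser you would pass
// window.performance.timing; here we use fabricated numbers.
function milestones(t) {
  return {
    ttfb: t.responseStart - t.navigationStart,               // first byte arrives
    domReady: t.domContentLoadedEventStart - t.navigationStart,
    onload: t.loadEventStart - t.navigationStart,            // browser "load" event
  };
}

// Fabricated example: onload fires at 2.1s even though a deferred
// analytics script keeps downloading until 6s. A "last resource"
// stopwatch reports 6s; an onload-based one reports 2.1s; the user
// may have been happily reading since domReady at 1.2s.
const m = milestones({
  navigationStart: 0,
  responseStart: 300,
  domContentLoadedEventStart: 1200,
  loadEventStart: 2100,
});
console.log(m); // { ttfb: 300, domReady: 1200, onload: 2100 }
```

Which milestone you stop the clock on changes the number you report by seconds, which is exactly why "when the last resource is served" is such a misleading stopwatch.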

So how do you get a true measure of your site’s speed?

Many different solutions exist to help you see the truth about exactly what is going on in the real world — from free tools like WebPagetest to real end user monitoring tools. (There are too many to discuss here, but you can find them by Googling “real user monitoring”. When you’re analyzing tools, be sure to check out how each one measures the last mile, the trip between the server and the browser).
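The core idea behind real user monitoring is simple: read the browser's own timing data after the page loads, and send it home so it includes the last mile. A minimal sketch, where the /rum endpoint and the payload field names are hypothetical (the timing field names mirror the real Navigation Timing API):

```javascript
// Build a RUM beacon payload from Navigation-Timing-style data.
// Pure function, so it can be tested anywhere; in a browser you
// would pass window.performance.timing and the page URL.
function buildRumPayload(timing, url) {
  const start = timing.navigationStart;
  return JSON.stringify({
    url,
    ttfb: timing.responseStart - start,   // includes real last-mile latency
    load: timing.loadEventStart - start,  // onload, as the user experienced it
  });
}

// In a browser, after the load event fires, you might send it with:
//   navigator.sendBeacon("/rum", buildRumPayload(performance.timing, location.href));

// Fabricated example values:
const payload = buildRumPayload(
  { navigationStart: 1000, responseStart: 1450, loadEventStart: 4200 },
  "https://example.com/"
);
console.log(payload); // {"url":"https://example.com/","ttfb":450,"load":3200}
```

Because the numbers come from real browsers on real connections, they capture everything the backbone test box never sees: DNS on a home router, latency on a DSL line, JavaScript running on a mid-range laptop.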

While there is no absolutely perfect tool out there (and when I find one, I’ll be sure to let you know about it), I can assure you that these tools are better than the backbone tests you’ve been fed so far.
