Here at Strangeloop, we’re in the process of compiling the results of our Alexa 2000 performance study (which I wrote about here) into a formal, more detailed report. I was looking at one of our spreadsheets, which contains all the data about content delivery networks (CDNs), and had a Rain Man moment that led to the creation of this scatter graph:
Here you can see the top 1000 retail sites, ordered by rank from left to right. The blue diamonds represent the sites that use a CDN. The red X’s represent the sites that don’t. Not surprisingly, CDN use is much more prevalent among the higher-ranked sites.
Now look at the vertical axis, which indicates the page load times for each site. This is where things get really surprising.
As you probably know, content delivery networks make sites faster by caching content closer to users. Therefore, if CDN use automatically correlated with faster-than-average websites, this graph should look profoundly different:
- The blue diamonds should be clustered exclusively in the bottom-left sector. Instead, the blue diamonds are almost an identical overlay of the red X’s.
- I would also expect that lower-ranked, non-CDN-using sites would have, on average, markedly slower load times, meaning that the red X’s on the right side of the graph should be clustered in the top-right corner. But instead you can see that the right half of this graph is almost a mirror image of the left half.
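One way to check the overlap numerically, rather than by eye, is to compare the median load time of the two groups. This is only a minimal sketch: the function name, the `(rank, uses_cdn, load_seconds)` row layout, and the sample rows are all hypothetical placeholders, not figures from our study.

```python
from statistics import median

def median_load_time_by_cdn(sites):
    """Return the median load time (seconds) for CDN and non-CDN sites.

    `sites` is an iterable of (rank, uses_cdn, load_seconds) tuples.
    This row layout is an assumption made for illustration.
    """
    groups = {True: [], False: []}
    for _rank, uses_cdn, load_seconds in sites:
        groups[uses_cdn].append(load_seconds)
    # Skip a group entirely if it has no sites, since median() needs data.
    return {
        ("cdn" if uses_cdn else "no_cdn"): median(times)
        for uses_cdn, times in groups.items()
        if times
    }

# Made-up placeholder rows, NOT data from the study.
sample = [
    (1, True, 9.0), (2, False, 10.0), (3, True, 11.0),
    (4, False, 10.5), (5, True, 10.0), (6, False, 9.5),
]
medians = median_load_time_by_cdn(sample)
print(medians)
```

If the two medians come out close to each other on real data, that would match what the scatter graph suggests visually.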
There are a few possible explanations for these results:
- Website owners believe they have “fixed” their site speed issues once they start using a content delivery network, and therefore they stop investing in other performance best practices. Part of the reason they may believe their sites are faster is that they rely on backbone tests to give them a readout of their websites’ speed. As I’ve written before, backbone tests are not a good indicator of performance.
- The CDN’s edge caches may be too full to contain all the objects needed to deliver fast web pages.
- A CDN’s edge caches contain the objects for pages that are most frequently requested by users. If objects are not pulled often enough, they will not remain in the cache, meaning that you won’t always receive the full benefit from your CDN.
- The CDN’s PoPs (points of presence) may be too far away from the end users to offer more than limited benefit.
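The cache-related explanations above can be illustrated with a toy model. The sketch below assumes a least-recently-used (LRU) eviction policy and a tiny fixed capacity; real CDNs use their own, more elaborate policies, and the `EdgeCache` class and the URLs are hypothetical names invented for this example.

```python
from collections import OrderedDict

class EdgeCache:
    """Toy LRU cache standing in for a single CDN edge node (assumed model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = OrderedDict()  # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def request(self, url):
        if url in self.objects:
            # Served from the edge: mark the object as recently used.
            self.objects.move_to_end(url)
            self.hits += 1
            return "hit"
        # Not cached: the request goes back to the origin server.
        self.misses += 1
        self.objects[url] = True
        if len(self.objects) > self.capacity:
            # Cache is full, so evict the least recently used object.
            self.objects.popitem(last=False)
        return "miss"

cache = EdgeCache(capacity=2)
requests = ["/home.css", "/logo.png", "/home.css", "/rare.js", "/logo.png"]
results = [cache.request(url) for url in requests]
print(results)  # ['miss', 'miss', 'hit', 'miss', 'miss']
```

In this run, `/logo.png` is pushed out by a single request for `/rare.js` and has to be fetched from the origin again. That is the pattern described above: a full cache, or objects that are not requested often enough, means users don't always get the benefit of the CDN.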
It’s interesting to note that companies that are not currently using CDNs are making good use of other performance best practices to deliver comparable site speed. When these companies make the decision to add a content delivery network to their mix, they have a good chance of surpassing those sites that rely solely on their CDN. And because performance is a key factor in driving online sales, we could see some interesting shake-ups in next year’s Alexa retail ranking.
This research brings up a number of unanswered questions. I intend to dig deeper to find out whether these findings correlate with which content delivery network provider is selected, what type of CDN is used (small object or whole site), and the impact of geography and site type.
Edited on 12/22/2010 to add:
I decided to generate a new graph to make it more readable, using a less heavy X icon in lieu of the red squares. In generating the new graph, I realized that I had incorrectly displayed the sites that use CDNs (the blue diamonds), so that they were compressed in the left half of the graph rather than across the entire width. The new, correct graph now appears above. You can see the old graph here. This correction doesn’t affect the observations and theories contained in this post.