5 Jun 2013
Rampant page bloat may not be news to many of us, but that doesn't make new findings any less alarming. According to the HTTP Archive, the average page among the top 1,000 websites now weighs 1246 KB, compared to 828 KB in May 2012. That represents roughly 50% growth in just one year.
Last fall, when we reported here that pages had taken a sudden 20% jump in payload, we speculated that this could be due to holiday bloat — when site owners stuff their pages with media-rich content to attract shoppers.
However, recent HTTP Archive data suggests that this wasn’t just a case of pre-holiday super-sizing, but instead seems to be a persistent trend. Check out the growth in overall page size — as well as growth in images, scripts, Flash, and more — from November 2010, just two and a half years ago, to now:
A little background about this data: The HTTP Archive is a permanent repository of performance information, including the size and construction of pages. Twice a month, the Archive gathers this data from the top 1,000,000 Alexa-ranked websites via a private instance of WebPagetest. The Archive lets you compare stats over time across all sites, across the top 1,000 sites, and across the top 100 sites.
Throughout this post, I'll be focusing on stats for the top 1,000 sites. Why? Because it's fairly safe to assume that these are relatively top-tier sites with some awareness of performance, making them more relevant to our discussion than the broader population of sites.
Let’s focus on three key sets of numbers.
In November 2010, when the HTTP Archive publicly launched, the average top 1,000 site had a total payload of 626 KB. By May 2013, this number almost doubled to 1246 KB.
What's even more compelling is that much of this growth has happened in the past twelve months. From 2011 to 2012, the growth rate was 22%. From 2012 to 2013, the growth rate was 50.5%.
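For the curious, these growth rates follow directly from the HTTP Archive payload figures cited in this post. A quick sanity check (the function name is mine, the KB values are from the Archive):

```python
# Average total payload of the top 1,000 pages, per the HTTP Archive (KB).
payload_nov_2010 = 626
payload_may_2012 = 828
payload_may_2013 = 1246

def growth(old, new):
    """Percentage growth from an old value to a new one."""
    return (new - old) / old * 100

print(f"2010 -> 2013: {growth(payload_nov_2010, payload_may_2013):.1f}%")  # ~99%, i.e. almost doubled
print(f"2012 -> 2013: {growth(payload_may_2012, payload_may_2013):.1f}%")  # ~50.5%
```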
Much, though not all, of this growth is due to the proliferation of images. As a proportion of overall payload, images have held fairly steady — between 52% and 59% — but in absolute terms they have ballooned. In 2010, images accounted for 372 KB of the average page. By 2013, that number had grown to 654 KB.
These days, images on the web have to work hard. They need to be high-res enough to satisfy users with retina displays, and they also need to be small enough in size that they don’t blow your mobile data cap in one fell swoop. Responsive web design attempts to navigate this tricky terrain, with varying degrees of success.
Images are a hot-button performance issue for a number of reasons, including:
- Many are in the wrong format. It would be great if everyone who is responsible for creating web graphics knew when to use a JPEG and when to use a GIF, but this isn’t the case.
- Many are poorly sized. I used to work for a well-known design blog, where every contributing editor was responsible for creating and uploading their own images, and it was surprising how many people would upload 1+ MB native images that were 2,000 pixels wide.
- Many are uncompressed. Correct formatting and sizing are a great first step. Compression is the next step. There are a lot of solid image compression tools out there, but unfortunately not everyone is using them.
- Most don’t take advantage of progressive image rendering (PIR). Although progressive rendering offers huge performance benefits, only 7% of all images on the web are optimized to allow for PIR.
- Many are hosted in multiple locations. This increases the risk of additional latency and possible outages.
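To make the sizing, compression, and progressive-rendering points above concrete, here is a minimal sketch using the Python Pillow imaging library. The function name, width limit, and quality setting are my own illustrative assumptions, not recommendations from any of the tools mentioned:

```python
from PIL import Image

def optimize_for_web(src_path, dest_path, max_width=1200, quality=80):
    """Resize an oversized image and save it as a compressed,
    progressive JPEG so browsers can render it incrementally."""
    img = Image.open(src_path)
    if img.width > max_width:
        # Scale down to a sensible display width, preserving aspect ratio —
        # no reason to ship a 2,000-pixel-wide original to a 600-pixel slot.
        ratio = max_width / img.width
        img = img.resize((max_width, int(img.height * ratio)))
    img = img.convert("RGB")  # JPEG has no alpha channel
    img.save(dest_path, "JPEG", quality=quality, optimize=True, progressive=True)
```

In practice you would still want to choose the output format per image (photos as JPEG, flat graphics as PNG/GIF), which this sketch deliberately glosses over.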
In 2010, “other” resources accounted for just 17 KB of total payload. Since then, this number has risen steeply to 194 KB. My understanding is that this category largely consists of video, which we’re going to continue to see grow.
Despite all this growth, is the internet getting faster?
This is a really good question — with more than one answer.
In April of this year, Google published findings, based on Google Analytics data, which suggest that load times have gotten marginally faster for desktop users, and up to 30% faster for mobile users.
Here at Strangeloop/Radware, we’ve found the opposite. Using WebPagetest, we’ve been testing the same 2,000 top Alexa-ranked ecommerce sites since 2010, and our data tells us that top ecommerce pages have gotten 22% slower in the past year.
I point out these differences not to present them for debate, but to indicate just how challenging it is to definitively answer the question, “Are we getting faster or slower?” The greatest challenge is finding a usable yardstick that accurately measures the real end-user experience. There’s near-universal agreement that such a yardstick doesn’t currently exist. Instead, most of us get by with cross-referencing a variety of RUM and synthetic tools that let us combine high-level data with on-the-ground analysis.
One thing is certain, however: regardless of whether you believe pages are getting incrementally slower or incrementally faster, our work is still cut out for us.
Takeaway: Ongoing vigilance is crucial.
Any performance gains we have made have happened only after a huge investment of time, money, and ingenuity. Given the unceasing growth in page size and complexity, this investment continues to be critical.
Steve Souders wrote an excellent post about Google’s report, in which he outlines where we’ve made advances:
- If the internet has gotten faster, we should collectively thank browser vendors first. Browser development is often a thankless job, but browser evolution is probably the single greatest reason behind performance improvement.
- Faster connection speeds, although the payoff here is not quite as dramatic as service providers would have us believe. This quick-and-dirty case study illustrates that network speed doesn’t directly correlate to load time: download bandwidth more than triples from DSL (1.5 Mbps) to cable (5 Mbps), yet the performance gain is only 12%.
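A back-of-the-envelope model shows why bandwidth alone disappoints: once round-trip latency dominates, extra throughput buys little. This sketch is illustrative only — the round-trip time and request count are my own assumptions, not measurements from the case study, so the exact percentages won't match its 12% figure:

```python
def load_time_s(page_kb, bandwidth_mbps, rtt_ms, round_trips):
    """Very rough page load model: serial round trips plus raw transfer time."""
    transfer = (page_kb * 8) / (bandwidth_mbps * 1000)  # KB -> kilobits / kbps
    return round_trips * (rtt_ms / 1000) + transfer

# Assumed values: a 1246 KB page, 100 ms RTT, ~80 serial round trips.
dsl = load_time_s(1246, 1.5, rtt_ms=100, round_trips=80)
cable = load_time_s(1246, 5.0, rtt_ms=100, round_trips=80)
# Bandwidth more than triples, but the latency term is untouched,
# so the overall gain is far smaller than the bandwidth increase.
print(f"DSL:   {dsl:.1f} s")
print(f"Cable: {cable:.1f} s  ({(dsl - cable) / dsl:.0%} faster)")
```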
Steve also points out where we’ve fallen behind:
- “The adoption of performance best practices has been flat or trending down.” We’re seeing declining numbers in the adoption of core techniques such as optimizing header caching and avoiding redirects. This could be caused by a number of things: from lack of performance education to lack of dev control over resources (e.g. due to content management systems).
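To make the header-caching point concrete, here is a minimal sketch (my own example, not from Steve's post) of serving static assets with far-future caching headers using Python's standard library:

```python
from http.server import SimpleHTTPRequestHandler, HTTPServer

ONE_YEAR = 365 * 24 * 3600  # max-age is expressed in seconds

class CachingHandler(SimpleHTTPRequestHandler):
    """Serve files with a far-future Cache-Control header so browsers
    and proxies can reuse static assets instead of re-downloading them."""

    def end_headers(self):
        self.send_header("Cache-Control", f"public, max-age={ONE_YEAR}")
        super().end_headers()

# Usage (blocks until interrupted):
#   HTTPServer(("", 8000), CachingHandler).serve_forever()
```

Long max-age values should be paired with versioned filenames (e.g. `app.v2.css`) so that updated assets aren't trapped behind stale caches — exactly the kind of discipline that a content management system can make hard to enforce.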