SEO

UPDATED: 13 questions (and answers) about Google, site speed, and SEO

A year ago, I wrote a Q&A post answering the most common questions I get asked about page speed and SEO. As a follow-up to that post, I recently wrote an updated version for SEOMoz, with new and expanded information about questions such as:

  • How does Google use its toolbar to crowdsource page measurement?
  • Does preloading hurt your ranking?
  • Does a faster start render time help your ranking?

And ultimately the biggest question:

  • How much does site speed really matter when it comes to SEO?

If this is a topic of interest for you, I invite you to check out the post and let me know if you have any new questions.


8 things the front-end optimization community can learn from Matt Cutts’s cloaking video

For years, I have been answering questions about front-end optimization (FEO) and search engine optimization (SEO). (See this post from earlier this year.) Recently, I’ve had several calls with customers who want a definitive answer to the following questions:

In the past, we’ve worked with the Google team to get answers to these questions. Recently, Matt Cutts posted this video on cloaking, based on our discussions. (Shout-out to Patrick Meenan for his tireless efforts here. Thanks, Pat!)

 

Like most search-related answers, the video is a general overview of cloaking, but it contains all of the information we need to answer the questions above.

(Before I get into that, though, I have to mention how much Matt reminds me of Mr Rogers in this video. He seems so approachable and kind. I’d let him take care of my kids.)

Now let’s get into things. In order to analyze Matt’s points, I’ve transcribed the video to the best of my abilities and patience. I’ll be quoting Matt, and then interpreting his comments in the context of web content optimization.

Point #1: You cannot target the bot specifically, and you cannot serve it different content.

“Cloaking is essentially showing different content to users than to Googlebot. So imagine that you have a web server right here and a user comes and asks for a page. So you know, here is your user. [He's drawing on the whiteboard here.] You give him some sort of page and everybody is happy. And now let’s have Googlebot come and ask for a page as well, and you give Googlebot a page. Now, in the vast majority of situations, the same content goes to Googlebot and users. Everybody is happy. Cloaking is when you show different content to users and to Googlebot and it is definitely high risk; that is a violation of our quality guidelines.”

I think he makes the basic principle very clear here: you cannot target the bot specifically, and you cannot serve it different content.

What is key here are two concepts:

  • What constitutes different content.
  • What situations outside of the “vast majority of situations” are acceptable use cases for sending different content to different browsers.

Let’s keep going as Matt expands on both of these concepts.

Point #2: Intent is important.

“Why do we consider cloaking bad, or why does Google not like cloaking? Well, the answer is sort of in the ancient days of search engines, when you saw a lot of people do really deceptive or misleading things with cloaking. For example, when Googlebot came, the web server that was cloaking might return a page all about cartoons, Disney cartoons or whatever, but when a user came and visited the page, the web server might return something like porn. And so if you do a search for Disney cartoons on Google, you will get a page that looks like it would be about cartoons, you would click on it, and then you would get porn. That is a hugely bad experience. People complain about it. It is an awful experience for users. So we say that all types of cloaking are against the quality guidelines. There is no such thing as “white hat cloaking”. Certainly when somebody is doing something especially deceptive or misleading, that is when we care the most; that is when the web spam team really gets involved.”

In this section, Matt makes it clear that intent is important. The cloaking debate really centers around the issue of intentionally misleading and deceiving users. Like most Google initiatives, the purpose of banning cloaking is to ensure that the system has integrity and does no evil. When analyzing how front-end optimization deals with bots, we need to keep this principle in mind.

Let’s keep going.

Point #3: Testing with simple hash comparisons won’t work for dynamic sites.

“Okay, so what are some rules of thumb, to save you trouble or help you stay out of a high-risk area? … Take a hash of a page — take all that different content and boil it down to one number [the hash] — and then pretend to be Googlebot. You know, with the Googlebot user agent. We even have a “fetch as Googlebot” feature in Google webmaster tools. So you fetch a page as Googlebot and you hash that page, as well, and if those numbers are different [i.e., the page hash taken from a browser versus the hash taken from the bot's perspective], then that could be a little bit tricky. That could be something where you might be in a high-risk area. Now pages can be dynamic — you might have things like time stamps, or the ads might change. So it’s not a hard and fast rule.”

Obviously, a simple hash of a dynamic page will not give you the answer to the cloaking question. Sites are so dynamic that this test in its simple form would simply not work for most pages. I tried this on 10 prominent sites and found that the hashes were completely different due to dynamic content such as ads and changing content. We need to keep searching for answers.
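To make the test concrete, here is a minimal Python sketch of the hash comparison Matt describes, plus one way to make it tolerate dynamic content. It assumes you have already fetched the page twice (once as a regular browser, once with the Googlebot user agent) and have both responses as strings. The ad markers and timestamp format in DYNAMIC_PATTERNS are hypothetical; every site would need its own list, and this is only an illustration, not a test Google endorses.

```python
import hashlib
import re


def page_hash(html: str) -> str:
    """Boil a page down to one number (here, a SHA-256 hex digest)."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


# Hypothetical patterns for regions that legitimately change per request.
DYNAMIC_PATTERNS = [
    # Ad slots wrapped in (assumed) marker comments.
    re.compile(r"<!--\s*ad-start\s*-->.*?<!--\s*ad-end\s*-->", re.DOTALL),
    # Timestamps such as 2011-06-01 10:00:00.
    re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"),
]


def normalized_hash(html: str) -> str:
    """Hash the page after removing known-dynamic regions."""
    for pattern in DYNAMIC_PATTERNS:
        html = pattern.sub("", html)
    # Collapse whitespace so formatting differences don't change the hash.
    html = re.sub(r"\s+", " ", html).strip()
    return page_hash(html)


def looks_consistent(browser_html: str, bot_html: str) -> bool:
    """True if both fetches match once dynamic regions are ignored."""
    return normalized_hash(browser_html) == normalized_hash(bot_html)
```

A mismatch from looks_consistent is only a prompt to look closer, in keeping with Matt's caveat that this is not a hard and fast rule.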

Point #4: Targeting the bot and serving it different content is a clear violation.

“Another simple heuristic to keep in mind is, if you were to look through the code of your web server [or in the WCO market, your friendly neighborhood automation vendor :) ], would you find something that deliberately checks for a user agent of Googlebot specifically, or Googlebot’s IP address, specifically? Because if you’re doing something very different or special or unusual for Googlebot — either its user agent or its IP address — that has the potential to, you know, maybe be showing  different content to Googlebot than to users. That is the stuff that is high risk. So keep those kinds of things in mind.”

This provides good guidance for front-end optimization. Any FEO solution that is targeting the bot specifically and serving it different content is clearly violating the rules. At Strangeloop, we don’t do this and I’m not aware of anyone that does in our industry. (I checked a number of other vendors while preparing this post.)

Next, Matt transitions to a few examples that are very relevant to our world.

Point #5: Serving different content to different clients, based on client needs, is okay.

“Now, one question we get from a lot of people who are white hat and who do not want to be involved in cloaking in any way,  and who want to make sure that they stay clear of high-risk areas [that's me, and ostensibly, if you're still following along, it's you too]: what about geolocation and mobile user IDs, so you know, phones and that sort of thing? The good news, in an executive sort of summary, is that you don’t really need to worry about that [geolocation/mobile user IDs], but let’s talk through exactly why geolocation and handling mobile phone is not cloaking.”

In other words, addressing different client needs is not cloaking. He continues with his example, but it is important to note that serving different content to users based on capabilities is clearly defined as acceptable. Just as mobile phones have different capabilities, so do different browser versions.

For more clarity, let’s examine Matt’s examples.

Point #6: Treat Googlebot like any normal desktop browser.

“Okay… so until now we have had one user. Now let’s go ahead and say this user is coming from France. And let’s have a completely different user, and let’s say maybe they are coming from United Kingdom. In an ideal world, if you have your content available on a dot-FR domain or dot-UK domain or different languages because you have gone through the work of translating them, it is really, really helpful if someone coming from a French IP address gets their content in French. They are going to be much happier about that. So what geolocation does is, whenever a request comes in to the web server, you look at the IP address and you say ‘Ah, this is a French IP address. I am going to send them the French language version or send them to the dot-FR version of my domain.’ If someone comes in and their browser language is English or their IP address is something from America or Canada or something like that, then you say, ‘Aha, English is probably the best message.’ Unless they are coming from the French part, of course. [I like the shout out to my friends in Quebec.]

“So what that is doing is, you are making the decision based on the IP address. As long as you are not making some specific country the Googlebot belongs to (“GoogleLandia” or something like that), then you are not doing something special or different for Googlebot. At least currently, when we are making this video, Googlebot crawls from United States, so you would treat Googlebot just like a visitor from the United States. You’d serve up content in English and we typically recommend that you treat Googlebot just like a regular desktop browser, so you know, Internet Explorer 7 or whatever a very common desktop browser is for your particular site. So geolocation (that is, looking at the IP address and reacting to that) is totally fine, as long as you are not reacting specifically to the IP address of just Googlebot, just that very narrow range, and instead you are looking at what is the best user experience overall depending on the IP address.”

This example really helps us understand our role in front-end optimization. The goal is to provide the best user experience, and this can change depending on country or browser. Matt is asking us to treat the bot like we would any normal desktop browser. Don’t do anything special for it.
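A minimal Python sketch of the geolocation rule might look like the following. It assumes the country and region codes have already been derived from the visitor's IP address by some lookup that is not shown; the table and the function name are hypothetical. The point is what is absent: there is no branch for Googlebot, so a crawler coming from a US address simply gets what any US visitor gets.

```python
# Hypothetical country-to-language table; a real site would supply its own.
LANGUAGE_BY_COUNTRY = {"FR": "fr", "GB": "en", "US": "en", "CA": "en"}
DEFAULT_LANGUAGE = "en"


def pick_language(country_code: str, region_code: str = "") -> str:
    """Choose a language from the visitor's IP-derived country and region."""
    country = country_code.upper()
    region = region_code.upper()
    # "Unless they are coming from the French part, of course."
    if country == "CA" and region == "QC":
        return "fr"
    return LANGUAGE_BY_COUNTRY.get(country, DEFAULT_LANGUAGE)
```

For example, pick_language("FR") selects French, while pick_language("US") selects English whether the request comes from a person or from a crawler.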

His mobile example, next, provides further clarity.

Point #7: Case in point – Serving customized “squeezed” pages to mobile devices is fine.

“In the same way, if someone now comes in — and let’s say they are coming in from a mobile phone, so they are accessing it in an iPhone or Android phone — and you can figure out, okay, that is a completely different user agent. It has got completely different capabilities. It is totally fine to respond to that [mobile] user agent and give them, you know, a more squeezed version of the website or something that fits better on the smaller screen. Again, the difference is, if you are treating Googlebot like a desktop user, so that user agent doesn’t have anything special or different that you are doing, then you should be in perfectly fine shape. So, you know, you are looking at the capabilities of the mobile phone, you are returning an appropriately customized page, or you are not trying to do anything deceptive or misleading, you are not treating Googlebot really differently based on the user agent, and you should be fine.”

Matt clearly states that it is okay to give the user an experience that is tailored to the browser’s capabilities. This is strong endorsement of the fact that advanced FEO features, which only apply to one user agent, are perfectly legal and encouraged, so long as the bot is not treated in any special way.
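Here is a rough Python sketch of what "tailored to capabilities, with nothing special for the bot" could look like in code. The profile names, feature names and user agent patterns are all hypothetical and far simpler than what a real FEO product would use. Note that the string "Googlebot" never appears: the bot matches none of the specific rules, so it falls through to the generic desktop treatment.

```python
import re

# Hypothetical optimization profiles; the feature names are illustrative only.
BASIC = {"minify", "compress"}
PROFILES = {
    "mobile": BASIC | {"squeezed_layout"},
    "modern_desktop": BASIC | {"data_uri_inlining"},
    "generic_desktop": BASIC,
}

# Simplified, assumed user agent rules (real detection is more involved).
MOBILE_RE = re.compile(r"iPhone|Android", re.IGNORECASE)
MODERN_RE = re.compile(r"Firefox/|Chrome/|MSIE (?:8|9)\.")


def choose_profile(user_agent: str) -> str:
    """Pick a profile from client capabilities, never from bot identity."""
    if MOBILE_RE.search(user_agent):
        return "mobile"
    if MODERN_RE.search(user_agent):
        return "modern_desktop"
    # Anything unrecognized, crawlers included, gets the basic treatment.
    return "generic_desktop"
```

Passing Googlebot's user agent string to choose_profile returns "generic_desktop", the same answer an unrecognized desktop browser would get.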

He gives even more insight below.

Point #8: No, really, you can’t treat Googlebot differently than you treat users. Ever.

“So the one last thing I want to mention — this is a little bit of a power user kind of thing — is some people are like ‘Okay, I won’t make the distinction based on the exact user agent string or the exact IP address range that Googlebot comes from, but maybe I will, say, check for cookies and if somebody does not respond to cookies, or if they don’t treat JavaScript the same way, then I will carve out and treat that differently.’ The litmus test there is: are you basically using that as an excuse to try to find a way to treat Google differently or to try to find some way to segment Googlebot and make it do a completely different thing. So again the instinct behind cloaking is: are you treating users the same way as you are treating Googlebot? We want to score and return roughly the same page that the user is going to see. So we want the end user experience when they click on the Google result to be the same as if they’d just come to the page themselves.

“So that is why you shouldn’t treat Googlebot differently, and that is why cloaking is a bad experience, why it violates our quality guidelines, and that is why we do pay attention to it. There is no such thing as “white hat” cloaking. We really do want to make sure that the page the user sees is the same page the Googlebot saw.

“Okay, I hope that kind of helps.”

Thanks, Matt. This does help.

To summarize:

  1. It is safe to provide different pages to mobile browsers, different locations, and different user agents, so long as the Googlebot user agent, its IP addresses, or its capabilities (i.e., no cookies) are not directly targeted.
  2. Treating the Googlebot like a desktop browser with the basic acceleration treatments one would apply to any generic browser is fine and encouraged.
  3. As I was just re-watching the video, another concept stuck out: content. Matt repeatedly says not to serve different content to Googlebot. The operative question here is: if Googlebot was a full-featured browser, would it see the same page (including images, etc.) as a normal browser? If we’re not changing the content itself and only manipulating the way the content is delivered (through techniques like inlining or MHTML or DataURLs or whatever), then we’re clearly not in violation.
  4. Intent is key. Something that is geared toward speeding up sites, and that has no intention of deceiving users, is safe so long as it abides by Google’s rules and regulations.

I’m confident you are safe if:

  • You don’t target the bot or Google IP addresses specifically.
  • You don’t try to game search engines by singling out browsers that don’t accept cookies or don’t run JavaScript for different treatment.
  • You apply the basic features that work in every browser to all browsers, including bots.
  • You save advanced features for the specific browsers for which they are built.
  • You don’t change the actual content (images, etc.) you serve to Googlebot.
  • Most important, your intention is to speed up pages and not do evil.


Why you should care about Google’s changes to its mobile AdWords algorithm

Last week, when Google announced that your mobile site’s performance is now a factor in how Google determines its AdWords quality, it didn’t get as much buzz as its 2010 announcement that site speed would affect Google search ranking. But it should have.

From The Google Mobile Ads blog:

In the coming weeks, we will be introducing the mobile optimization of a website as a new factor of ads quality for AdWords campaigns that are driving mobile search traffic. As a result of this change, ads that have mobile optimized landing pages will perform better in AdWords — they will generally drive more mobile traffic at a lower cost.

If you run AdWords campaigns on a regular basis, this is obviously big news. But it’s big news beyond the world of paid search, too. Google is sending out an early warning to site owners: make your mobile site faster, or you’ll be left behind.

Countless studies tell us that, globally, mobile is going to leave the desktop in the dust. And even more studies tell us that people expect mobile sites to be at least as fast as sites on the desktop. But looking at a sampling of leading m-commerce sites (Keynote’s latest mobile commerce performance index comes in at 10.15 seconds), it’s hard to detect any urgency on the part of site owners to deliver a faster mobile experience.

Before we get into that, a little background.

How AdWords works

For those new to AdWords, here’s a quick breakdown of how it works (more detailed info here):

  1. As an advertiser, you create your ads and choose your keywords. You set a daily cap and a per-click cap on how much you want to spend on your AdWords campaign. Per-click costs can range from a penny to $10 or more, but they’re generally in the range of one or two bucks. Your caps are used as your bid in an ongoing auction for ad space.
  2. When people use one of your keywords for a Google search, your ad may appear next to the search results. (Note the operative word here: may.)
  3. Every time someone clicks on your ad, you pay Google.

According to Wikipedia, click-through rates (CTR) for ads are about 8% for the first ad, 5% for the second one, and 2.5% for the third one. The ordering of the paid-for listings depends on other advertisers’ bids and the “quality score” of all ads shown for a given search.

So as an advertiser, your goal is to get the top ad spot, and the only way to do this is by having a good quality score for your ad. So how do you do this?

What is the “quality score” and how is it determined?

This is Google, so of course we’ll never know the exact recipe for their secret sauce, but they have shared this description:

A Quality Score is calculated every time your keyword matches a search query — that is, every time your keyword has the potential to trigger an ad. The AdWords system calculates a Quality Score for each of your keywords. It looks at a variety of factors to measure how relevant your keyword is to your ad text and to a user’s search query. A keyword’s Quality Score updates frequently and is closely related to its performance. In general, a high Quality Score means that your keyword will trigger ads in a higher position and at a lower cost-per-click (CPC).

To recap, in order to have a good quality score — for both desktop and mobile searches — your AdWords campaign needs to have:

  • relevant keywords,
  • relevant ad text,
  • a strong CTR on Google, and
  • a decent CPC bid.

This combination of factors is meant to be a boon for small business owners, because you can’t be locked out of the ranking system based solely on your bid, and you can’t necessarily win the top spot just by driving a dump truck full of money up to Larry Page’s house.
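To see why a big bid alone doesn't win, here is a toy Python sketch. It assumes the commonly cited simplification that ads are ordered by bid multiplied by quality score, with quality score on a 1 to 10 scale. Google's real formula isn't public, so treat the scoring rule, the class and the numbers as illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    max_cpc_bid: float  # per-click cap, in dollars
    quality_score: int  # assumed scale: 1 (poor) to 10 (excellent)


def rank_ads(ads):
    """Order ads by bid x quality score (an assumed simplification)."""
    return sorted(
        ads,
        key=lambda ad: ad.max_cpc_bid * ad.quality_score,
        reverse=True,
    )


ads = [
    Ad("big_spender", max_cpc_bid=4.00, quality_score=3),  # 4.00 * 3 = 12
    Ad("small_shop", max_cpc_bid=2.00, quality_score=8),   # 2.00 * 8 = 16
]
ranked = rank_ads(ads)
```

Under this assumed rule, small_shop takes the top spot even though it bids half as much, which is the small-business effect described above.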

So where does mobile come into the picture?

This new announcement means that, in addition to all the factors above, the landing page quality of your mobile site is now a major factor for AdWords campaigns that drive mobile traffic. “Landing page quality” refers to everything from layout to mobile/touch features to landing page load time.

None of this is entirely new. Google says that, last year, it began to limit ad serving on some mobile devices if the ads pointed to Flash-heavy landing pages. Interestingly, this change was rolled out quietly, with no media fanfare.

On the surface, Google’s mobile AdWords changes may not sound as dramatic as its site speed/SEO announcement, but I see these changes as extremely telling. Whether making big public announcements or quietly rolling out changes behind the scenes, Google is an inexorable juggernaut when it comes to site speed. Now Google has clearly set its sights on mobile. These early algorithm changes are just the first of many we can expect in the very near future.


FAQs: The 12 most-asked questions about how Google factors page speed into its search rankings

It’s a well-known fact that site speed is a critical ranking factor for organic search. The big question has been how exactly Google does this. This is probably one of the most-asked questions I receive, and the answers aren’t easy to find.

Over the last year and a bit, I’ve done quite a bit of digging to get the answers, and I thought it would be useful to create an FAQ-style repository for them.

(Note: Google is, understandably, not 100% forthcoming with how it works. I’ve tried my best to fact-check my answers with Google employees and outside sources. If you think my answers are incorrect, let me know in the comments.)

1. Does the Google search bot track page load time?

No. The Google search bot has nothing to do with speed.

2. Does Google use synthetic tests or real end user monitoring to gather its data?

I’ve talked in the past about how misleading speed metrics can be. Google actually uses real end user monitoring (RUM) to check site speed. This is the right thing to do. They’re measuring from users’ actual web browsers over real connections, with no simulations.

3. How does Google gather the data?

Google crowdsources page measurement: site speed is measured on the public’s computers by the Google toolbar, when PageRank checking is activated. The results are “radioed” back to the mothership.

4. What browsers does the Google toolbar use?

The toolbar is available on Internet Explorer and Firefox only. More specifically, the toolbar is available on IE 6+ and Firefox 2-4. It is not supported on Firefox 5, which has led to speculation that Google has another plan for capturing this data, but no other details have emerged.

5. What does the Google toolbar measure?

It measures onload time. This also includes third-party display ads, third-party scripts, etc.

6. What pages does Google measure?

They measure all pages visited by users on your site.

7. Do they measure pages marked as non-crawlable?

Yes. They measure the pages your users visit, not what you have told Google is crawlable.

8. What if my page is personalized and has very different content for authenticated users but the same URL?

The measurement makes no distinction for personalized content if the URL remains the same. The results will be averaged together.

9. Does Google use its new Google Analytics page speed feature?

No, to the best of my knowledge they do not use any of the new data collected in Google Analytics for this purpose, but they should as it would allow them to sample more modern browsers.

10. Will pre-loading content on a page hurt my ranking?

No, because the results are based on the onload time measurement.

11. Will deferring a call help my rankings?

Yes. Anything that helps the page get to the onload event faster will help.

12. Will having pages start render faster help my rankings?

Unfortunately, no. It would be great to augment the system with some way to benefit pages that start loading faster.
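To tie a few of these answers together, here is a small Python sketch of how toolbar-style measurements might be rolled up. The sample format and function name are my own assumptions, and Google has not published how it aggregates this data. The sketch only illustrates two points from the answers above: measurements are keyed by URL, so personalized variants of the same URL get averaged together, and only the onload time counts, so start render time is ignored.

```python
from collections import defaultdict
from statistics import mean


def average_onload_by_url(samples):
    """Average onload time per URL.

    samples: iterable of (url, onload_ms, start_render_ms) tuples,
    a hypothetical format for real-user measurements.
    """
    by_url = defaultdict(list)
    for url, onload_ms, _start_render_ms in samples:
        # Start render is collected here but deliberately not used.
        by_url[url].append(onload_ms)
    return {url: mean(times) for url, times in by_url.items()}


samples = [
    ("/account", 2000, 500),  # anonymous visitor, lighter page
    ("/account", 6000, 400),  # authenticated visitor, heavier page
    ("/", 1500, 300),
]
averages = average_onload_by_url(samples)
```

In this made-up data, the two very different versions of /account collapse into a single average, and a faster start render on the heavier version makes no difference to the result.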

Do you have any new/different/conflicting answers? Questions that aren’t answered here? Let me know in the comments.



Is the J.C. Penney SEO scandal relevant to the web performance industry?

If you haven’t yet read The New York Times article The Dirty Little Secrets of Search, you should. It reads like a shady underground exposé of an art smuggling ring — making SEO sound almost glamorous. One of the most interesting things I’ve read all week.

What I found most interesting is the distinction the article makes between activities that are actually illegal and those that are “Google illegal”:

Despite the cowboy outlaw connotations, black-hat services are not illegal, but trafficking in them risks the wrath of Google. The company draws a pretty thick line between techniques it considers deceptive and “white hat” approaches, which are offered by hundreds of consulting firms and are legitimate ways to increase a site’s visibility. Penney’s results were derived from methods on the wrong side of that line, says Mr. Pierce. He described the optimization as the most ambitious attempt to game Google’s search results that he has ever seen.

In 2006, Google announced that it had caught BMW using a black-hat strategy to bolster the company’s German Web site, BMW.de. That site was temporarily given what the BBC at the time called “the death penalty,” stating that it was “removed from search results.”

BMW acknowledged that it had set up “doorway pages,” which exist just to attract search engines and then redirect traffic to a different site. The company at the time said it had no intention of deceiving users, adding “if Google says all doorway pages are illegal, we have to take this into consideration.”

J. C. Penney, it seems, will not suffer the same fate. But starting Wednesday, it was the subject of what Google calls “corrective action.”

The article doesn’t specify the exact terms of this corrective action, but it notes that J.C. Penney’s ranking has plummeted from #1 down to #68 and #71 (and worse) for several key search phrases.

A unilateral ranking change of that magnitude would be devastating to any web site. When the “mall” of the world places your store in the basement of the annex building, sales plummet and business dies.

I’ve been wondering if any of this J.C. Penney scandal is relevant to the performance industry. To date, I’ve always assumed that the automated optimization techniques used by Strangeloop and other solution providers fall into the “white hat” camp, since we derive some of our techniques from Google’s own list of performance best practices, and especially since Google has spearheaded the development of an open source tool for automating fundamental best practices called mod_pagespeed.

However, when you examine Google’s design and technical guidelines, perhaps we should not be so confident. One area that caught my interest was cloaking:

Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.

Most, if not all, of the sophisticated automated optimization solutions — Strangeloop included — serve different content to users and search engines. We do this to optimize performance, not to game search results. I have asked my friends at Google to look into this and I have been repeatedly reassured that the techniques we (and by extension our competitors) employ are safe.

Can someone at Google (Matt Cutts et al) confirm publicly for us that optimization by browser group is not “Google illegal”? That what we are doing is safe, helpful and harmonious with the Google mission of a faster web?

Has anyone else asked this question of Google? Can someone point me to a Google article that demonstrates that browser-based optimization is safe from the “cloaking” brush?
