Look beyond the first results page

Someone recently presented me with a link analysis report summary drawn from Yahoo! Site Explorer. A site that barely makes it to the second page of search results for an only moderately competitive expression supposedly has tens of thousands of backlinks.

There’s a huge disconnect between the link report and reality, one I was able to expose with a fairly simple query on Yahoo!. The SEO tech who ran the report didn’t bother to check its “facts”.

A few quick queries would have revealed that most of the links don’t exist. These queries would also have revealed that Yahoo! and Google both knew about approximately the same number of linking pages (just under 100) from one particular site. Fewer than 100 pages, rather than thousands.
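
If you want to script that sort of sanity check, here is a minimal sketch. It just builds the classic operator queries - Yahoo!’s linkdomain:/site: and a plain Google site: search - so you can eyeball the reported counts side by side. example.com and linkingsite.com are placeholders, and the URL formats reflect how the engines accepted operator queries at the time, so treat the exact combinations as assumptions.

    # Minimal sketch: build cross-check queries for a backlink claim.
    # example.com and linkingsite.com are placeholders, not real targets.
    from urllib.parse import quote_plus

    target = "example.com"
    linking_site = "linkingsite.com"

    checks = {
        # Yahoo!: all inbound links, excluding the target's own pages
        "yahoo_all_links": f"linkdomain:{target} -site:{target}",
        # Yahoo!: inbound links from one specific site only
        "yahoo_one_site": f"linkdomain:{target} site:{linking_site}",
        # Google: pages on the linking site that mention the target
        "google_one_site": f'site:{linking_site} "{target}"',
    }

    for name, query in checks.items():
        if name.startswith("yahoo"):
            url = "http://search.yahoo.com/search?p=" + quote_plus(query)
        else:
            url = "http://www.google.com/search?q=" + quote_plus(query)
        print(f"{name}: {url}")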

It is these types of stupid blunders in competitive intelligence that lead me to criticize and rebuke the SEO community so vehemently. You cannot possibly understand why a set of search results looks the way it does if you don’t bother to check the “facts” your favorite SEO tools provide you.

There are other stupid mistakes many SEOs make in their competitive analyses, including:

  1. Looking only at the first page of results in a site: query and concluding the site really has as many pages indexed as the search engine’s initial estimate claims
  2. Assuming that Google’s “these terms only appear in links pointing to this page” statement in page cache data means the terms really do only appear in links pointing to the page
  3. Using Google’s date range to find recently indexed content
  4. Using Google’s Blogsearch for link analysis

Bogus Results Estimates - Much as I would like to trust those initial estimates the search engines provide for the number of results, Matt Cutts has said on more than one occasion that they are just rough estimates.

In other words, if you ask Google how many pages from SEO Theory it has indexed, it will give you an initial estimate that differs from the number of results it is actually willing to show you.

Neither number is necessarily an accurate reflection of how many pages Google has crawled from a particular site, however. They could have just dropped some pages from their index and may not yet have recrawled them. Or they may have data on additional pages that has not yet been fully processed. This behavior appears to be typical of all the major search engines.

You’ll see equally suspicious results estimates if you just type in random queries. There may or may not be 7,000,000 documents that are relevant to any particular query.
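
A simple way to test an estimate is to page through the results until the engine stops serving them, then compare that final count to the initial claim. Here is a minimal sketch of the logic; fetch_results is a stand-in for however you actually retrieve a results page (by hand, through an API, or with a scraper), so it is stubbed here with canned numbers.

    # Minimal sketch: compare an engine's initial estimate to the number
    # of results it will actually serve.

    def fetch_results(query, start):
        """Return (estimated_total, results_on_this_page) for one page."""
        # Stub: the engine claims 7,000,000 results but will only serve
        # 94 of them, ten per page. Swap in your real retrieval method.
        served = max(0, min(10, 94 - start))
        return 7_000_000, ["result"] * served

    def count_reachable(query):
        start, reachable, estimate = 0, 0, None
        while True:
            estimate, page = fetch_results(query, start)
            if not page:
                break
            reachable += len(page)
            start += len(page)
        return estimate, reachable

    estimate, reachable = count_reachable("site:example.com")
    print(f"initial estimate: {estimate:,}  actually served: {reachable}")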

Bogus Link Data - Just because a cache report says that terms only appear in links pointing to a page does not actually mean there are any links pointing to the page with those terms in anchor text. Search Engine Roundtable first reported a related behavior for the Supplemental Results Index (this was in late 2006, prior to the Google 3.0 or Searchology Update).

Although Google re-engineered its service in 2007, to this day I can still find examples of document cache listings that falsely state the query terms are found only in inbound links. Not only do the documents themselves contain the query expressions, but I have also tested this effect with documents where I control all of the inbound links.

It’s easy to guess that these may be pages in the Supplemental Results Index, but that may not necessarily be the case. Google seems to incorporate page data into its index in a gradual, staged process. I’ve found some of these page caches eventually do report the query terms correctly after a few days.
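
Verifying one of these cache listings yourself is trivial: fetch the live document and search its text for the terms the cache note claims appear only in inbound links. A rough sketch, assuming a raw-text match is good enough for your terms (the URL and terms below are placeholders):

    # Minimal sketch: check whether a page really lacks the query terms
    # that the cache note says appear only in links pointing to it.
    from urllib.request import urlopen

    def terms_in_body(url, terms):
        # Raw-HTML match; a stricter test would strip markup first.
        html = urlopen(url).read().decode("utf-8", errors="replace").lower()
        return {term: term.lower() in html for term in terms}

    # Placeholder URL and terms; substitute the page and query in question.
    print(terms_in_body("http://example.com/", ["seo", "theory"]))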

Using Google’s date range function - This function has never worked properly, and it amazes me no end to see certain people in the SEO community continue to recommend it as a useful resource. Clearly, people are not comparing the results that date range queries return against other results. You can find pages in the index that were created only within the past few days or weeks and yet do not appear in date range queries.
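
If you do want to test the date range function against known pages, remember that the daterange: operator takes Julian day numbers rather than calendar dates, which is easy to get wrong. Here is a small helper for building such a query; the daterange: syntax is the unofficial operator as it circulated at the time, so treat it as an assumption.

    # Minimal sketch: build a daterange: query from calendar dates.
    # daterange: takes Julian day numbers; toordinal() + 1721425 converts
    # a proleptic-Gregorian date to its integer Julian day number
    # (e.g. 2000-01-01 -> 2451545).
    from datetime import date

    def julian_day(d):
        return d.toordinal() + 1721425

    def daterange_query(base_query, start, end):
        return f"{base_query} daterange:{julian_day(start)}-{julian_day(end)}"

    # Placeholder query: pages from example.com indexed in a one-week window.
    print(daterange_query("site:example.com", date(2008, 10, 22), date(2008, 10, 29)))

You can then compare what the date-restricted query returns against pages you know were crawled during that window.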

Blogsearch for link research - This idea is bizarre. The Blogsearch Index includes a lot of content that doesn’t appear in the Main Web Index, so assuming you do find links through Blogsearch that are not reported by Web search, what does that tell you?

Answer: Nothing useful.

Using one database to analyze patterns in another database doesn’t work. It doesn’t matter if Google operates both databases. If Blogsearch gives you access to content that doesn’t appear in Main Web search, then your Blogsearch-based backlink reports are useless for analyzing Main Web search results.
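
The underlying point reduces to a simple set operation: a backlink can only matter for Main Web rankings if the linking page exists in the Main Web index, so any Blogsearch-only link is noise for that analysis. A trivial sketch with placeholder URLs:

    # Minimal sketch: links found via Blogsearch only matter for Web-search
    # analysis if the linking pages also appear in the Main Web index.
    blogsearch_links = {"http://blog-a.example/post", "http://blog-b.example/post"}
    web_index_pages = {"http://blog-a.example/post", "http://site-c.example/page"}

    usable = blogsearch_links & web_index_pages   # links the Web index knows
    noise = blogsearch_links - web_index_pages    # Blogsearch-only links

    print(f"usable for Web-search analysis: {usable}")
    print(f"invisible to Web search (noise): {noise}")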

If they report it, don’t believe it

Every search engine returns data that is suspect. Before you assume the numbers you’re seeing are realistic, test them against other queries. You can run many different types of queries on the same search engine, and sometimes you can find comparable data reported by multiple search engines. It’s not impossible to do accurate link research on Google, but you can’t just use the basic query operations the SEO community has embraced for years.

If you don’t question and test the data you obtain from competitive analysis, you have no way of knowing how much (or how little) you have actually learned about what your SEO competitors are doing.
