Google query functions that don’t work

Given everything Google does, I would not expect all of it to work. That’s just the way it goes with software. You can always find something screwy in any application, and search engines are no different.

Still, the SEO community puts a great deal of time and effort into using these queries, and it amazes me that so few people have realized just how broken they are. You have to question the quality of the advice you’re being given when so many advisors tell you to rely on these queries.

Query 1: Date range search

How you use it: You can click on the Advanced Search option beside any search box on the Google system and set a date range. Some people skip the interface and embed Julian dates directly in their query strings via the daterange: operator.
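
If you want to build the query-string form by hand, here is a minimal sketch in Python. It assumes the commonly cited, undocumented daterange: syntax of inclusive Julian day numbers, so treat the operator format as an assumption rather than a documented feature:

    from datetime import date

    def julian_day(d: date) -> int:
        # toordinal() counts days from 0001-01-01 in the proleptic
        # Gregorian calendar; adding 1721425 yields the Julian day number.
        return d.toordinal() + 1721425

    # Restrict a query to the first week of August 2008 (the dates are
    # illustrative; the daterange: syntax itself is an assumption).
    start = julian_day(date(2008, 8, 1))  # 2454680
    end = julian_day(date(2008, 8, 7))    # 2454686
    print(f"widgets daterange:{start}-{end}")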

What the problem is:

  1. The search filter only returns — at best — a sampling of content actually published or crawled or indexed in the specified date range.
  2. Other query operators stop working when used in conjunction with this query operator.

Scope of the problem: I have found the problem in Web search, Scholar search, and Blogsearch. It may extend to other search tools, but I haven’t tested for it, except in News search, which is so sensitive to news-source availability and restrictions that I cannot really tell whether there is a date range problem or not.

How you can confirm the problem for yourself: If you have the patience for such tedium, you can compare your server log data (showing when Google first fetched new content from your site) to Google’s cache date (showing when Google first published the new content from your site) to the results of a site: search combined with a date range. You may or may not see the new content. You can also try extending the date range back to as much as a year. I have confirmed on numerous sites (even comparing to referral data in Google Analytics) that the date range function omits a lot of content that first appeared within its range and for which Google sent traffic referrals.
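
If you’d rather script the server-log half of that comparison, here is a minimal sketch. It assumes an Apache-style “combined” access log written in chronological order; the file name access.log is a placeholder:

    import re

    # Matches the front of an Apache combined-format log line:
    # host ident user [timestamp] "METHOD url ...
    LOG_LINE = re.compile(
        r'\S+ \S+ \S+ \[(?P<when>[^\]]+)\] "(?:GET|HEAD) (?P<url>\S+)'
    )

    first_fetch = {}
    with open("access.log") as log:
        for line in log:
            if "Googlebot" not in line:
                continue
            m = LOG_LINE.match(line)
            if m and m.group("url") not in first_fetch:
                first_fetch[m.group("url")] = m.group("when")

    # Any URL first fetched inside your date range but missing from the
    # date-restricted site: results is one of the discrepancies above.
    for url, when in sorted(first_fetch.items()):
        print(when, url)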

Every time I have found these discrepancies, I have checked to see if the missing pages were indexed in Google. On every occasion I was able to find them in the search results. However, I don’t know if the omissions are limited to pages found only in the Supplemental Results Index. Since Google won’t clearly tell us what is Supplemental, we cannot troubleshoot this problem for them.

How it impacts search engine optimization: The complete unreliability of this function makes it a waste of your time, if you’re using it to find new content. You’ll miss the majority of new content from most sites if you rely upon this function. If you study enough date-range results for a lot of sites, you may be able to detect some sort of pattern. I don’t have that much patience.

Query 2: Scholar search

How you use it: Type keywords into the search box and Google will return results that are drawn from “scholarly” resources (including academic journals, conference presentations, library and university databases, etc.).

What the problem is: There is no way to filter out the paid-access, subscription-only results from the free results.

Scope of the problem: Anyone with an academic or research budget is probably working for or with an institution that subscribes to all or many of these services. Everyone else is screwed. This is like locking the user into viewing only Webspam results. There is still a lot of academic research that is published for free, even organized for free perusal. Getting to it through Google Scholar is very, very difficult.

How you can confirm the problem for yourself: Run a random search and click on any five results. Odds are that most of them will ask you to log in or pay before you can read anything.

How it impacts search engine optimization: Obviously, if you’re trying to do research into search engine theory you’ll miss out on a lot of ideas. But if you’re working with an engineering, research, or otherwise technical client you may find your ability to develop content for them limited by the scope of your access to online research data. I have had more than one customer ask me if we can get to this kind of data without having to pay for access to multiple portals and journals. I have to tell them every time that I just don’t know. It’s time-consuming to find good research information that doesn’t require you to sign up for services you may never use again.

Some clients are willing to pay for the access, so you can bill it back to them. Some clients are not willing. This is not just an SEO’s frustration, it’s a business community’s frustration. Data data everywhere, and not a byte to read.

Query 3: Image search

How you use it: Type a query into the box.

What the problem is: The technical limitations of indexing images are well documented. When I search for “Lucy Lawless” pictures, I don’t fault Google for not knowing it showed me a picture of “Barbara Carrera” from a page that mentions “Lucy” and “the lawless masses” in a paragraph far removed from the Carrera picture. Nonetheless, Image Search frequently shows us images whose relevance obviously hangs on nothing more than a keyword-laden filename (like carrera-001.jpg). Google does offer us the option of helping grade images anonymously, but their system sucks.

Scope of the problem: I find this problem exists on all image-indexing search services, although some actually do a much better job than Google.

How you can confirm the problem for yourself: Just type in random searches for pictures. The more common the keywords you use, the more likely you are to find irrelevant images.

How it impacts search engine optimization: The jury is still out on whether it’s worthwhile to optimize for image search. Those of us who host images on our sites have to worry about hotlinking, and most (if not all) hotlinkers find their images through image search. Image search is more of a pain than a source of joy for me as a Webmaster.

Nonetheless, some types of sites (such as stock photo sites, image archive sites, photographers showing off their portfolios, etc.) can benefit from a structured image search. Part of the solution to this problem has to rest with the Webmasters and perhaps the engineers who develop HTML coding standards. But I think the search engines can offer us a better tool for optimizing image search.

What I’d like to see search engines do: Let us submit XML image sitemaps. Let us name the images and provide brief image meta descriptions. Yeah, we should be using ALT= text, but not everyone does. An SEO can help a client organize image data with XML image sitemaps. A normal sitemap won’t suffice. Just showing you an image exists won’t solve the problem. I want to NAME the image and DESCRIBE IT. Limit the keywords we can use. Limit the characters we can use. Just let us tell you what the images are, which pages use them, and maybe that will help improve everyone’s experience.
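
To make the idea concrete, here is a sketch in Python that writes such a file. The namespace and element names are invented for illustration; this is the format I am wishing for, not one any search engine accepts:

    import xml.etree.ElementTree as ET

    # Invented image-sitemap schema: the namespace and element names
    # below are hypothetical, not a published standard.
    NS = "http://example.com/schemas/image-sitemap"
    ET.register_namespace("", NS)

    urlset = ET.Element(f"{{{NS}}}urlset")
    image = ET.SubElement(urlset, f"{{{NS}}}image")
    ET.SubElement(image, f"{{{NS}}}loc").text = "http://example.com/img/carrera-001.jpg"
    ET.SubElement(image, f"{{{NS}}}page").text = "http://example.com/galleries/actresses.html"
    ET.SubElement(image, f"{{{NS}}}name").text = "Barbara Carrera"  # keyword-limited NAME
    ET.SubElement(image, f"{{{NS}}}description").text = "Publicity photo"  # character-limited description

    ET.ElementTree(urlset).write("image-sitemap.xml",
                                 encoding="utf-8", xml_declaration=True)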

No, you cannot prevent us from lying about the content of our images. But I would not hold any search engine liable for the lies a Webmaster tells about his content. That’s a Webspam issue, and the search engines certainly want to ensure that I am not seeing lots of Webspam in my queries. I’m on board with that principle, but I think every search engine could offer people a structured image search that flags XML-indexed data as such and lets users indicate, for quality assurance, whether the XML-supplied information is accurate.

Yes, people could figure out ways to abuse a quality rating function, but it would be a better system than the current one.

Query 4: Blogsearch link query operator

How you use it: Go to blogsearch.google.com and use the link: query operator just like you normally would.

What the problem is: It doesn’t work. And sometimes it tries to work when I don’t want it to. Why does the daggumed thing turn every query for a domain name reference into a link search? I HATE that. I’ll let you know if I want a bogus link report from Blogsearch, thank you.

Some SEOs advocate using the Blogsearch interface for link research. Again, you don’t learn much from link research anyway, but if you’re hoping to get better results from Blogsearch than from Web search, you don’t understand what is going on. Blogsearch operates from a different index than Web search. That means you’re not searching the Web indexes when you’re using Blogsearch.

Scope of the problem: The autoconversion to a link search is a worse issue than the fact that the link reports are useless.

How you can confirm the problem for yourself: Go to Google Blogsearch and type in a domain name. Then do a link: search for a page whose inbound blog links you already know, and compare the results to that known link profile. Use pages that have links from unpopular blogs (the normal, everyday blogs that we all write). TechCrunch is more likely to appear in a Google link report than most other sites, so don’t put blinders on and only search for sites that huge, well-linked, frequently updated blogs link to.

How it impacts search engine optimization: Well, if you’re a link-search nut, you’re wasting your time. If you are looking for linking resources in blogs, just look for blogs that discuss the topics you want links for. Use your keywords, not link:. If you’re trying to do competitive analysis, well, start here.

What I’d like to see Google do: Please stop autoconverting every domain name I enter (when I forget that you do this) to a link search. I’ll use the link: operator when I’m good and ready to waste my ti–er, search for links.

Query 5: Site search

How you use it: Type site:some-URL. This is actually a URL search operator, since you can restrict the results to just one subdomain or subdirectory.
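
For example (with example.com standing in for any site):

    site:example.com              (every indexed URL on the domain)
    site:blog.example.com         (only that subdomain)
    site:example.com/archives/    (only that subdirectory)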

What the problem is: The Omitted Results filter is killing me. I get that there may be duplicate content out there on the Web, but when I’m running a SITE search, I usually WANT to see everything I can (up to the first 1,000 pages).

Scope of the problem: It kicks in when page titles and meta descriptions are the same. It often does NOT kick in when everything else is duplicate but the page titles and meta descriptions are unique. Google, can you be more obvious about how shallow this “duplicate content” filter really is?
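
Here is a toy model in Python of the behavior I observe. It illustrates the pattern described above and is emphatically not Google’s actual algorithm:

    from collections import defaultdict

    # Toy model: pages collide only on (title, meta description); the
    # body is ignored entirely. Everything after the first page in a
    # group lands behind the Omitted Results link.
    pages = [
        ("/widget-1", "Widgets", "Buy widgets", "shared body"),
        ("/widget-2", "Widgets", "Buy widgets", "different body"),
        ("/widget-3", "Blue Widgets", "Buy blue widgets", "shared body"),
    ]

    groups = defaultdict(list)
    for url, title, meta, body in pages:  # body never enters the key
        groups[(title, meta)].append(url)

    for urls in groups.values():
        print("shown:", urls[0], "omitted:", urls[1:])
    # /widget-2 is omitted (same title and meta as /widget-1, despite a
    # different body); /widget-3 is shown (same body as /widget-1, but
    # a unique title and meta).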

How you can confirm the problem for yourself: Run a site search on any blog that doesn’t “optimize” for search. Or do it on an ecommerce site. Most of you have probably seen this by now (more than once).

How it impacts search engine optimization: It’s frustrating, when you’re trying to analyze someone’s site, to have to click on that daggumed Omitted Results link — especially when you’re dealing with fewer than 20 results because often you have to come back and do the query over again from scratch in order to get to the second page. Three clicks and you’re back at the first page is not an ideal user experience in my oh-so-humble-SEO opinion.

And imagine what this must be doing to the thousands, perhaps millions of Webmasters who use Google’s search box for their site search functionality. That Omitted Results thingee is not worth the money we paid for it (indirectly through all the merchandise we’ve bought from merchants who have paid Google money for advertising).

What I’d like to see Google do: Why can you not allow me to turn this dumb thing off in my preferences? Why can you not assume, “Hey, it’s a SITE search, so maybe this Bozo WANTS to see everything on the site”? Come on, Google — get rid of Omitted Results in SITE search, or at least let us turn it on or off through preferences.

Frankly, you’d have done the Web community a much bigger favor by getting rid of “Omitted Results” than by pretending that the Supplemental Results Index doesn’t exist any more.

Wrapup

There are other query operator behaviors I’d like to see Google fix, alter, or improve, but too long a list would discourage any hope I have that something will change. Googlers, please feel my pain. I assure you, I am NOT the only person who feels it.

www.seo-theory.com

published @ September 4, 2008
