Copyright and Caching: What Happens If You Change Your Mind About Letting a Search Engine Cache Your Site?
A federal court recently ruled that a lawsuit against Yahoo! and Microsoft, brought over their display of cached versions of websites after the website owner complained, can go forward.
Some of you may be experiencing a tingling feeling of déjà vu. That's because the same plaintiff who brought this lawsuit against Yahoo! and Microsoft brought a copyright infringement lawsuit against Google several years ago. See Parker v. Google, Inc., 422 F. Supp. 2d 492 (E.D. Pa. 2006), aff'd, 242 Fed. Appx. 833 (3d Cir. 2007) (non-precedential), cert. denied, 128 S. Ct. 1101 (2008). In that case, the court ruled that Google was not liable for direct copyright infringement for archiving and displaying Usenet postings that contained copyrighted material, or for displaying excerpts of websites in its search results. The case was one of many major search engine "wins" validating the way search engines operate and return content.
However, the case did not resolve all issues regarding search engine listings. The Parker v. Google court did not rule on whether Google committed direct copyright infringement by republishing "cached" copies of web pages on Google's own site. Parker's case against Yahoo! and Microsoft, however, directly examines the caching issue.
The Parties
Yahoo! and Microsoft need no introduction, so I'll skip straight to the plaintiff.
Gordon Roy Parker (AKA Ray Parker) is the author of several copyrighted works, including Outfoxing the Foxes and Why Hotties Choose Losers. Both are published online and freely available from Parker's website. Parker, a rather frequent litigant, represents himself in his lawsuit against Yahoo! and Microsoft. It's important to note that Parker did not employ the appropriate robots exclusion protocol to prevent search engines from crawling, indexing, or displaying his content. Further, he did not send either search engine a take-down notice requesting that they remove the content. He went straight to filing this lawsuit.
Parker is suing Yahoo and MS because they create and republish allegedly unauthorized "cached" copies of his works.
The Claims
Parker claims that by making cached copies of his websites available to their users, both Yahoo and Microsoft republish his works in their entirety without his permission. Accordingly, Parker brought a bunch of claims, but I'm only interested in direct copyright infringement for purposes of this post. (The rest of the claims get dismissed outright anyway.)
It's worth noting that the case is nowhere near the trial stage. The decision I'm writing about today deals with legal technicalities about whether these kinds of claims can even be brought. Basically, Yahoo and MS asked the judge to dismiss the case before it really got started, arguing that Parker's claims fail as a matter of law.
The Ruling
Generally, the judge agreed with Yahoo and MS and dismissed most of the claims outright. Surprisingly, though, the judge allowed the direct copyright infringement claim to go forward. Well, sort of, anyway.
Based on the law previously established in the Google case, the judge ruled that Yahoo and Microsoft are not breaking the law when they initially download Parker's website for the purpose of indexing (assuming they follow robots exclusion protocols). Thus, the only unresolved issue is whether Yahoo! and Microsoft commit copyright infringement by displaying cached copies of Parker's website.
The judge ruled that, at least initially, search engines do not infringe copyright by displaying cached copies of websites that don't utilize robots exclusion protocols. According to the judge, search engines are allowed to index and display cached copies because it is reasonable to assume that a website owner who doesn't want his or her site indexed and displayed will use robots.txt to say so. Thus, there is presumed permission, or an "implied license," for search engines to do their thing. The onus is on the website owner to tell them no.
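For anyone who hasn't worked with the robots exclusion protocol, here is a minimal sketch of what "telling them no" looks like. This is an illustrative robots.txt file placed at the root of a site; it asks all compliant crawlers to stay out entirely, and the exact directives a site owner needs (and how each engine honors them) will vary.

User-agent: *
Disallow: /

A site owner who only wants to block a particular engine can name its crawler in the User-agent line instead of using the wildcard (Yahoo!'s crawler was known as Slurp and Microsoft's as msnbot at the time).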
Thus, the judge ruled that to the extent Parker seeks to hold the search engines liable for initially indexing and displaying his cached content, the case is dismissed; Parker gave the search engines an implied license by not using robots.txt.
HOWEVER, the judge did not completely dismiss the case. She allowed part of the case to go forward.
The judge ruled that Parker can continue the case only on the issue of whether he revoked his permission by filing this lawsuit. Thus, the court left open the possibility that the search engines may be liable for infringement once they knew or should have known that they no longer had permission to display the cached content. The unresolved question is what, if anything, a website owner has to do to put the search engines on notice that she no longer wants her site's cached content displayed.
Changing your robots.txt directives and waiting for the search engines to re-crawl could take months, but for most people in most situations, that will be sufficient. Sending a take-down notice is almost assuredly the quickest way to get your cached content removed from the search engines; that's certainly what I would do in an emergency. It seems to me that Parker chose the most laborious and expensive route: filing a lawsuit. Could it be he doesn't really care about the cached content being displayed? Perhaps he's just more interested in the attention? In which case, posts like mine do nothing but encourage wasteful lawsuits.
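As an aside, if what bothers you is specifically the cached copy (rather than being crawled and listed at all), the major engines have long supported a "noarchive" robots meta tag. This is just a sketch of that mechanism; exactly how and how quickly it takes effect depends on the engine re-crawling the page.

<meta name="robots" content="noarchive">

Placed in a page's head section, it asks the engines to keep the page in their index and search results but not to show a cached copy of it.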
Conclusion
The fallout of this particular legal circus is that search engine practices are even further legitimized.
I for one think this is a good thing. As a consumer, I get tremendous value out of search engines and how they operate, including cached pages. I'm a little perplexed by Parker's motivations. He knows he can opt out; he is just choosing not to. I suppose he has a world view that puts more emphasis on private property rights than on the democratization of knowledge. I'd be somewhat miffed if there were no opt-out mechanisms. But there are. So I'm not.
I don't think the unresolved issue of whether filing a lawsuit revokes an "implied license" will have an impact on the way search engines cache and display websites. Most content owners will continue to employ robots exclusion protocols and take-down notices to manage their content. Realistically, how many people would make filing a lawsuit their first choice for communicating their content management preferences to search engines? I'd hazard a guess that Parker is just about the only one.
Thus, even if Parker does manage to pull out a 'win' against the search engines on this narrow issue, it probably won't have much of an impact on search engine caching strategies. Overall, the opinion is a win for search engines because it further legitimizes their practice of crawling, archiving, and displaying web content.
Best Regards,
Sarah Bird
Case Library
More about the Parker v. Google decision.
New Media & Technology blog on the Parker v. Yahoo! decision.
Eric Goldman, my hero, also has a great post on Parker v. Yahoo!
If you want to know more about digital media and implied licenses, read this excellent article.
www.seomoz.org
published @ October 28, 2008