Link Flow Analysis - How to do Link Flow Analysis
“Link flow analysis” and “link juice” are two of the SEO industry’s currently hot buzz expressions. Link juice is a nonsense term, a euphemism people use for that undefined or undisclosed valuation of document relationships that search engines utilize for their own internal purposes. Link flow analysis, on the other hand, has several legitimate meanings (and probably a few nonsense associations as well). To understand Link Flow Analysis better, let’s first rule out the things it is not and the things it won’t do.
What Link Flow Analysis Is NOT
What Link Flow Analysis Will NOT Do
Let’s begin looking at what Link Flow Analysis is and does by first defining link flow (as best we can define ‘link flow’).
- Google appears to define link flow as the PageRank that is passed through a link. (NOTE: I’ll correct this if any Googlers want to offer a clarification.)
- I define link flow as the “pathways between pages”.
- Some SEOs map Toolbar PageRank.
Although we could find other definitions for link flow, these three examples should make it clear that there is no general agreement on what link flow is. But I think it should also be self-evident that we’re really not talking about the movement of links around or between documents; rather, we’re talking about the movement of value through links from document to document.
Googlers seem to really be talking about internal PageRank, although I suppose it’s possible they might be trying to use the Toolbar PR as a common ground with the SEO community. They certainly use it as a stick to beat up on people with.
Halfdeck’s tool works with the only available measure of PageRank — the Toolbar, so we can hardly fault him for not working with the internal stuff.
But there is more value to links than PageRank (which, after all, is only a Google thing). Other search engines definitely evaluate linking relationships, and it’s most likely safe to assume they all measure some sort of link equity (another currently hot SEO industry buzz expression that I actually like, although it is also poorly defined). I often say “PageRank-like” when talking about the values that other search engines may compute for Web pages.
A link passes 6 types of value:
- Traffic
- Visibility
- Crawling
- Trust
- Anchor Text
- PageRank (or something like it)
The SEO community can only quantify one of those types of value, and that would be the first: traffic. Of course, we can use Toolbar PR to speak about a derivative value, but we don’t know how current any particular Toolbar PR value is (even if it has only just been published). Matt Cutts has said more than once that Toolbar PR data is published after the value it represents has already been factored into their database.
So under the broader definition that I have structured, link flow refers to any value that is passed through links — and you cannot measure that flow. There are no Trust Points that you can assign to any given link to know how much trust it passes.
Now, if you know about a link and its anchor text, you can determine (to a limited extent) whether the link may pass anchor text. Just search for the anchor text. If it’s unique to the linking page and the destination, at most you should see only those two results. In practice we tend to point more than one link at a page in order to beef up its relevance.
There are many tools available for link research. Of course, the SEO community does a poor job in general of acknowledging the limitations of these tools. Search for discussions and blog posts about link analysis and you’ll usually find people recommending the use of Yahoo! Site Explorer and Google Webmaster Tools. For what it’s worth, there is also Live’s Webmaster Center.
All of these resources provide you with information about your Web site. None of these resources provides you with perfect information. Let’s ignore the brand names for a moment and look at the limitations that all the resources share:
- Each resource can only report on data its respective search engine provides
- Each resource operates on a delayed-reporting basis (this is NOT real-time data)
- Each resource fails to disclose SOMETHING about the link data it reports
You cannot use Link Report A to analyze how competitive a site may be in Search Engine B. Link Reports are only relevant to the search engines that produce them. Hence, your link analysis has to be search engine-specific. Now, we know that the search engines all impose some limits on the value that links in their databases may pass. For example, Google doesn’t allow all links to pass PageRank. Yahoo! says the first link from a domain counts more than the others. And so on.
Hence, your search-engine-specific link analysis is not going to be very accurate. Now, while it would be great if we could get accurate, timely data from the search engines, it’s clearly not in their best interests (nor the best interests of their searchers) to put the candy out where the kids can get into it completely unsupervised.
But what if we could use something else for our link analysis? Suppose Company X offers its own linking data, based on their own crawling. There are several SEO tools that have been around for quite some time that purport to do this. But their tools are only relevant to their own databases. That is, these tools will (in typical SEO fashion) query Google for Toolbar PR data and query Yahoo! for backlink data (both perfectly useless measures of value) and they’ll let you sort their data by various options. But they still don’t know if they know everything that either Google or Yahoo! knows.
In fact, we can say with complete certainty that no search engine has complete knowledge of any other search engine’s data and algorithms. It doesn’t matter if you’re comparing Yahoo! to Google or some SEO tool to Google, you’re comparing apples to oranges. You cannot obtain any insight into what Google knows or thinks it knows through another search engine or any SEO tool.
But let’s assume that the absolute very best SEO tool out there (call it SEO Brand X) really does a great job of approximately mirroring the database for any search engine. You’ve got an acceptable reconstruction of Search Engine A’s database. Great! You’re ready to roll.
Except for the Synchronicity Issue.
Synchronicity occurs when two more-or-less equivalent or related events happen (at the same time) for unrelated reasons. If two search engines were to crawl and index the Web in generally the same way at the same time, those would be synchronous events. In practice, this just doesn’t happen. So all our assumption does is place us in SEO Fairy-tale Land.
In other words, there is no SEO Brand X resource that creates a Web map that looks like the Yahoo! Web Map, or the Google Web Map, or the Live Web Map in any useful way. SEO Brand X is just another search index building its own Web Map. Great for Brand X. Bad for you and me.
So we have no way of measuring how much PageRank flows from document to document. We can, at best, find pages that use noindex, nofollow, rel=’nofollow’, and Javascript or Flash links but we cannot map where the PageRank-like value flows.
In other words, SEO Brand X still doesn’t know which pages pass value in any search engine’s database. SEO Brand X can do a perfectly fine job of showing you which domains have the most inbound links (according to its own crawling) and it can grab backlink reports from other search engines (like Google and Yahoo!) but it cannot tell you where the PageRank-like value is or where it is passed.
Visibility is a different issue. I said there is no way to quantify Visibility but we can partially quantify it for search (not for links). Measuring search visibility is easier to do with pages that have relatively few inbound links and which contain little to no indexable text. That is, there is a Search Visibility Curve for every document that is shaped by the number of queries it appears in on every search engine.
Every query can potentially show up to 1,000 results (on the major search engines — results limits may differ on smaller or newer services). You can quantify single-query Visibility in any of several ways. For example, you could assign a weighting of .001 to the 1000th listing and a weighting of 1 to the 1st listing. If you could determine that a page is visible for 300 queries, you could sum up all its Visibility weights.
This is a crude measurement, because a document that is poorly visible for 1,000 queries could potentially outweigh a document that has perfect Visibility for 10 queries. But that is Search Visibility.
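To make the arithmetic concrete, here is a minimal sketch of that weighting in Python. It assumes a straight linear scale between the 1st listing (weight 1) and the 1,000th listing (weight .001), which is only one of the several ways you could weight a listing. The function names and the two sample pages are hypothetical.

```python
def visibility_weight(rank: int, max_results: int = 1000) -> float:
    """Linear weight (an assumption): rank 1 -> 1.0, rank 1000 -> 0.001."""
    if not 1 <= rank <= max_results:
        return 0.0  # the page is not listed within the results limit
    return (max_results + 1 - rank) / max_results


def search_visibility(ranks_by_query: dict) -> float:
    """Sum the single-query weights over every query the page appears in."""
    return sum(visibility_weight(rank) for rank in ranks_by_query.values())


# Hypothetical pages: one is listed 1st for 10 queries,
# the other is listed 900th for 1,000 queries.
strong_page = {f"query {i}": 1 for i in range(10)}
weak_page = {f"query {i}": 900 for i in range(1000)}

print("perfect visibility, 10 queries:", search_visibility(strong_page))
print("poor visibility, 1,000 queries:", search_visibility(weak_page))
```

Under this scale the poorly visible page ends up with a score roughly ten times larger than the page that ranks first for its 10 queries, which is exactly the crudeness described above.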
As with Search Visibility, you could devise any number of ways to measure Link Visibility. Let’s say we assign a value of 1 for every 1,000 page views that occur within a calendar month for any link on a Web page. For example, suppose I embed a Javascript link (that points to Google) on a specific SEO Theory page. Suppose SEO Theory generates 100,000 page views for that particular page in the month of December. In that case, the link earned 100 Visibility Points.
Now suppose that Javascript embed was placed on 1,000 sites altogether and collectively those sites generated 10,000,000 page views in the month of December. The Link Visibility score would (under this proposed model) be 10,000.
To be counted, a page view would have to ensure that the visitor could actually see the link — so the link has to be clearly visible, not buried in a footer or behind an image. Any hidden link would earn a Visibility score of 0 by most measurements.
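The page view model is simple enough to write down in a few lines of Python. This is only a sketch of the proposed scoring: the function name and the `link_is_visible` flag are my own illustrative choices, and deciding whether a visitor could actually see a link is left to whoever collects the data.

```python
def link_visibility_points(page_views: int,
                           link_is_visible: bool = True,
                           views_per_point: int = 1000) -> float:
    """One Visibility Point per 1,000 monthly page views; hidden links earn 0."""
    if not link_is_visible:
        return 0.0
    return page_views / views_per_point


single_page = link_visibility_points(100_000)        # one page, one month
network = link_visibility_points(10_000_000)         # 1,000 sites combined
hidden = link_visibility_points(100_000, link_is_visible=False)

print(single_page, network, hidden)  # 100.0 10000.0 0.0
```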
These types of measurements are too crude to be useful but many people seek branding value from the mere display of links in Javascript ads and other ads that are not expected to pass PageRank-like value.
Another way to measure link Visibility would be to count all the pages on which a link is present, where those linking pages rank well for trafficked queries. But you’re not measuring how value flows through links. At least with the page view model we could assess some sort of brand value passing to the destination sites.
Crawl and trust are managed by the search engines internally. You never see a search engine fetch that includes the source page for the link the search engine followed. Nor do search engines disclose which pages confer trust. It is reasonable to ask whether a document that passes anchor text also passes trust. Any search engine could be constructed to pass anchor text, trust, and PageRank-like value completely independently of each other. In such a scenario, the most powerful links would pass all three values.
So now that we’ve looked at the limitations of what we can do with search engines, what else can we do to measure and analyze link flow? First, we have to define a quantifiable and measurable value to track. That doesn’t include PageRank-like value or trust. It might include anchor text, but in most cases you’ll have too many links pointing the same anchor text at a destination to determine which links work in any search engine.
You can measure how many link sources are indexed by search engine, however.
You can measure how many unique anchor text expressions are passed from document to document by search engine.
You can measure which page caches are updated on a daily, weekly, monthly, or longer basis by search engine.
These measurements don’t show you which documents are passing value through their links, but you can estimate which documents may be available to pass value (indexed), seem to be passing some value (unique anchor text), and when they may pass value (cache frequency).
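One way to keep these three measurements organized is to record every observation against the search engine it came from and never mix them. The Python sketch below assumes you collect the observations yourself (by hand or with your own scripts). The `LinkObservation` record, its field names, the engine labels, and the sample data are all hypothetical, and treating anchor text as case-insensitive is my assumption.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LinkObservation:
    engine: str        # the search engine this observation came from
    source_url: str    # the page that carries the link
    anchor_text: str
    indexed: bool      # is the source page in this engine's index?
    cache_dates: list = field(default_factory=list)  # cache dates seen on each check


def summarize(observations, engine):
    """Summarize one engine's observations; never mixes data across engines."""
    indexed = [o for o in observations if o.engine == engine and o.indexed]
    anchors = {o.anchor_text.lower() for o in indexed}  # case-insensitive (assumption)
    cache_gap_days = {}
    for o in indexed:
        dates = sorted(set(o.cache_dates))
        if len(dates) >= 2:  # need two cache dates to measure a gap
            gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
            cache_gap_days[o.source_url] = sum(gaps) / len(gaps)
    return {
        "indexed_sources": len(indexed),
        "unique_anchors": len(anchors),
        "avg_cache_gap_days": cache_gap_days,
    }


# Hypothetical observations for two engines.
observations = [
    LinkObservation("engine-a", "http://example.com/a", "link flow", True,
                    [date(2008, 10, 1), date(2008, 10, 8), date(2008, 10, 15)]),
    LinkObservation("engine-a", "http://example.com/b", "Link Flow", True,
                    [date(2008, 10, 1)]),
    LinkObservation("engine-a", "http://example.com/c", "seo theory", False),
    LinkObservation("engine-b", "http://example.com/a", "link flow", True,
                    [date(2008, 9, 1), date(2008, 10, 1)]),
]

print(summarize(observations, "engine-a"))
print(summarize(observations, "engine-b"))
```

The summary says nothing about which links pass value. It only tells you, per engine, which sources are available, how many distinct anchor expressions they carry, and how often their caches change.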
You can also measure rates of page indexing with new sites. How long does it take pages to be crawled and indexed? Which pages show up first? When you run site searches on the new site, which pages are ranked first for specific queries? If you know which pages the search engines deem to be the most important on your site, you can test those pages to determine if they pass some type of search-specific value through their links (rather than just drop links randomly across a site).
You can measure fetch rates for your pages and compare those fetch frequencies to the number of reported backlinks (DO NOT USE ANOTHER SEARCH ENGINE FOR THIS). If page A is fetched twice as often as page B, and page A has fewer reported links than page B, what do you think that says about page B’s backlinks?
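Here is a rough Python sketch of that comparison. It assumes your server writes access logs in the common "combined" format and that you identify the crawler by a token in its user agent string (which can be spoofed, so treat the counts as estimates). The sample log lines, page paths, and reported link counts are made up for illustration, and the reported links must come from the same search engine whose crawler you are counting.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer, and user agent of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_fetches(log_lines, agent_token):
    """Count fetches per path for requests whose user agent contains agent_token."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and agent_token.lower() in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts


def fetches_per_reported_link(fetches, reported_links):
    """Fetch count divided by the backlinks the SAME engine reports for each page."""
    return {path: fetches.get(path, 0) / links
            for path, links in reported_links.items() if links > 0}


def make_line(path, agent):
    # Builds a hypothetical combined-format log line for the example.
    return (f'192.0.2.1 - - [01/Dec/2008:10:00:00 +0000] '
            f'"GET {path} HTTP/1.1" 200 5120 "-" "{agent}"')


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.0"

log_lines = ([make_line("/page-a", GOOGLEBOT)] * 4
             + [make_line("/page-b", GOOGLEBOT)] * 2
             + [make_line("/page-a", BROWSER)])

fetches = crawler_fetches(log_lines, "Googlebot")
reported_links = {"/page-a": 10, "/page-b": 40}  # hypothetical same-engine link counts

print(fetches)
print(fetches_per_reported_link(fetches, reported_links))
```

In this made-up data page A is fetched twice as often as page B while showing a quarter of the reported links, which is the situation the question above asks you to think about.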
You cannot answer that question knowledgeably if you use search engine A to analyze backlinks for search engine B. You can form an opinion on the basis of ignorance and misinformation — many people do — but to be competitive in this industry you must discipline yourself to look for the answers you need for each search engine within the data the search engine will share with you.
Every search engine tells you which of your pages it fetches, how often it fetches those pages, where it fetches those pages from, and if those pages appear in its search results. That information has to serve as the foundation of your link flow analysis. You need to know if your internal links help get your site crawled and indexed faster than external links; you need to know if your internal links help pass anchor text to your own pages; you need to know if you can influence the rate of crawling and caching for any given page on your site by adding or dropping crawlable links.
Knowing how much PageRank-like value your links pass won’t tell you anything about whether a page is likely to rank for any particular query. However, knowing that you can influence search engines to update their databases for any particular document within a specific timeframe empowers you. You need to settle upon your own definition of link flow.
But, more importantly, you also need to develop your own analytics to help you evaluate what the search engines are telling you. The SEO Brand X search tools cannot do that for you. They are not designed to offer proper analysis based solely on what each search engine discloses about itself.
SEO Tool designers consistently fail at these kinds of projects because they don’t understand why crossing the data streams doesn’t work.
www.seo-theory.com
published @ October 13, 2008