What Is the Google PageRank Algorithm?
The concept of using links as a way to measure a site’s importance was first made popular by Google with the implementation of its PageRank algorithm (others had previously written about using links as a ranking factor, but Google’s rapidly increasing user base popularized it). In simple terms, each link to a web page is a vote for that page. But it’s not as simple as that.
PageRank is an algorithm used by the Google web search engine to rank websites in its search results. PageRank was named after Larry Page, one of the founders of Google. It is a way of measuring the importance of website pages. According to Google: "PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites."
It is not the only algorithm Google uses to order search results, but it is the first algorithm the company used, and it is the best known. Google uses an automated web spider called Googlebot to actually count links and gather other information on web pages.

Links and linking pages are not all created equal, and it is not as simple as "the page with the most votes wins." Some links are weighted more heavily by Google's PageRank algorithm than others. The key to this concept is the notion that links represent an "editorial endorsement" of a web document. Search engines rely heavily on editorial votes. However, as publishers learned about the power of links, some started to manipulate links through a variety of methods. This created situations in which the intent of the link was not editorial in nature, and led to many algorithm enhancements.
PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is referred to as the PageRank of E and denoted by PR(E). A PageRank results from a mathematical algorithm based on the webgraph, created by all World Wide Web pages as nodes and hyperlinks as edges, taking into consideration authority hubs such as cnn.com or usa.gov. The rank value indicates the importance of a particular page. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank metric of all pages that link to it ("incoming links"). A page that is linked to by many pages with high PageRank receives a high rank itself.
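The recursive definition above can be sketched as a power-iteration computation over a toy link graph. This is a minimal illustration of the published PageRank idea; the example graph, the damping factor of 0.85, and the fixed iteration count are assumptions for the sketch, not Google's production parameters:

```python
# Minimal PageRank power iteration over a toy web graph.
# The graph, damping factor, and iteration count are illustrative
# assumptions, not Google's actual parameters.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:  # each outlink passes an equal share
                    new_rank[target] += share
        rank = new_rank
    return rank

# The page linked to by the most (and best-ranked) pages wins.
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # "c" receives the most votes
```

Note how page "a" outranks "b" even though both have one outlink: "a" is linked to by the high-ranked "c", which is exactly the "votes are not equal" idea.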
How Do Search Engines Use Links?
The search engines use links primarily to discover web pages, and to count the links as votes for those web pages. But how do they use this information once they acquire it? Let’s take a look:
Search engines need to decide what pages to include in their index. Crawling the Web (following links) is one way they discover web pages; the other is through the use of XML Sitemap files. In addition, the search engines do not include pages that they deem to be of low value, because cluttering their index with those pages will not lead to a good experience for their users. The cumulative link value, or "link juice," of a page is a factor in making that decision.
Search engine spiders go out and crawl a portion of the Web every day. This is no small task, and it starts with deciding where to begin and where to go. Google has publicly indicated that it starts its crawl in PageRank order. In other words, it crawls PageRank 10 sites first, PageRank 9 sites next, and so on. Higher PageRank sites also get crawled more deeply than other sites. It is likely that other search engines start their crawl with the most important sites first as well. This would make sense, because changes on the most important sites are the ones the search engines want to discover first. In addition, if a very important site links to a new resource for the first time, the search engines tend to place a lot of trust in that link and want to factor the new link (vote) into their algorithms quickly.
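The PageRank-ordered crawling described above can be sketched as a priority queue that always fetches the highest-scored URL next. The scores and URLs here are invented for illustration:

```python
import heapq

# Sketch of PageRank-ordered crawl scheduling: highest-scored URLs are
# fetched first. The scores and URLs are invented for illustration.

def crawl_order(frontier):
    """frontier: list of (pagerank_score, url). Returns URLs in crawl order."""
    # heapq is a min-heap, so negate scores to pop the highest score first.
    heap = [(-score, url) for score, url in frontier]
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order

frontier = [(6.2, "example.com/blog"),
            (9.7, "news-site.example"),
            (8.1, "university.example")]
print(crawl_order(frontier))  # most important site comes out first
```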
Links play a critical role in ranking. For example, consider two sites whose on-page content is equally relevant to a given topic. Perhaps they are the shopping sites Amazon.com and (the less popular) JoesShoppingSite.com. The search engine needs a way to decide which comes out on top: Amazon or Joe. This is where links come in: links cast the deciding vote. If more sites, and more important sites, link to Amazon, then it must be more important, so Amazon wins.
How Do Search Engines Judge Links?
Many aspects are involved when you evaluate a link. As we just outlined, the most commonly understood ones are authority, relevance, trust, and the role of anchor text. However, other factors also come into play. Let's discuss some of the more important factors search engines consider when evaluating a link's value.
A link from your own site back to your own site is, of course, not an independent editorial vote for your site. Put another way, the search engines assume that you will vouch for your own site. Think about your site as having an accumulated total link juice based on all the links it has received from third-party websites, and your internal linking structure as the way you allocate that juice to pages on your site. Your internal linking structure is incredibly important, but it does little if anything to build the total link juice of your site. In contrast, links from a truly independent source carry much more weight. Extending this notion a bit, it may be that you have multiple websites. Perhaps they have common data in the Whois records (such as the IP address or contact information). Search engines can use this type of signal to treat cross-links between those sites as more like internal links than like inbound links earned on merit. Even if you have different Whois records for the websites but they all cross-link to each other, the search engines can detect this pattern easily.
Keep in mind that a website with no independent third-party links into it has no link power to vote for other sites. If the search engine sees a cluster of sites that heavily cross-link and many of the sites in the cluster have no or few incoming links to them, the links from those sites may well be ignored. Conceptually, you can think of such a cluster of sites as a single site. Cross-linking to them can be algorithmically treated as a single site, with links between them not adding to the total link juice score for each other. The cluster would be evaluated based on the inbound links to the cluster. Of course, there are many different ways such things could be implemented, but one thing that would not have SEO value is to build a large number of sites, just to cross-link them with each other.
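The cluster idea above can be sketched algorithmically: group sites joined by reciprocal links, then count how many links reach the group from outside it. The site names and the "mutual link" test are illustrative simplifications of whatever the engines actually do:

```python
from collections import defaultdict

# Sketch of cross-link cluster detection: group sites that link to each
# other reciprocally, then count inbound links from outside the group.
# Site names and the mutual-link criterion are illustrative.

def mutual_clusters(links):
    """links maps each site to the set of sites it links to."""
    # Build an undirected graph containing only reciprocal links.
    mutual = defaultdict(set)
    for a, targets in links.items():
        for b in targets:
            if a in links.get(b, set()):
                mutual[a].add(b)
                mutual[b].add(a)
    # Connected components over the mutual-link graph.
    seen, clusters = set(), []
    for site in links:
        if site in seen:
            continue
        stack, component = [site], set()
        while stack:
            s = stack.pop()
            if s in component:
                continue
            component.add(s)
            stack.extend(mutual[s] - component)
        seen |= component
        clusters.append(component)
    return clusters

def external_inlinks(cluster, links):
    """Count links pointing into the cluster from sites outside it."""
    return sum(1 for src, targets in links.items() if src not in cluster
               for t in targets if t in cluster)

links = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"a.example", "c.example"},
    "c.example": {"a.example", "b.example"},
    "independent.example": {"a.example"},
}
clusters = mutual_clusters(links)
```

Here the three heavily cross-linked sites collapse into one cluster with just a single outside link pointing at it, so the cluster as a whole has very little vote-casting power.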
Getting a link editorially given to your site from a third-party website is always a good thing. But if more links are better, why not get links from every page of these sites if you can? In theory, this is a good idea, but search engines do not count multiple links from a domain cumulatively. In other words, 100 links from one domain are not as good as one link from 100 domains, if you assume that all other factors are equal. The basic reason for this is that the multiple links on one site most likely represent one editorial vote. In other words, it is highly likely that one person made the decision.
Furthermore, a sitewide link is more likely to have been paid for. So, multiple links from one domain are still useful to you, but the value per added link in evaluating the importance of your website diminishes as the quantity of links goes up. One hundred links from one domain might carry the total weight of one link from 10 domains. One thousand links from one domain might not add any additional weight at all. More links might not mean more importance, but the editorial vote regarding the topic of a page through the anchor text remains interesting data for the search engines, even as the link quantity increases.
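This diminishing-returns pattern can be illustrated with a concave (here, logarithmic) weighting curve. The curve itself is an assumption chosen to mirror the 100-links-from-one-domain example; search engines do not publish their actual damping function:

```python
import math

# Illustrative diminishing-returns weighting for repeated links from one
# domain. The logarithmic curve is an assumption, not a published formula.

def domain_link_value(num_links, per_link_value=1.0):
    """Total value contributed by num_links links from a single domain."""
    if num_links <= 0:
        return 0.0
    # log1p grows quickly at first, then flattens: the 100th link from a
    # domain adds far less than the 1st link did.
    return per_link_value * math.log1p(num_links)

for n in (1, 10, 100, 1000):
    print(n, domain_link_value(n))
```

Under this sketch, 1,000 links from one domain are worth roughly ten times one link, not a thousand times, which captures the "one editorial vote per domain" intuition.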
Getting links from a range of sources is also a significant factor. We already discussed two aspects of this: getting links from domains you do not own, and getting links from many different domains. However, there are many other aspects of this.
For example, perhaps all your links come from blogs that cover your space. This ends up being a bit unbalanced. You can easily think of other types of places where you could get links: directories, social media sites, university sites, media websites, social bookmarking sites, and so on.
You can think about implementing link-building campaigns in many of these different sectors as diversification. There are several good reasons for doing this.
One reason is that the search engines value this type of diversification. If all your links come from a single class of sites, the reason is more likely to be manipulation, and search engines do not like that. If you have links coming in from multiple types of sources, that looks more like you have something of value.
Another reason is that search engines are constantly tuning and tweaking their algorithms. If you had all your links from blogs and the search engines made a change that significantly reduced the value of blog links, that could really hurt your rankings. You would essentially be hostage to that one strategy, and that’s not a good idea either.
Search engines also keep detailed data on when they discover the existence of a new link, or the disappearance of a link. They can perform quite a bit of interesting analysis with this type of data. Here are some examples:
When did the link first appear? This is particularly interesting when considered in relationship to the appearance of other links. Did it happen immediately after you received that link from the New York Times?
When did the link disappear? Some of this is routine, such as links that appear in blog posts that start on the home page of a blog and then get relegated to archive pages over time. However, perhaps it is after you rolled out a new major section on your site, which could be an entirely different type of signal.
How long has the link existed? A search engine can potentially count a link for more, or for less, the longer it has been in place. Which way it goes could depend on the authority/trust of the site providing the link, or other factors.
How quickly were the links added? Did you go from one link per week to 100 per day, or vice versa? Such drastic changes in the rate of link acquisition could also be a significant signal. Whether it is a bad signal or not depends. For example, if your site is featured in major news coverage it could be good. If you start buying links by the thousands it could be bad. Part of the challenge for the search engines is to determine how to interpret the signal.
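The rate-of-acquisition signal can be sketched as a simple spike detector over daily new-link counts. The counts, window, and threshold are invented for illustration; as the text notes, real systems would still have to interpret a flagged spike in context (news coverage versus purchased links):

```python
# Sketch of link-velocity anomaly detection: flag days whose new-link
# count jumps far above the recent average. Counts, window size, and
# threshold are illustrative assumptions.

def link_spikes(daily_new_links, window=7, factor=10.0):
    """Return indices of days whose count exceeds factor x the trailing mean."""
    spikes = []
    for i in range(window, len(daily_new_links)):
        trailing = daily_new_links[i - window:i]
        avg = sum(trailing) / window
        if avg > 0 and daily_new_links[i] > factor * avg:
            spikes.append(i)
    return spikes

# A week of roughly one link per day, then a sudden day with 100 new links.
history = [1, 2, 1, 0, 1, 2, 1, 100]
print(link_spikes(history))  # day 7 is flagged
```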
Although anchor text is a major signal regarding the relevance of a web page, search engines look at a much deeper context than that. They can look at other signals of relevance. Here are some examples of those:
Do the closest links on the page point to closely related, high-quality sites? That would be a positive signal to the engines, as your site could be seen as high-quality by association. Alternatively, if the two links before yours are for Viagra and a casino site, and the link after yours points to a porn site, that’s not a good signal.
Is your link in the main body of the content? Or is it off in a block of links at the bottom of the right rail of the web page? Better page placement can be a ranking factor. This is also referred to as prominence, which has application in on-page keyword location as well.
Does the text immediately preceding and following your link seem related to the anchor text of the link and the content of the page on your site that it links to? If so, that could be an additional positive signal. This is also referred to as proximity.
Closest section header
Search engines can also look more deeply at the context of the section of the page where your link resides. This can be the nearest header tag, or the nearest text highlighted in bold, particularly if it is implemented like a header (two to four boldface words in a paragraph by themselves).
Overall page context
The relevance and context of the linking page are also factors in rankings. If your anchor text, surrounding text, and the nearest header are all related, that’s good. But if the overall context of the linking page is also closely related, that’s better still.
Overall site context
Last is the notion of the context of the entire site that links to you (or perhaps even just the section of the site that links to you). For example, if hundreds of pages are relevant to your topic and you receive your link from a relevant page, with relevant headers, nearby text, and anchor text, these all add to the impact, more than if there happens to be only one relevant page on the site.
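The context signals above (anchor text, surrounding text, nearest header, and overall page) can be combined into an illustrative relevance score. The weights and the crude word-overlap measure are assumptions for the example; real ranking systems use far richer models:

```python
# Illustrative combination of link-context relevance signals. The weights
# and the crude word-overlap measure are assumptions for the example.

def overlap(text_a, text_b):
    """Fraction of words in text_a that also appear in text_b."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a) if a else 0.0

def link_context_score(target_topic, anchor, surrounding, header, page,
                       weights=(0.4, 0.25, 0.2, 0.15)):
    """Weighted sum of how related each context signal is to the topic."""
    signals = (anchor, surrounding, header, page)
    return sum(w * overlap(target_topic, s) for w, s in zip(weights, signals))

# Relevant anchor, surrounding text, and header, but an off-topic page.
score = link_context_score(
    "used cars",
    anchor="great used cars",
    surrounding="find reliable used cars for sale near you",
    header="Used Cars Buying Guide",
    page="This blog post reviews kitchen appliances and cookware",
)
print(score)
```

In this sketch the anchor gets the largest weight, matching the text's point that anchor text is the major signal while the surrounding text, header, and page context each add (or withhold) smaller boosts.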
Indications are that there is no preferential treatment for certain top-level domains (TLDs), such as .edu, .gov, and .mil. It is a popular myth that these TLDs are a positive ranking signal, but it does not make sense for search engines to look at it so simply. Matt Cutts, the head of the Google webspam team, commented on this in an interview with Stephan Spencer (http://www.stephanspencer.com/search-engines/matt-cutts-interview): "There is nothing in the algorithm itself, though, that says: oh, .edu—give that link more weight." And: "You can have a useless .edu link just like you can have a great .com link."
There are many forums, blogs, and other pages on .edu domains that spammers easily manipulate to gain links to their sites. For this reason, search engines cannot simply imbue a special level of trust or authority to a site because it is an .edu domain. To prove this to yourself, simply search for "buy viagra site:edu" to see how spammers have infiltrated .edu pages. However, it is true that .edu domains are often authoritative. But this is a result of the link analysis that defines a given college or university as a highly trusted site on one or more topics. The result is that there can be (and there are) domains that are authoritative on one or more topics on some sections of their site, and yet can have another section of their site that spammers are actively abusing. Search engines deal with this problem by varying their assessment of a domain's authority across the domain. The publisher's http://yourdomain.com/usedcars section may be considered authoritative on the topic of used cars, but http://yourdomain.com/newcars might not be authoritative on the topic of new cars.
One technique that link brokers (companies that sell links) use is the notion of presell pages. These are pages on an authoritative domain for which the link broker has obtained the right to place and sell ad copy and links to advertisers. The link broker pays the domain owner a sum of money, or a percentage of the resulting revenue, to get control over these pages. For example, the link broker may negotiate a deal with a major university enabling it to place one or more pages on the university's website. The links from such a page do carry some of the inherent value that resides in the domain. However, the presell pages probably don't have many (if any) links from other pages on the university site or from other websites.