Canonicalization is the process of bringing something into conformity with an accepted standard. In IT it is applied to e-mail addresses, filenames, Unicode strings, XML, and URLs. It solves the problem of multiple formats representing the same item and is abbreviated as c14n. The Canonical XML specification defines a method for determining whether separate documents are logically equivalent, while URL canonicalization means consistently referring to a specific web page by a single URL. Matt Cutts recommends using a permanent redirect (301) from alternate URLs to the preferred one to solve this problem.
The word canon refers to an accepted standard, and something canonical conforms to that standard. Canonicalization, spelled canonicalisation in British English, is the process by which something is brought into conformity with the accepted standard. In the computing field, the term canonicalization is used to refer to compliance with standards in several areas. It’s often thought of as the problem, when in fact it’s the solution to a variety of problems. Because it is such a long word, canonicalization is abbreviated using its first and last letters and the number of letters in between: c14n.
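The abbreviation rule can be sketched in a few lines of Python. The function name `abbreviate` is a hypothetical helper chosen for illustration, and leaving words shorter than three letters unchanged is an assumption, since the rule only makes sense when there are letters to count in between.

```python
def abbreviate(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    # Assumption: words with no inner letters are returned unchanged.
    if len(word) < 3:
        return word
    inner_count = len(word) - 2  # letters between the first and the last
    return f"{word[0]}{inner_count}{word[-1]}"


print(abbreviate("canonicalization"))  # c14n
```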
Canonicalization is used in Information Technology (IT) in several contexts. It applies to e-mail sender addresses, the construction of filenames, the encoding of strings in Unicode, the use of XML (Extensible Markup Language), and the construction of URLs (Uniform Resource Locators). In each case, the problem is that multiple formats can represent the same item, and canonicalization is the route to consistency and standardization.
Take XML as an example. XML permits syntactic variation, such as differences in attribute order or in the whitespace inside a tag. This means that two non-identical documents could have the same canonical form, and thus be logically equivalent. The Canonical XML specification was designed to solve this problem by defining a method for determining whether separate documents are equivalent: if their canonical forms match, they are. The method for generating the canonical form of a given XML document is called the XML canonicalization method.
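As a rough illustration, the following Python sketch compares two documents by their canonical forms. It uses `canonicalize` from the standard library's `xml.etree.ElementTree` module (available since Python 3.8), which implements the C14N 2.0 version of the specification. The two sample documents are made up for this example; they differ only in attribute order, quote style, and whitespace inside the start tag.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical documents with the same content but different syntax:
# attribute order, quote characters, and spacing inside the tag differ.
doc_a = '<page lang="en" id="1"><title>Home</title></page>'
doc_b = "<page id='1'   lang='en'><title>Home</title></page>"

# canonicalize() returns the canonical form as a string.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

print(doc_a == doc_b)      # False: the raw text differs
print(canon_a == canon_b)  # True: the canonical forms are identical
```

Comparing canonical forms rather than raw text is what makes this kind of check usable for tasks such as digital signatures, where a byte-for-byte match is required.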
For URL canonicalization, the idea is to refer to a specific web page consistently by a single URL. The simplest example is two versions of a homepage address, one of which has the "www" prefix and the other doesn’t:
http://www.wisegeek.com
versus
http://wisegeek.com
This is a problem for SEO (Search Engine Optimization) because it splits traffic reports across several URLs, even though all of that traffic is actually going to the same place. The result is that a site with multiple URLs for the same pages appears to perform more poorly than it actually does.
The "www" prefix is not the only source of duplicate URLs. Others include trailing slashes and differences in letter case between versions of the URL. Matt Cutts of Google® recommends solving this problem by using a permanent redirect (301) from all alternate URLs to the desired URL, so that search engines can tell which URL is canonical.
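The redirect logic can be sketched in Python using only the standard library. Everything site-specific here is an assumption made for illustration: the choice of the "www" host as the canonical one, the helper names `canonical_url` and `redirect_for`, and the decision to lowercase paths (many servers treat paths as case-sensitive, so that step only suits sites where case does not matter). Ports are ignored for simplicity.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumption: the site owner prefers the "www" host as the canonical one.
CANONICAL_HOST = "www.wisegeek.com"
ALTERNATE_HOSTS = {"wisegeek.com", "www.wisegeek.com"}


def canonical_url(url: str) -> str:
    """Return the preferred form of a URL (hypothetical site policy)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostnames are case-insensitive
    if host in ALTERNATE_HOSTS:
        host = CANONICAL_HOST
    path = parts.path or "/"
    if len(path) > 1:
        # Drop trailing slashes, but never reduce the path to nothing.
        path = path.rstrip("/") or "/"
    path = path.lower()  # assumption: this site's paths are case-insensitive
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if the URL is an alternate form, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)


print(redirect_for("http://wisegeek.com"))       # (301, 'http://www.wisegeek.com/')
print(redirect_for("http://www.wisegeek.com/"))  # None
```

A web server or application would typically run a check like this on each incoming request and answer alternate forms with a 301 response pointing at the target.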