This is one way to capture a bunch of search volume that you would otherwise miss: if you canonicalized this page back up to the core page, you would never get it into the index.
In this situation, when Google goes to this URL, I would be saying to Google: “Hey, actually this page is the master copy. Please put it in your index.”
We do this because the page is important to us. It’s a unique and good experience for users and it has search volume behind it. So we decided to keep it in the index.
When you do this, watch out and make sure that the content of this page is not an exact duplicate of the original men’s shoes page. Make sure there’s some differentiation there.
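As a rough illustration of what “this page is the master copy” looks like in practice, here is a minimal sketch that builds a self-pointing canonical tag. The URL and the function name `canonical_link_tag` are made up for the example; they are not from any particular site or library.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return the <link> element that would go in the page's <head>."""
    # Escape the URL so characters like & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Hypothetical URL of a page we want indexed in its own right.
page_url = "https://www.example.com/mens-shoes/red"

# The page points at itself, which declares it the master copy.
print(canonical_link_tag(page_url))
```

Because the tag on the page points at the page's own URL, Google is being asked to index this version rather than fold it into the core page.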
Why canonicalization is important
The way to think about the rel canonical tag and canonicalization in general is: “what would search results look like if Google didn’t have a way to remove duplicate content?”
Next time you’re on an eCommerce site, every time you click anything, watch the URL bar. It is a massive, massive, massive problem.
Think of it: I have a ton of URLs that are effectively all the same thing (mens-shoes, mens-shoes?size=10, etc.). Think of every single possible permutation of this.
It can start to get really messy, really quickly.
The internet would suck without this.
It’s really good that Google has built in a technical way to handle these kinds of issues.
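To make the permutation problem concrete, here is a small sketch that collapses a handful of men's shoes URLs down to one canonical target. It assumes every query parameter is just a filter that produces duplicate content, which will not be true for pages you deliberately keep in the index. The domain and the helper name `strip_query` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment so every variant maps to one URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# A few of the many permutations a faceted category page can produce.
variants = [
    "https://www.example.com/mens-shoes",
    "https://www.example.com/mens-shoes?size=10",
    "https://www.example.com/mens-shoes?size=10&color=red",
    "https://www.example.com/mens-shoes?color=red&size=10",
]

# Every variant resolves to the same canonical target.
canonical_targets = {strip_query(url) for url in variants}
print(canonical_targets)
```

Four URLs in, one canonical URL out. That is the cleanup the rel canonical tag asks Google to do for you.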
Canonical URL tips to keep in mind
Aside from the basics explained above, there are a few more tips you might want to keep in mind:
Self-referential canonical tags are fine
There’s a lot of debate out there at a really high technical level with massive web applications.
Some people prefer not to use self-referential canonical tags. I’ve seen evidence for and against this. It really depends on your situation. For example, TripAdvisor is doing some really interesting stuff on this. Take a look at their source code and check out what they are doing.
If you’re just getting into this, self-referential canonical tags are fine.
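If you want to check whether a page carries a self-referential canonical tag, something like the sketch below would do it using only Python's standard library. The class and function names are my own, and the comparison is a plain string match, so it assumes the URLs are already written in the same form (same scheme, host, and trailing slash).

```python
from html.parser import HTMLParser
from typing import List


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> List[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


def is_self_referential(page_url: str, html: str) -> bool:
    """True when the page declares exactly one canonical and it is the page itself."""
    return find_canonicals(html) == [page_url]
```

Feed it the HTML you fetched for a URL, and `is_self_referential` tells you whether that page names itself as the master copy.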
Canonicalize your homepage
Personally, I find homepages to be the most oddly-linked-to thing.
There are a ton of different ways to render a homepage: http://www.website.com, https://website.com, www.website.com/index.php, www.website.com/index.html, and so on.
People mess this up all the time.
Pick a core version to render your homepage and canonicalize every possible version to the core one.
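Here is one way that rule could be sketched. The choice of `https://www.website.com/` as the core version is an assumption for the example, as are the names `CANONICAL_HOME`, `HOME_HOSTS`, `HOME_PATHS`, and `canonical_for`.

```python
from urllib.parse import urlsplit

# Assumption: this is the one version of the homepage we picked as the core.
CANONICAL_HOME = "https://www.website.com/"
HOME_HOSTS = {"website.com", "www.website.com"}
HOME_PATHS = {"", "/", "/index.php", "/index.html"}


def canonical_for(url: str) -> str:
    """Map any homepage variant to the core version; leave other URLs alone."""
    # Variants like "www.website.com/index.php" have no scheme, so add one
    # to let urlsplit recognise the host.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    if parts.hostname in HOME_HOSTS and parts.path in HOME_PATHS:
        return CANONICAL_HOME
    return url


for variant in [
    "http://www.website.com",
    "https://website.com",
    "www.website.com/index.php",
    "www.website.com/index.html",
]:
    print(variant, "->", canonical_for(variant))
```

Every variant in the list ends up pointing at the same core URL, which is the value you would put in each one's canonical tag.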
Canonical tags are a suggestion, not a directive
Robots.txt files are a directive. 301 redirects are a directive. Using these things, you’re telling a search engine “Hey, you have to do this”.
Canonical tags are, however, a suggestion.
Google’s point on this is that webmasters mess this up a lot, so Google openly admits they’re allowed to ignore you if they think you’re shooting yourself in the foot.
Canonical tags are one additional suggestion that we give to the search engines to advise them on how to handle duplicate content.
So, if you have added a canonical tag, it’s not working, and you’re wondering what happened, dig a little deeper into what you have going on, because Google may be getting mixed signals from you.
Google says they won’t absolutely enforce the canonical tags—but more often than not, I see that they do.
Cross-domain canonicalization is OK
Let’s say you run a publishing site and you have 20 different sites and every time you write a new blog post, it cascades across all of your different domains.
It’s totally fine to write a blog post for website1.com and then post that on website2.com, website3.com, and website4.com and then cross-domain canonicalize back to the original.
Anything that does cross-canonicalize won’t show up in the index—so, by doing this, you basically tell Google “Hey, don’t push this in the index. The original version is over here.”
However, keep in mind that any links you get on any of those pages should be attributed back to the original master copy.
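A tiny sketch of that setup, assuming the post lives at the same made-up path on every domain. The path and the helper `is_cross_domain` are hypothetical.

```python
from urllib.parse import urlsplit


def is_cross_domain(page_url: str, canonical_url: str) -> bool:
    """True when the canonical target lives on a different host than the page."""
    return urlsplit(page_url).hostname != urlsplit(canonical_url).hostname


# The original post, plus the copies that cascade to the other domains.
original = "https://website1.com/blog/new-post"
copies = [
    "https://website2.com/blog/new-post",
    "https://website3.com/blog/new-post",
    "https://website4.com/blog/new-post",
]

# Each copy carries a canonical tag pointing back at the original.
for copy in copies:
    print(copy, "-> canonical:", original, "| cross-domain:", is_cross_domain(copy, original))
```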
Don’t send mixed signals
There are a lot of ways to mess this up.
For example, you could take two pages and canonicalize them to each other. Or you could canonicalize one page to another while 301 redirecting that second page back to the first.
Don’t send mixed messages. Figure out what your plan is, figure out what you want your master copy to be, and hammer that plan.
Make sure it’s all very clear in every element.
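If you keep a list of your canonical tags and redirects, you can check it for these conflicts before Google finds them. The sketch below assumes two plain dictionaries mapping a page to its canonical target and to its 301 target. It reads the second example above as a canonical pointing one way and a redirect pointing back. The function name and paths are invented for illustration.

```python
def find_mixed_signals(canonicals: dict, redirects: dict) -> list:
    """Return human-readable descriptions of conflicting signals."""
    problems = []
    for page, target in canonicals.items():
        if page == target:
            continue  # a self-referential canonical is not a conflict
        # Two pages canonicalizing to each other (reported once per pair).
        if canonicals.get(target) == page and page < target:
            problems.append(f"{page} and {target} canonicalize to each other")
        # Canonical points one way while a 301 points back the other way.
        if redirects.get(target) == page:
            problems.append(
                f"{page} canonicalizes to {target}, but {target} redirects back to {page}"
            )
    return problems


canonicals = {"/a": "/b", "/b": "/a", "/c": "/d", "/e": "/e"}
redirects = {"/d": "/c"}

for problem in find_mixed_signals(canonicals, redirects):
    print(problem)
```

An empty result means the master copy is at least consistent between these two elements.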
Keep in mind that 301 redirects and canonicalization have different effects
A lot of people ask “OK, essentially, I want to kill a bunch of duplicate pages and I want to consolidate them all into one page. Should I use a canonical tag or a 301 status code?”
First of all, remember that a 301 redirect seems to send a stronger signal in terms of link equity. For example, if you have two deprecated pages, you want to kill them, and you want to pass all those links over to a different page, a 301 redirect will feel more helpful.
At the same time, keep in mind that 301 redirects and canonicalization offer different experiences for the user.
- With a 301 redirect, the user moves to the new end page.
- With a canonical link, they do not. Basically, you’re telling Google “Hey. The page you’re on is a copy. Ignore it and send any links over to this other page”. However, the user will still stay there—so keep this in mind.
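The difference shows up in what the server actually sends back. Here is a simplified sketch with each response modelled as a `(status, headers, body)` tuple. The two function names are hypothetical and no real web framework is involved.

```python
from html import escape


def redirect_response(target_url: str):
    """301: the browser is sent on to the target, so the user never sees this page."""
    return 301, {"Location": target_url}, ""


def canonical_response(target_url: str, body_html: str):
    """200: the user stays on this page; only the <head> names the master copy."""
    head = f'<link rel="canonical" href="{escape(target_url, quote=True)}">'
    html = f"<html><head>{head}</head><body>{body_html}</body></html>"
    return 200, {"Content-Type": "text/html; charset=utf-8"}, html


status, headers, body = redirect_response("https://www.example.com/new-page")
print(status, headers)

status, headers, body = canonical_response("https://www.example.com/new-page", "<p>Old copy</p>")
print(status, body)
```

With the 301 there is nothing for the user to read, just a new location. With the canonical tag the old page still renders in full.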
You can’t use canonical tags for link manipulation
Some people say “OK, well, then all I have to do is get a bunch of links to a page and then canonicalize it to a completely unrelated page. Then I can rank super-high and win the Internet.”
It doesn’t work that way.
In fact, it looks like Google uses document relevance to handle this. They want to make sure that what you’re sending to is an actual copy.
If the page you’re canonicalizing to is dramatically different from the one you’re currently on, the tag will be ignored.
For example, if you have a page about blue widgets and you’re canonicalizing to a page about gorillas, it’s not going to work.
That’s it—that’s really all there is to canonical URLs and the rel canonical tag. Using the information in this post, you will be able to correctly canonicalize your duplicate content, which is a very important element of the entire Search Engine Optimization process.