For many website owners, the duplicate content issue is a bit of a murky one. While most know it’s frowned upon by Google, few have a grasp of the details and even fewer are sure if there is even a penalty attached to it at all.
Here to clear the murky waters for good are the CleverClicks Myth Busters!
But before we let them loose on it, let’s just get clear on the basics…
What is duplicate content?
Duplicate content is blocks of content within or across domains that either completely match other content or are appreciably similar, to use Google’s own words. So essentially, it’s content which appears elsewhere on the web.
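To get a feel for what ‘appreciably similar’ might mean in practice, here’s a minimal Python sketch that scores two pieces of text by how many three-word phrases they share. To be clear, this is not how Google measures similarity (it doesn’t publish that). The function names, the three-word window and the scoring method (Jaccard similarity) are all assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "the quick brown fox jumps"
    rewrite = "the quick brown fox leaps"
    print(similarity(original, rewrite))
```

A score of 1.0 means the two texts share every three-word phrase and 0.0 means they share none. Where you draw the line for ‘too similar’ is a judgement call.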
Internal vs. External
Duplicate content comes in two forms:
Internal duplicate content: Duplicate content within one website or domain.
External duplicate content: Duplicate content that exists between two or more different sites across the web.
Why should you care about it?
Well, if Google picks up that you have duplicate content (which it feels is malicious in intent) then your website and offending pages are in for a hard time.
Firstly, you won’t rank for the offending page/s where duplicate content is found.
Secondly, the weight of the page/s will be negligible.
Thirdly, Google will register a mark against your site as a reliable source of quality, unique content.
These are, obviously, not ideal outcomes, but before you panic (thinking about those Wikipedia definitions you copy-pasted in your last blog post), there is slightly more to this duplicate content ‘penalty’ than meets the eye.
Our mythbusters will take it from here…
Myth: Duplicate means having scraped content or the same text on multiple pages.
Truth: But wait, there’s more…
Pages accessible via multiple URLs will also register as duplicate content.
When the bots crawl your pages they’ll visit each individual URL and expect to find individual content. If this isn’t the case then it’s considered duplicate content.
This can be either internal (you have two URLs leading to the same page) or external (the content shows up in more than one location across the web).
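If you want to check your own site for the internal kind, one rough approach is to fingerprint the text of each page and see which URLs end up with the same fingerprint. The sketch below assumes you have already collected each page’s text into a dictionary keyed by URL. The function name and the example.com URLs are made up for illustration, and it only catches exact matches (ignoring case and spacing).

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose page text is identical after normalising case and whitespace."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        normalised = " ".join(text.lower().split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only fingerprints shared by more than one URL indicate duplication.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    # Hypothetical crawl results: URL -> extracted page text.
    pages = {
        "https://example.com/shoes": "Our full range of running shoes.",
        "https://example.com/shoes?sort=price": "Our full range of  running shoes.",
        "https://example.com/about": "We have been selling shoes since 1999.",
    }
    for group in find_exact_duplicates(pages):
        print("Same content at:", group)
```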
If it’s legitimate and you’re just doing something like sharing an article (with permission) on your site, you can handle these pages with the rel="canonical" tag, the URL parameter handling tool, or 301 redirects. However, if the content is internal, the best way to deal with it is to make sure that each piece of content has only one URL associated with it.
This not only dupe-proofs you, but it also makes it less confusing for users to navigate.
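Here’s a minimal Python sketch of the ‘one URL per piece of content’ idea. It boils URL variants down to a single preferred form, builds the matching rel="canonical" tag, and works out when a 301 redirect would be needed. The rules it applies (https, no www, no trailing slash, and the list of tracking parameters to strip) are assumptions for illustration rather than anything Google prescribes, and the function names are hypothetical. Pick whichever preferred form suits your site and apply it consistently.

```python
from html import escape
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of query parameters that never change the page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def canonical_url(url: str) -> str:
    """Reduce a URL to one preferred form (assumed policy: https, no www, no trailing slash)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # The fragment (#...) is dropped because it never reaches the server.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


def redirect_for(requested_url: str) -> Optional[tuple[int, str]]:
    """Return (301, target) if the requested URL is not the preferred one, else None."""
    target = canonical_url(requested_url)
    if target == requested_url:
        return None
    return 301, target


if __name__ == "__main__":
    variant = "http://WWW.Example.com/blog/?utm_source=newsletter"
    print(canonical_url(variant))
    print(canonical_link_tag(variant))
    print(redirect_for(variant))
```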
Myth: There is no such thing as a duplicate content penalty
Truth: Well, um…
This one is actually pretty close to the truth.
‘What?’ I hear you cry. ‘Then why am I even reading this article?’
Because your rankings can be seriously affected by duplicate content. But, yes, the term ‘penalty’ isn’t ‘technically’ correct.
In reality it’s more of a filter, but it’s become known as a penalty because when you’re on the receiving end of it your rankings will all but disappear.
So while it isn’t considered a full-blown penalty you’ll certainly feel as though you have been penalised.
Myth: Having disclaimers or boilerplate information repeated across multiple pages counts as duplicate content
Truth: Google have thought about this
Matt Cutts has said that having a Terms and Conditions template or a Disclaimer message across all pages of your site won’t get you penalised.
“If it’s required, I wouldn’t stress about that… Unless the content that you have is spammy or keyword-stuffed, then an algorithm or a person might take action.”
Myth: One should block crawlers’ access to duplicate pages
Truth: Don’t do it!
If multiple URLs point to the same content, but it’s not malicious in nature (republishing a blog post with permission, for example) it can still be flagged as external duplicate content, even though it’s perfectly permissible.
A misleading piece of advice often given to webmasters in these cases is to ‘block crawler access’. However, Google warns against doing this.
“If search engines can’t crawl pages with duplicate content, they can’t automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages,” says Google.
A better solution is to allow search engines to crawl these URLs, but mark them as duplicates using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects.
This tells Google that you acknowledge the duplication and allows them to overlook it (provided they don’t find it malicious).
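One quick sanity check is to make sure your robots.txt isn’t accidentally blocking the duplicate URLs you’ve marked up. This sketch uses Python’s built-in urllib.robotparser module. The function name, the sample robots.txt rules and the example.com URLs are assumptions for illustration, and the standard-library parser may not interpret every rule exactly the way Googlebot does.

```python
from urllib.robotparser import RobotFileParser


def blocked_duplicates(
    robots_txt: str, duplicate_urls: list[str], agent: str = "Googlebot"
) -> list[str]:
    """Return the duplicate URLs that the given robots.txt stops the crawler from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in duplicate_urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks the printer-friendly copies of each post.
    robots_txt = "User-agent: *\nDisallow: /print/\n"
    duplicates = [
        "https://example.com/print/my-post",
        "https://example.com/my-post?ref=home",
    ]
    for url in blocked_duplicates(robots_txt, duplicates):
        print("Crawlers can't see this duplicate:", url)
```

Any URL this flags is one where a crawler would never get to read the page, so it couldn’t pick up a canonical tag placed on it.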
Myth: All syndicated content is duplicate content
Truth: Not necessarily
There are two types of syndication sites:
Type 1: Legitimate news sites and websites that share content which has already been published elsewhere. These sites have permission to re-publish this content and / or give credit to the original author.
Type 2: Sites that produce no original content and just scrape, steal and borrow text or images from other websites without giving credit.
Obviously, Type 2 is a disaster waiting to happen, but Google respects that Type 1 websites are offering a valuable service which can benefit both readers and the original authors of the content.
So, yes, while their site is technically full of external duplicate content, Google can judge the intent and give the good guys a break. It would be very hard for sites like Buzzfeed to function if they didn’t.
Myth: Translated copy isn’t duplicate content
Truth: Sometimes it is
You’d think that changing something into a different language would mean it isn’t duplicate content, but it can be picked up as such. Especially if your content has been directly translated (newsflash: language doesn’t work that way).
If you have a version of your site in a different language, you’ll need to change the sentence structure, alter the content a bit and use a different regional domain.
Myth: It’s easy to get ‘penalised’ for duplicate content
Truth: It only happens in extreme cases
It takes quite a lot to make the duplicate content alarm sound at Google. Most webmasters haven’t come across many cases where a site’s rankings dropped because of duplicate content alone.
As Google themselves say, “mostly, [duplicate content] is not deceptive in origin.”
Which means they’re able to identify it when they see it.
When Google looks for duplicate content it takes the following into consideration:
- Volume: How many duplicates of the same text exist. In most cases, it needs to be hundreds of pages before Google takes notice.
- Timing: If hundreds of duplicate pages all appear at the same time, you’re bound to raise a few eyebrows. If it happens gradually, you’re less likely to attract any attention.
- Context: If the duplicate copy is on a brand new domain or is from a high profile page such as the home page, then it looks fishy. If it’s a press release or a blog post from an established site which is being shared across the web, there’s less likely to be a fuss.
Generally, the only sites which incur a duplicate content penalty are ones which:
- Have nothing but scraped or plagiarised content
- Provide no accreditation or sources
- Steal images, auto-translate pages, or use dodgy automated tools to alter plagiarised content
- Purposefully create pages with nearly identical content (done in order to rank for locations/keywords)
- Are bad quality and spammy in nature
That being said, you shouldn’t become too lax about it. Even though Google has come a long way in telling malicious and benign duplicate content apart, if you don’t keep your site structure dupe-free and ensure your key landing pages are highly unique, you will struggle to rank well.
You may not drop off the face of the internet, but you won’t reach your full potential either. Something to keep in mind.
Hopefully the mists have lifted and left you looking at a clearer picture of what duplicate content is and when you’re at risk of being punished for it.
Do you feel slightly calmer about it? Good. That’s what our mythbusters are here for. Let us know if you have any other questions in the comments section below!