2013-08-13

This is a guest contribution by Felipe Kurpiel, an internet marketer.

I came across this topic by accident. One day, while monitoring my analytics data, I noticed a big drop in my traffic stats and I didn’t understand why.

Actually, I had a hint, because I had just started interlinking my posts. That gave me a clue that the problem was internal, which I thought was a good thing. But that was not enough, because I still had to work out what Google is focusing on now.

If you have been involved with SEO at all you know that duplicate content is a bad thing. But how can you identify the duplicate content on your site?

Ok, let’s get started with that.

Identifying Internal Duplicate Content!

This is a little advanced because we are about to crawl our website the way Google does. That is the best way to find the source of any problem.

To do that I like to use a free tool called Screaming Frog SEO Spider. If you have never used this tool it can seem a little complicated, but don’t let that scare you.

You just have to follow some steps. Actually you can analyze a lot of factors using this tool but for our example, we are just considering duplicate content.

First Step: Enter your website URL into the software and let it run.

It can take a while depending on how big your website is, but after that we are ready to filter what we are looking for.



Second Step: Go to the Page Titles tab and then filter by Duplicate

If you are lucky, this filter will show no results. Unfortunately that was not my case: I saw dozens of results, which was proof that my website had internal duplicate content.

Third Step: It’s time to analyze what is generating the problem

You can do this on Screaming Frog or you can export the file to Microsoft Excel (or similar) in order to deeply analyze what you have to do to solve the issue.
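If you prefer a script to a spreadsheet, the same check can be sketched in a few lines of Python. This is only an illustration: it assumes the export is a CSV with an "Address" column for the URL and a "Title 1" column for the page title, so adjust those names to whatever your export actually contains. The sample data is made up.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_titles(csv_text, url_column="Address", title_column="Title 1"):
    """Group crawled URLs by page title and keep titles used more than once.

    The column names are assumptions about the export format.
    """
    urls_by_title = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = (row.get(title_column) or "").strip()
        if title:  # skip rows that have no title at all
            urls_by_title[title].append(row[url_column])
    return {title: urls for title, urls in urls_by_title.items() if len(urls) > 1}


# Made-up sample standing in for a real crawl export.
sample_export = """Address,Title 1
http://example.com/my-post/,My Post
http://example.com/my-post/?replytocom=12,My Post
http://example.com/my-post/?replytocom=15,My Post
http://example.com/about/,About
"""

duplicates = find_duplicate_titles(sample_export)
for title, urls in duplicates.items():
    print(title, len(urls))  # prints: My Post 3
```

Any title that appears in the result is shared by more than one URL, which is exactly what the Duplicate filter shows.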

In my case, the duplicate content was being generated by comments. Weird, isn’t it?

That is what I thought, and I also noticed that the pages with comments were being flagged by Google, because they had disappeared from search results.

When that happens, you have no choice but to fix the source of the problem.

Understanding Comments

Every comment on my website was generating a URL with a variable named “?replytocom”.

You don’t need to understand exactly what this variable does but, to put it simply, it is as if each comment on your posts can create a copy of that particular post on your site. It can be considered a pagination problem. And that is terrible, because when Google crawls your website it sees the same content repeated over and over again.

Do you think you are going to rank with that blog post? Not a chance!
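To make the idea concrete, here is a small Python sketch that removes the replytocom variable from a list of URLs and groups them by the post they really point to. The URLs and function names are invented for the example; it only shows why several addresses end up being the same page.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_replytocom(url):
    """Return the URL without its replytocom parameter (other parameters stay)."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != "replytocom"]
    # The fragment (for example #respond) is dropped as well, because it is
    # never sent to the server and does not change the page.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_by_post(urls):
    """Map each cleaned URL to every crawled URL that shows the same post."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_replytocom(url)].append(url)
    return dict(groups)


crawled = [
    "http://example.com/my-post/",
    "http://example.com/my-post/?replytocom=12#respond",
    "http://example.com/my-post/?replytocom=15#respond",
    "http://example.com/other-post/?page=2&replytocom=7",
]
groups = group_by_post(crawled)
for post, copies in groups.items():
    print(post, len(copies))
```

In this made-up crawl, three different addresses collapse into the single post at /my-post/, which is the repetition Google runs into.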

How to solve this problem

More important than identifying this issue is having a clear solution to get rid of it.

In order to deal with this variable there are two solutions. The first is really simple but not so effective and the second can be seen as complicated but it’s really the ultimate solution.

But let’s cover the easy solution first.

I run my blog on WordPress and one of the few essential plugins I use for SEO is WP SEO by Yoast. If you are using this plugin you just have to go to the plugin dashboard and then click on Permalinks. Once you do that just check the box to “Remove ?replytocom variables”.



This is really simple, but sometimes you won’t get the results you are expecting. Still, if you are having this kind of problem with comments, you MUST check this option.

After that you can crawl your website again with Screaming Frog to see if the problem was solved. Unfortunately this can take a while, but if after a day or two you still see duplicate content problems, you have to try the second option.

Second Option

Now we just have to access Google Webmaster Tools and select our website.

Then under Configuration we must go to URL Parameters.

We will see a list of parameters being crawled by Google. In addition, here we have the chance to tell Google what to do when a particular parameter is affecting our website. That is really cool.

For this replytocom problem I just have to click Edit and use the following settings.



Click Save and you have solved the problem!

Now if you tried the first option using the plugin, then you used Webmaster Tools to tell Google what to do with this parameter and after a few days you still see duplicate content, there is one more thing you can try!

Now I am talking about Robots.txt!

Don’t worry if you don’t have this file on your website, because you just have to create a txt file and upload it to the root of your domain. Nothing that complicated!

Once you have created this file, you just have to add a few rules to it.

If your Robots.txt is blank, just add these commands there:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: *?replytocom

If you already had this file, just add the final line: “Disallow: *?replytocom”

That should stop Google from crawling the replytocom copies of your posts!
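If you want to see which URLs a rule like this covers, here is a rough Python sketch of wildcard matching as Google describes it for robots.txt: "*" stands for any run of characters and a "$" at the end of a pattern marks the end of the URL. It is a simplified illustration with a made-up function name, not a full robots.txt parser, and the paths are invented.

```python
import re


def rule_matches(pattern, path):
    """Check one Disallow pattern against a URL path plus query string.

    Simplified assumption: '*' matches any run of characters, a trailing '$'
    anchors the end, and everything else is matched literally from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


print(rule_matches("*?replytocom", "/my-post/?replytocom=12"))         # True
print(rule_matches("*?replytocom", "/my-post/"))                       # False
print(rule_matches("*?replytocom", "/my-post/?page=2&replytocom=12"))  # False
print(rule_matches("/wp-admin/", "/wp-admin/options.php"))             # True
```

Notice the third case: under this reading, the pattern only catches URLs where replytocom is the first variable after the question mark, so it is worth checking how your own comment links look.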

Final Thoughts and Monitoring

The best way to avoid this or similar problems is monitoring your data. So here are my three tips to keep your website Google friendly.

When working on-page, be careful with the settings you are using in the Yoast WordPress SEO plugin. Don’t forget to review the Titles & Metas tab and check the “no index, follow” option for every little thing that can be considered duplicate content.

An example is the “Other” tab where you MUST check this “no index” option so your Author Archives will not be seen as duplicate content when Google crawls your site. Remember, you have to make your website good for users and for search engines.

At least twice a week, analyze your traffic on Google Analytics. Go to the Traffic Sources tab, then Search Engine Optimization, and keep an eye on Impressions.

You should also use an additional tool to track your keyword rankings so you can see if your search engine positions remain intact or if some of them are dropping. When that happens you will know it’s time to take some action.

Every two weeks, use Screaming Frog to crawl your website. This can be really important to check if the changes you made on-site already had the impact you were expecting.

When it comes to duplicate content the most important tabs to monitor on Screaming Frog are Page Title and Meta Description. However, in order to have a website that can be considered Google friendly it’s vital to analyze the Response Codes as well and eliminate every Client Error (4xx) and Server Error (5xx) you identify when crawling it.
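As a tiny illustration of that last check, this Python sketch sorts crawl results into Client Errors (4xx) and Server Errors (5xx). The list of URL and status code pairs is invented, and the function name is just for the example; in practice the pairs would come from your crawl.

```python
def find_error_responses(crawl_results):
    """Split (url, status_code) pairs into 4xx and 5xx groups."""
    errors = {"client_error": [], "server_error": []}
    for url, status in crawl_results:
        if 400 <= status <= 499:
            errors["client_error"].append(url)
        elif 500 <= status <= 599:
            errors["server_error"].append(url)
    return errors


# Made-up crawl results for the example.
crawl_results = [
    ("http://example.com/", 200),
    ("http://example.com/old-post/", 404),
    ("http://example.com/moved/", 301),
    ("http://example.com/broken/", 500),
]
errors = find_error_responses(crawl_results)
print(errors["client_error"])  # ['http://example.com/old-post/']
print(errors["server_error"])  # ['http://example.com/broken/']
```

Anything that lands in either list is a page to fix or redirect before the next crawl.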

Felipe Kurpiel is an internet marketer passionate about SEO and affiliate marketing. On his blog there are great insights about how to rank your website, link building strategies and YouTube marketing. 

Originally at: Blog Tips at ProBlogger

How To Stop Your WordPress Blog Getting Penalized For Duplicate Content
