Make Google Happy: Weed Out Duplicate Content

I spend a lot of time these days trying to master search engine optimization, the practice of making Google's great and terrible algorithm give you the love you never received from your emotionally closed-off, impossible-to-please father. A commercial venture like Wargames.Com, which I'm running as a bootstrap with no advertising budget, would be utterly hopeless without search traffic.

Towards this end, I've created a sitemap for each of my sites -- an XML file that tells Google and other search engines where to find new and updated content. Here's the Wargames.Com sitemap, which lists the URLs of each of the store's products along with the date the pages were last edited. (Have I mentioned that large-scale combat simulations make an excellent gift for Mother's Day?)
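For anyone who hasn't built one, a sitemap is just a short XML file in the format described at Sitemaps.org. A stripped-down sketch looks like this -- the product URL and date here are made up for illustration:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.wargames.com/products/sample-game.html</loc>
    <lastmod>2007-05-01</lastmod>
  </url>
</urlset>

Each page gets its own url element, and lastmod tells the crawler when the page last changed so it knows what's worth revisiting.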

The process of creating these maps taught me that I was making a huge mistake in the design of my sites: offering the same content on multiple pages, all of which were indexed by Google.

A lot of sites make this blunder. You can see an example from the fake-news site The Onion with this search: a column by Ben Tiedemann explaining why he blogs has been indexed by Google at a bunch of different URLs.

Google's algorithm aggressively hunts down duplicate content, relegating it to a supplemental index where it gets no love at all from searchers. Another Google search shows that all but one copy of Tiedemann's column is treated as a dupe.

If you want to maximize a page's prominence in Google, make one copy available to Google and hide the rest by adding the following tag to the head section of the duplicate pages:

<meta name="robots" content="noindex, follow" />

The noindex keyword tells search engines not to index the page, and follow says the engine should crawl the page's links to find other pages.

If possible, you should also redirect URL requests so that each of your pages is loaded at a single main URL, which for The Onion appears to be links in the form content/node/number. A page that can be loaded a bunch of different ways will be bookmarked and linked at each of those URLs, splitting up the Google juice it would otherwise earn from inbound links.
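On an Apache server, a permanent redirect will collapse the extra addresses into the main one. Here's a minimal sketch for an .htaccess file, assuming mod_rewrite is enabled -- the old path and node number are made up for illustration:

# send a hypothetical alternate URL to the page's one true address
RewriteEngine On
RewriteRule ^articles/why_i_blog\.html$ /content/node/12345 [R=301,L]

The 301 status tells search engines the move is permanent, so links pointing at the old URL should get credited to the new one.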

Employing these techniques on Wargames.Com has produced search results that are a lot more likely to be useful, sending searchers to specific product pages. I'm not seeing an uptick in sales yet, but the Google results for the site were so useless 90 days ago that I was afraid the other search engine optimizers would find out and make fun of me.

P.S. Just kidding, Dad!

Comments

This takes me back. Taught me something too! Thanks, Rogers.

I once tried helping a relative with a web commerce venture. At least a quarter of the effort went into finding and adjusting the design based on murmured incantations that promised to coax the search engine heuristics into crawling the site better.

I quickly found my level of incompetence and retreated to what I pass off as ability, and the website died its natural death.

Regards,
etc.

That's OK, son.

I set up a site using the MediaWiki software, and found that "Special:Random" gets spidered by Google more than anything else ("Look! New content again!"), resulting in some disappointed searchers getting random pages that are rarely what they were actually looking for. I added 'rel="nofollow"' to the link in hopes of stopping it from happening again. It used to be that Google was smart enough to know that a URL with a lot of options (index.php?one=1&two=2...etc.) was more dynamic and less reliable, but URL rewriting has made some very unpredictable pages seem static.

Does having duplicate content really help? I read somewhere that you should avoid duplicating content, as Google will ban your site.
