Blog scraping

As a blogger, I’ve run into trackback/pingback spam from splogs (spam blogs) that scrape content. It is frustrating when a new post is immediately reproduced on spammy sites that do nothing but copy content from other sites.

These splogs used to copy the content word for word and add a link to the original source. Over time, search engines and other anti-spam tools learned to recognize them, and the sites became short-lived as many of them ended up on blacklists.

Yesterday I published a sponsored post about the Mercedes GLK, and a splog copied it quickly and automatically.

This time I noticed something new. Instead of simply copying the original text, this splog uses software to swap words for synonyms automatically, which lets it evade duplicate-content detection and content-theft tools.

The upper screenshot shows my original post from yesterday; the lower one shows the splog's copy.

Original content from my blog

Splog's stolen content

Now what? What will we see next from webmasters who try to profit from scraped content?