Like hundreds of WordPress bloggers over the past week, I was horrified to find that the entire contents of my blog had been copied to Tygpress, a content-scraping site. I now know how important it is to protect my site against content scraping.
Some bloggers weren’t particularly worried, as it didn’t affect them all that much. Many others, myself included, were outraged at this flagrant breach of copyright. Earlier in the week I posted an article outlining how I had my content removed from the Tygpress site. I’ve since looked further into how this happened in the first place. I hope this post will help WordPress.com bloggers prevent it from happening again, because it certainly raises a few questions. As a self-hosted blogger, I can take steps to make sure it doesn’t happen to me again.
What is content scraping?
I only learned about content scraping through having to deal with the Tygpress debacle, so I am by no means an expert. However, I learned that content scrapers use a couple of methods. Some simply “copy and paste” material from your site to theirs. Others use plugins to pull content from RSS feeds, and bots can copy an entire website in a matter of seconds.
I’ve since installed security plugins on my site to prevent this from happening again, including one that disables the “right-click” function to stop material being copied and pasted.
Most articles you read will tell you to post a clear copyright notice on your website. Well, I do have one, and it was blatantly ignored.
What was the big deal about content scraping?
For bloggers such as myself who run their blog as an adjunct to their business, this had serious ramifications. Not only was this site duplicating my content, potentially damaging my SEO, it was also attempting to make money from my content through AdSense, potentially causing me financial loss. It is also a serious breach of copyright.
I also wasted several days of valuable time dealing with this issue instead of writing content both for my own site and for clients. In fact, I didn’t want to post any more content on my blog until I was sure the issue had been resolved.
The good news is that the combined efforts of bloggers lodging DMCA complaints succeeded in having this site taken down. This crisis is over, but you can be sure another similar site will be set up to take its place.
How did Tygpress gain access to my content?
So how did this happen? Many bloggers noticed that only bloggers with WordPress.com blogs had been targeted. Yes and no: my site is self-hosted, and I was affected. It is pretty obvious, however, that the scrapers gained access to our material through the WordPress.com Reader feed.
Like many bloggers, I initially started my site as a hobby, using the free WordPress.com platform and an emptynestersinsights.wordpress.com URL. As my blog grew and I decided to treat it as a mini-business, I registered my emptynesterstravelinsights.com domain name and moved to a premium plan. A fundamental error I appear to have made was not deleting the original blog.
I eventually took the leap and moved to a self-hosted site on WordPress.org. So you can imagine how perplexed I was to find that my material had been scraped along with that of WordPress.com bloggers.
I had kept my WordPress.com blog to redirect to my new site and to serve as an archive if my current server ever went down. To be honest, I had forgotten about the original free blog.
Scrapers accessed my old WordPress.com blog
However, this is what I found:
- On July 23, I had a huge spike in traffic, which I should have paid more attention to. Someone was visiting every page on my website. However, it wasn’t until the Tygpress debacle that I delved deeper. My stats showed that this traffic was coming through the original emptynesterstravelinsights.wordpress.com site and then being redirected to my current site. Apart from that spike, there had been no traffic to the old site for a very long time. The small amount of traffic on August 5 and 6 was me browsing in incognito mode to check whether the old site was still redirecting to my current site, which it was.
- On that day I also picked up a new “follower” on this old blog, who appeared to be based in India. Why would anyone follow an old blog?
- I changed the settings on this original blog to “private”, meaning that neither Google nor the public could see it. Bingo! My content magically disappeared from Tygpress.
- So on July 23, someone followed my old free WordPress.com blog and proceeded to help themselves to my content.
Clearly this is where the scrapers gained access to my material. Fortunately, I can transfer my domain to my current server and delete this original site so that it can’t happen again. It is something I probably should have done some time ago to stop duplicate content harming my SEO.
However, for bloggers who want to continue using the free platform, this raises the question: what is WordPress doing to prevent another mass harvesting of WordPress bloggers’ content?