Textpattern CMS support forum
massive content scraper
We're a few days into the new site, and kwangjubiennale. org has already scraped a lot of our content.
Any advice?
Edit: removed the link but kept it readable. What the hell was I thinking, linking to it!
Last edited by colak (2016-10-25 16:13:01)
Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
Re: massive content scraper
You have a few options for dealing with content scrapers:
- You can contact them and ask them nicely to remove the stolen content from their sites.
- You can hire a lawyer and pay them to do their best.
- You can contact their hosting provider and ask them to remove it.
If it’s ongoing automated scraping, you have more interesting options. A few months back I found two websites that had stolen my content. They’d stupidly automated it: they pulled my syndication feed, used it to discover new pages, and then visited those pages to copy their content. I wrote up the full story on my blog, but to cut a long story short: I found the IP addresses they used and used them to serve those visitors an incredible amount of automatically generated garbage content. They happily scraped it and sank their own sites in the process.
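The post doesn't show how the garbage was served, so the following is only a minimal Python sketch of the general idea, not the poster's actual setup. It assumes you have already pulled the scrapers' addresses from your access logs; the networks below are hypothetical RFC 5737 documentation ranges, and `WORDS`, `is_scraper`, `garbage_text` and `render_page` are illustrative names. Requests from a listed address get deterministic nonsense instead of the real article.

```python
import hashlib
import ipaddress
import random

# Hypothetical scraper networks (RFC 5737 documentation ranges); in practice
# these would come from your own access logs.
SCRAPER_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.23/32"),
]

# Illustrative filler vocabulary for the decoy text.
WORDS = ["biennale", "archive", "pigeon", "turnip", "quantum", "teapot", "gravel", "sonnet"]


def is_scraper(client_ip: str) -> bool:
    """True if client_ip falls inside any known scraper network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:  # malformed or missing address: treat as a normal visitor
        return False
    return any(addr in net for net in SCRAPER_NETWORKS)


def garbage_text(path: str, n_words: int = 200) -> str:
    """Deterministic nonsense for a URL path, so repeat fetches see the same 'page'."""
    seed = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words))


def render_page(client_ip: str, path: str, real_content: str) -> str:
    """Serve decoy text to scraper IPs and the real article to everyone else."""
    if is_scraper(client_ip):
        return garbage_text(path)
    return real_content
```

Seeding the generator from the URL path keeps each decoy page stable across repeat fetches, so it looks like ordinary content to an automated scraper rather than something that changes on every request.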
Re: massive content scraper
Hi DA,
Thanks for the reply. They are hosted at AWS (Amazon Web Services), whom I have contacted. I also found a name in the WHOIS database; I’ll contact that person through our lawyer, who is also a member of our NGO.
The problem is that they now host the whole content locally on their server, using static URLs.
In the past I used to block their IP and redirect them somewhere else.
I also used JavaScript to redirect visitors back to our site when they were hot-linking our JS files.
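Blocking a scraper's IP and redirecting it, as described above, is usually done in the web server configuration on a Textpattern host. Purely as an illustration of the same idea, here is a small Python WSGI middleware sketch. The blocked range and `REDIRECT_URL` are hypothetical placeholders, and `BlockAndRedirect` is an illustrative name, not part of any library.

```python
import ipaddress

# Hypothetical blocked range and redirect target; replace with your own.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
REDIRECT_URL = "https://example.com/"


class BlockAndRedirect:
    """WSGI middleware: send blocked IPs a 302 redirect, pass everyone else through."""

    def __init__(self, app, blocked=BLOCKED_NETWORKS, target=REDIRECT_URL):
        self.app = app
        self.blocked = blocked
        self.target = target

    def _is_blocked(self, ip: str) -> bool:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:  # missing or malformed REMOTE_ADDR: don't block
            return False
        return any(addr in net for net in self.blocked)

    def __call__(self, environ, start_response):
        if self._is_blocked(environ.get("REMOTE_ADDR", "")):
            start_response("302 Found", [("Location", self.target), ("Content-Length", "0")])
            return [b""]
        # Normal visitors reach the wrapped application unchanged.
        return self.app(environ, start_response)
```

You would wrap your application once, e.g. `application = BlockAndRedirect(application)`. Behind a proxy or CDN, `REMOTE_ADDR` is the proxy's address rather than the scraper's. Once a scraper stores copies on its own server, blocking only stops new fetches and does not remove what was already taken.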
Yiannis
Re: massive content scraper
I would also use Google’s Remove URLs tool, and if that’s not the right place, there’s Removing Content From Google.