FidoSysop Blog

Advanced WordPress Bad Bot Blocking Using Bad Behavior

Are you running a WordPress blog and suffering from resource draining and content theft?

How To Block Malicious Content Thieves

If so, here are two things you can implement that will show the abusers the door.

These days there are a multitude of resource hogs that will eat up your website's hosting resources really fast. If you share your blog articles on social media, the risk of getting your website shut down (especially on shared hosting) is high.

The net is like the old Wild West, with little to no regulation. While some US laws may add some copyright protection, US law does not apply in other countries. Bad bots are at an all-time high, and the threat is increasing monthly. Tweet an article or share it on Google+ and bam, they come rushing over to index and possibly harvest your post. This is a serious problem with no end in sight.

You can add bot-blocking rules to .htaccess that will show a bunch of them the door. Each rule matches a fragment of the bot's user agent string and flags the request as bad_bot. Here is the text of my .htaccess file. Just copy and paste.

Back up your existing .htaccess file before attempting this modification, as your site WILL CRASH (a 500 Internal Server Error on every page) if this is not implemented properly.

# Begin Bad Bot Blocking
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase yeti bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider bad_bot
BrowserMatchNoCase nutch bad_bot
BrowserMatchNoCase toscrawler bad_bot
BrowserMatchNoCase mj12bot bad_bot
BrowserMatchNoCase rogerbot bad_bot
BrowserMatchNoCase magpie-crawler bad_bot
BrowserMatchNoCase sistrix bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase blekkobot bad_bot
BrowserMatchNoCase zh-cn bad_bot
BrowserMatchNoCase ejeniobot bad_bot
BrowserMatchNoCase seokicks-robot bad_bot
BrowserMatchNoCase compspybot bad_bot
BrowserMatchNoCase garlikcrawler bad_bot
BrowserMatchNoCase grapeshotcrawler bad_bot
BrowserMatchNoCase careerbot bad_bot
BrowserMatchNoCase special_archiver bad_bot
BrowserMatchNoCase acoonbot bad_bot
BrowserMatchNoCase aboundex bad_bot
BrowserMatchNoCase wotbox bad_bot
BrowserMatchNoCase proximic bad_bot
BrowserMatchNoCase discoverybot bad_bot
BrowserMatchNoCase yisouspider bad_bot
BrowserMatchNoCase zumbot bad_bot
BrowserMatchNoCase turnitinbot bad_bot
BrowserMatchNoCase python-requests bad_bot
BrowserMatchNoCase exabot bad_bot
BrowserMatchNoCase jetslide bad_bot
BrowserMatchNoCase ccbot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase larbin_2.6.3 bad_bot
BrowserMatchNoCase mandalay bad_bot
BrowserMatchNoCase urlappendbot bad_bot
BrowserMatchNoCase curl bad_bot
BrowserMatchNoCase libcurl bad_bot
BrowserMatchNoCase wget bad_bot
BrowserMatchNoCase archive.org_bot bad_bot
BrowserMatchNoCase semantifire1 bad_bot
BrowserMatchNoCase pollbot bad_bot
BrowserMatchNoCase spbot bad_bot
BrowserMatchNoCase Butterfly/1.0 bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase heritrix bad_bot
BrowserMatchNoCase comodo-webinspector-crawler bad_bot
BrowserMatchNoCase www.integromedb.org bad_bot
BrowserMatchNoCase hrbot bad_bot
BrowserMatchNoCase 200pleasebot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase botw bad_bot
BrowserMatchNoCase openhosebot bad_bot
BrowserMatchNoCase paperlibot bad_bot
BrowserMatchNoCase livelapbot bad_bot
Order Deny,Allow
Deny from env=bad_bot
# End Bad Bot Blocking
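
A note for newer servers: Order and Deny are Apache 2.2 directives, and on Apache 2.4 they only work while the mod_access_compat module is loaded. Here is a minimal sketch of the same idea in 2.4 syntax. It assumes Apache 2.4 with mod_setenvif and mod_authz_core, and a host that allows these directives in .htaccess. The three bots shown are just a sample from my list, grouped on one line because BrowserMatchNoCase takes a regular expression. I have not run this exact variant on my own site, so treat it as a starting point.

```apache
# Begin Bad Bot Blocking (Apache 2.4 sketch, not the original rules)
# BrowserMatchNoCase takes a case-insensitive regular expression, so several
# user agent fragments can share one line with | between them.
# Only three patterns from the full list are shown here.
BrowserMatchNoCase "(baiduspider|mj12bot|ahrefsbot)" bad_bot

# Replaces "Order Deny,Allow" and "Deny from env=bad_bot":
# let everyone in except requests flagged as bad_bot.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
# End Bad Bot Blocking
```

If you go this route, use the RequireAll block in place of the Order and Deny lines rather than next to them, since mixing the old and new access directives in one file gets confusing fast.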

Part two is to download and enable this WordPress plugin: Bad Behavior. Bad Behavior prevents spammers from ever delivering their junk, and in many cases, from ever reading your site in the first place.

Bad Behavior 403 Access Denied Notification Display

The easiest way to download this plugin is from the Add New link in your WordPress Plugins menu. Just search for Bad Behavior, click Install, then Activate. Once activated, you will need to whitelist two specific IP ranges: Googlebot: 66.249.64.0/19 and the Google structured data testing tool: 173.194.0.0/16. And let's not forget about the lonely Bingbot: 65.52.0.0/14 and Bing Imagebot: 199.30.16.0/20.

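The next thing is to get an access key from Project Honey Pot, which allows access to their dirty IP blacklist. And if you're running behind CloudFlare, it's best to check the Reverse Proxy / Load Balancer box and enter this header name: CF-Connecting-IP. Despite what it may look like, this is an HTTP request header that carries the visitor's real IP address, not a user agent.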

Bad Behavior Blocked IP Log Snippet

Keeping this junk traffic off your server should noticeably improve your site's loading speed. And if you're not using CloudFlare, I suggest implementing that solution as well.

Any questions? Comment below 😉