
Are Bad Bots Eating Up Your Site’s Resources? Block ’Em Out!


Bots come in many flavors.

Some, like Googlebot and Bingbot, are welcome visitors.

Most well-behaved bots obey your robots.txt file. Then there are the snoops and bandwidth leeches that couldn’t care less about the rules written in it.
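
To make "obeying robots.txt" concrete, here is a minimal sketch using Python's standard urllib.robotparser, which answers the same question a polite crawler asks before each fetch. The rules, the paths, and the helper name allowed_for are made up for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
EXAMPLE_RULES = """\
User-agent: baiduspider
Disallow: /

User-agent: *
Disallow: /cgi
"""


def allowed_for(user_agent: str, path: str, rules: str = EXAMPLE_RULES) -> bool:
    """Return True if a rule-abiding bot with this user agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Baiduspider", "Googlebot"):
        for path in ("/photos/beach.jpg", "/cgi/form.pl"):
            print(agent, path, allowed_for(agent, path))
```

This only predicts what a polite bot would do. A bot that ignores the file is not stopped by it, which is where the server-side block comes in.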

I have been dealing with some performance issues lately and discovered that the Chinese bot Baiduspider has been ignoring my robots.txt file, scraping my site, and sucking up my bandwidth and system resources. There are others that need to be shown the door as well, but Baidu is the worst offender I have seen so far!

Baidu is an image thief. If you have a lot of photos on your site, it will make multiple hits from different IPs and slow your site down to a crawl. And it steals more than images: videos, MP3s, HTML and text documents too. Anything on your server is fair game!

The best blocking technique is to use your .htaccess file, if your server is running Apache under Linux (the most common shared hosting setup). If your hosting account has a file manager, look in your public_html root folder for a file called .htaccess. Before you change anything, make a backup: copy the file to a temp folder, or download it to your local drive with cPanel, GoDaddy’s FTP file manager, or an FTP program. Keep that copy somewhere safe in case your server croaks after you modify the file.
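
If you have shell or script access to the server, the backup step can also be scripted. This is only a sketch: the function name backup_htaccess and the ".bak-" plus timestamp naming are my own assumptions, not something your host provides.

```python
import shutil
import time
from pathlib import Path


def backup_htaccess(htaccess_path: str) -> Path:
    """Copy the file next to itself with a timestamp suffix; return the copy's path."""
    source = Path(htaccess_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme: .htaccess.bak-20120201-093000
    backup = source.with_name(f"{source.name}.bak-{stamp}")
    shutil.copy2(source, backup)  # copy2 also preserves the modification time
    return backup
```

Calling backup_htaccess("public_html/.htaccess") (adjust the path to your own account) leaves the original untouched and returns the path of the new copy. To roll back, copy the backup over .htaccess again.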

Updated 02/01/2012. Here is my updated exclusion list. These bots are my personal preference. I suggest you Google the bot’s user agent to research what it does, then decide if you want to block it or not.

Paste this text below any other existing .htaccess entries.
# Begin Bad Bot Blocking
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase magpie-crawler bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase netseer bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase Moreoverbot/x.00 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase "Jakarta Commons-HttpClient" bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase MaxPointCrawler/Nutch-1.1 bad_bot
BrowserMatchNoCase willow bad_bot

Order Deny,Allow
Deny from env=bad_bot
# End Bad Bot Blocking
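
BrowserMatchNoCase treats its first argument as a regular expression and looks for it, ignoring case, anywhere in the visitor's User-Agent header. That means a short name such as Baiduspider already catches versioned strings like Baiduspider/2.0. The sketch below approximates that behavior with Python's re module so you can try patterns against user agent strings before uploading anything. The helper name is_bad_bot and the sample strings are illustrative assumptions, and Python's regex flavor differs from Apache's in some details.

```python
import re

# A few patterns taken from the block above.
PATTERNS = ["Baiduspider", "yandex", "MJ12bot", "ahrefsbot"]


def is_bad_bot(user_agent: str, patterns=PATTERNS) -> bool:
    """Approximate BrowserMatchNoCase: unanchored, case-insensitive regex search."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in patterns)


if __name__ == "__main__":
    # Illustrative user agent strings, not taken from a real log.
    samples = [
        "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
        "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ]
    for ua in samples:
        print(is_bad_bot(ua), ua)
```

Because the match is a substring search, very short or generic patterns can also catch visitors you did not intend to block, so test a pattern against a few normal browser strings too. In .htaccess itself, a pattern that contains a space needs double quotes around it.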

Also, if you are using any add-on domains, you need to add this to their .htaccess files too.

If your add-on domains do not have an .htaccess file in their directory, just create a new file and paste this block into it. Be sure to start the file name with a dot (.): “.htaccess”

You can also use the following rule to block by IP address.
# Begin IP Blocking
order allow,deny
allow from all
deny from 24.45.228.190
# End of IP Block
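
A typo in .htaccess can take the whole site down, so if you block more than a handful of addresses it may be worth generating the lines instead of typing them. The sketch below builds the same snippet as above and uses Python's standard ipaddress module to reject malformed addresses. The function name ip_block_lines is a made-up helper.

```python
import ipaddress


def ip_block_lines(addresses):
    """Build the IP-blocking snippet, raising ValueError on a malformed address."""
    lines = ["# Begin IP Blocking", "order allow,deny", "allow from all"]
    for raw in addresses:
        ip = ipaddress.ip_address(raw.strip())  # raises ValueError if not a valid IP
        lines.append(f"deny from {ip}")
    lines.append("# End of IP Block")
    return lines


if __name__ == "__main__":
    print("\n".join(ip_block_lines(["24.45.228.190"])))
```

Paste the printed lines into .htaccess the same way as the hand-written version.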

If you are running WordPress, there is a great plugin called “Visitor Maps and Whos Online” that can show you which bots are visiting your site. Then you can Google a bot’s user agent string to see what it is up to.
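
If you are not on WordPress, your raw access log usually holds the same information. Here is a rough sketch that tallies user agents, assuming the Apache "combined" log format, where the user agent is the last double-quoted field on each line. The function name top_user_agents and the sample lines are made up, and a user agent containing an escaped quote would trip up this simple pattern.

```python
import re
from collections import Counter

# In the combined log format the user agent is the last double-quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def top_user_agents(log_lines, limit=10):
    """Count hits per user agent string and return the most frequent ones."""
    counts = Counter()
    for line in log_lines:
        match = LAST_QUOTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    # Made-up sample lines using documentation-only IP addresses.
    sample = [
        '192.0.2.1 - - [01/Feb/2012:10:00:00 -0500] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
        '192.0.2.1 - - [01/Feb/2012:10:00:01 -0500] "GET /a.jpg HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
        '198.51.100.7 - - [01/Feb/2012:10:00:02 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    ]
    for agent, hits in top_user_agents(sample):
        print(hits, agent)
```

To run it on a real log, pass an open file object instead of the sample list. The file's name and location depend on your host.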

Below is a screen grab I took before adding my block.

Log Of Bad Bots Accessing My Site Before Adding .htaccess User Agent Block


Here is another screen shot, taken after giving them the bum’s rush out the door!

Another Screen Grab Showing All My Pesky Bots Are Gone.. Hooray!!


Comments


3 Responses to Are Bad Bots Eating Up Your Site’s Resources? Block ’Em Out!

  • Thanks. Baidu has been driving me nuts, using up all my CPU and I’ve tried several different ways to block it.

    I’ve added your code to htaccess, so …fingers crossed. Hope it works.

    Mark

    • FidoSysop says:

      That should do the trick.

      Also be sure to block via robots.txt too.

      user-agent: *
      disallow: /cgi
      disallow: /images

      user-agent: baiduspider
      disallow: /

      user-agent: yeti
      disallow: /

      user-agent: omniexplorer_bot
      disallow: /

      user-agent: yandeximages
      disallow: /

      user-agent: Spinn3r
      disallow: /

      user-agent: sogou
      disallow: /

      user-agent: jikespider
      disallow: /

      user-agent: ia_archiver
      disallow: /

      user-agent: PaperLiBot
      disallow: /

      user-agent: Amazon
      disallow: /

      user-agent: boardreader
      disallow: /

      user-agent: boardtracker
      disallow: /

      user-agent: R6_CommentReader
      disallow: /

      user-agent: R6_FeedFetcher
      disallow: /

      user-agent: Radian6
      disallow: /

      user-agent: ScoutJet
      disallow: /

      User-agent: spbot
      Disallow: /

      User-agent: Sogou web spider/3.0
      Disallow: /

      User-agent: Sogou web spider/4.0
      Disallow: /

  • anonymous says:

    Be sure to make a backup of your .htaccess file!
