Thread: Block Bad Bots

  1. #1

    Block Bad Bots

    Does anyone have code that actually works for blocking bad bots like Baidu?

    I spent all day yesterday trying and have not succeeded so far.

    If you have good code, please explain exactly how to use it!


  2. #2

  3. #3

  4. #4
    Thanks for posting that.

    Afterwards I realized that my code was working all along. I did not know that blocked bots would still appear in the access logs, so I assumed they were still reaching my pages. I had not noticed that they were actually getting a 403 status code!
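
    For anyone else checking this: a blocked request is still logged, just with status 403 instead of 200. Below is a minimal sketch that copies tagged bot requests into their own log so the status codes are easy to scan. It assumes you can edit the main server or virtual host config (CustomLog is not allowed in .htaccess) and that mod_setenvif and mod_log_config are loaded. The variable name bad_bot, the two agent names and the log path are only placeholders.

    ```apache
    # Tag requests whose User-Agent contains one of these names (case-insensitive).
    # "bad_bot" is an arbitrary variable name chosen for this sketch.
    SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
    SetEnvIfNoCase User-Agent "MJ12bot" bad_bot

    # Standard combined log format; %>s is the final status code (403 when blocked).
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

    # Write only the tagged requests to a separate file (placeholder path).
    CustomLog "logs/bad_bots.log" combined env=bad_bot
    ```

    The tagged requests still show up in the regular access log as well; this only adds a second, filtered copy.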

  5. #5

  6. #6
    You need to block them at the firewall. Otherwise they still consume resources every time they make a request and get a 403 page: Apache has to accept the connection, process the request, and serve that page. If you block them at the firewall, their requests never reach Apache at all.

  7. #7
    So you mean I need all the IP ranges?

  8. #8
    Correct. The way you did it will work, but the bots are still consuming server resources which isn't ideal.
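
    If you have no firewall access (shared hosting, for example), IP ranges can also be denied inside Apache itself. This has the same drawback, since Apache still handles each request, so treat it as a fallback rather than a substitute for the firewall. Below is a minimal sketch, assuming Apache 2.4 with mod_authz_core and an AllowOverride setting that permits these directives in .htaccess. The two ranges are documentation placeholders, not real bot ranges.

    ```apache
    <RequireAll>
        # Allow everyone by default...
        Require all granted
        # ...except clients in these ranges (placeholder addresses).
        Require not ip 192.0.2.0/24
        Require not ip 198.51.100.0/24
    </RequireAll>
    ```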

  9. #9
    How many IPs can iptables handle before it gets slow or drags your server down?

  10. #10
    I go a less ideal route than the firewall and use .htaccess mod_rewrite rules to block naughty agents. It works well enough, except against scrapers.

    Apache still loads and handles the request... that's why it is less than optimal.

    Code:
    <IfModule mod_rewrite.c>
    RewriteEngine On
    
    ## deny agents
    RewriteCond %{HTTP_USER_AGENT} Yandex [OR]
    RewriteCond %{HTTP_USER_AGENT} Yeti [OR]
    RewriteCond %{HTTP_USER_AGENT} radian6 [OR]
    RewriteCond %{HTTP_USER_AGENT} Twiceler [OR]
    RewriteCond %{HTTP_USER_AGENT} MJ12bot [OR]
    RewriteCond %{HTTP_USER_AGENT} Baiduspider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
    RewriteCond %{HTTP_USER_AGENT} ^appie [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
    RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
    RewriteCond %{HTTP_USER_AGENT} ^DepSpid [OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
    RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMSE_Robot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
    RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
    RewriteCond %{HTTP_USER_AGENT} ^InnovantageBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
    RewriteCond %{HTTP_USER_AGENT} ^java [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
    RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [OR]
    RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MJ12bot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MQbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Missigua\ Locator [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZip [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
    RewriteCond %{HTTP_USER_AGENT} ^HouxouCrawler [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
    RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ShopWiki [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} glrsales\.com [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xenu\ Link\ Sleuth [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon
    ## the pattern below matches every URL; [R] sends a 302 redirect to the target
    RewriteRule /*$ http://www.spam.com [L,R]
    </IfModule>
    Last edited by zaphod; 07-01-2011 at 10:21 AM.
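
    Since the thread started with bots getting a 403, here is a shorter variant that returns 403 Forbidden instead of redirecting. This is only a sketch of the same idea and not a replacement for the full list. The handful of agent names is picked from that list as an example, and the [NC] flag (case-insensitive matching) is an addition that the original rules do not use.

    ```apache
    <IfModule mod_rewrite.c>
    RewriteEngine On

    # Match these names anywhere in the User-Agent, ignoring case ([NC]).
    RewriteCond %{HTTP_USER_AGENT} (Baiduspider|Yandex|MJ12bot) [NC,OR]
    # Anchored match: only agents whose string starts with one of these names.
    RewriteCond %{HTTP_USER_AGENT} ^(HTTrack|WebCopier|Wget) [NC]
    # "-" means no substitution; [F] answers with 403 Forbidden and stops processing.
    RewriteRule ^ - [F]
    </IfModule>
    ```

    Grouping names with | keeps the file short, and [F] avoids sending the bot on to a third-party site.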
