View Full Version : google bot dos
charles
06-09-2011, 11:56 PM
Googlebot can go a little overboard sometimes... it's DoS-ing my server.
I have a server that got crippled today. After hours of trying to log into the admin, I started checking some stats. A typical example:
Bandwidth usage for one example site this month: 1.16 GB
Googlebot: 1.08 GB over 17,000 hits. WTF
OK, some of it might be the API limit changes adding more stuff to the sites.
All the sites I spot-checked had Google as the top bandwidth puller by a wide margin.
I know Google crawling is a good thing, but come on... 17k hits in 8+ days.
wow.
So, does anyone have any good tricks to slow G down without pissing them off so much that they don't come back?
AcidRaZor
06-10-2011, 12:07 AM
I do about 130000 bot hits daily. It can get hairy at times, but that's why a dedicated server is so nice.
You can limit the crawl-rate, either in webmaster tools, or robots.txt by setting the crawl-delay. http://en.wikipedia.org/wiki/Robots_exclusion_standard
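For reference, a crawl-delay entry in robots.txt might look like the sketch below. The value of 10 (seconds between requests) is only an illustrative assumption. Crawl-delay is a non-standard directive, and not every crawler honors it.

```
# robots.txt at the site root (illustrative values, not a recommendation)
User-agent: *
Crawl-delay: 10
```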
Zaphod
06-10-2011, 06:13 AM
You can limit the crawl-rate, either in webmaster tools, or robots.txt by setting the crawl-delay. http://en.wikipedia.org/wiki/Robots_exclusion_standard
Even that's only on a site by site basis, so if you have 40k sites on a server limiting the crawl rate won't do much.
This is not a good idea, but it's better than your server going down. You can do something like this, which cuts off bot traffic past a certain load threshold while still letting humans through. Put it at the top of the first script users see: in MFPmu that would be index.php, and in WP it's also index.php. (I started using this with WP back in the day because WP is a big fat smelly beast of a pig that will crush a server just the same as a 600-pound pig's fat ass would :D)
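<?php
// Read the 1-minute load average (first field of /proc/loadavg, Linux only).
$strServerLoad = @file_get_contents("/proc/loadavg");
$fltServerLoad = ($strServerLoad === false) ? 0.0 : (float) strtok($strServerLoad, " ");
$strUserAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// Above a load of 4, turn Googlebot away with a 503; everyone else still gets through.
if ($fltServerLoad > 4 && stripos($strUserAgent, "Googlebot") !== false) {
    header("HTTP/1.0 503 Service Temporarily Unavailable");
    echo "HTTP 503 - Service Temporarily Unavailable";
    exit();
}
?>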
This is really for a Linux dedicated server or a VPS... I have no idea if shared hosting accounts give you access to "/proc/loadavg", and I have no idea how Windows hosting displays load :)
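If reading "/proc/loadavg" directly turns out not to be allowed, a variant along these lines might be worth trying. It is only a sketch and assumes PHP 7.1 or later. The function names are made up for illustration, and the Retry-After value of 3600 seconds is an arbitrary assumption. sys_getloadavg() is a PHP built-in, but it is not implemented on Windows, so the sketch simply does nothing when no load figure is available.

```php
<?php
// Hypothetical helper names; the threshold of 4 and the "Googlebot" match
// mirror the snippet above. Assumes PHP 7.1+.

/** Returns the 1-minute load average, or null when it cannot be determined. */
function current_load_average(): ?float
{
    // Built-in on Unix-like systems; not implemented on Windows.
    if (function_exists('sys_getloadavg')) {
        $load = sys_getloadavg();
        if (is_array($load) && isset($load[0])) {
            return (float) $load[0];
        }
    }
    // Fall back to the Linux proc file; its first field is the 1-minute average.
    $raw = @file_get_contents('/proc/loadavg');
    if ($raw !== false && $raw !== '') {
        return (float) strtok($raw, ' ');
    }
    return null;
}

/** Pure decision: throttle only when the load is known, high, and the agent is Googlebot. */
function should_throttle_bot(?float $load, string $userAgent, float $threshold = 4.0): bool
{
    if ($load === null) {
        return false;
    }
    return $load > $threshold && stripos($userAgent, 'Googlebot') !== false;
}

/** Sends a 503 with a Retry-After hint and stops the script when throttling applies. */
function throttle_bots_if_overloaded(float $threshold = 4.0, int $retryAfterSeconds = 3600): void
{
    $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    if (should_throttle_bot(current_load_average(), $userAgent, $threshold)) {
        http_response_code(503);
        header('Retry-After: ' . $retryAfterSeconds);
        echo 'HTTP 503 - Service Temporarily Unavailable';
        exit();
    }
}
```

Calling throttle_bots_if_overloaded(); at the very top of index.php would then behave roughly like the snippet above. Keeping the decision in should_throttle_bot() makes it easy to check without a loaded server.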
prosperent brian
06-10-2011, 06:19 AM
Imagine what I go through trying to handle 24 million requests a day from google.
charles
06-10-2011, 10:41 AM
This is a dedicated server: 8 cores, 4 GB. It's been OK until recently.
It has 100+ WP sites with a Prosperent script on them too.
All are over a year old (no new sites on this server; it's my oldest one).
So they do get quite busy.
Thanks for all the suggestions and code. I will give it a try.
AcidRaZor
06-10-2011, 12:19 PM
IMO WP has quite an overhead, and running cPanel on the server doesn't help either.
Imagine what I go through trying to handle 24 million requests a day from google.
You can send a few my way if you're full :o
Wow, nice code there, Zaphod, cheers. I've had a monstrous $700/mo dedicated server get crippled by only 100 WordPress sites, so I might use that code in the near future.
AcidRaZor
07-15-2011, 01:14 AM
Oh FYI, I tweeted Matt Cutts about the crawl-delay you can set in robots.txt. He said Google doesn't honor it since a lot of people fuck it up. So you're better off using Zaphod's code.
If you're spending $700 on a server and it can't handle 100 WordPress sites, I wouldn't implement this code. I'd spend some time optimizing the server.