After a couple of days of robots.txt love, I now have much less crap in my logs. A good opportunity to see which bots are well written. Based on what I am seeing with /robots.txt, I am sure glad I blocked most of these festering piles of dung from my site.
not using conditional GET when requesting /robots.txt
Only kinjabot, OnetSzukaj/5.0 and Seekbot/1.0 get this right. All other bots, including Google and Yahoo, do not send If-Modified-Since or If-None-Match, so they re-download the whole unchanged file every time instead of getting a cheap 304. Lame.
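For the record, a conditional GET is not hard. A minimal sketch of what a polite bot should do, in Python (the header names are standard HTTP; the function names here are made up for illustration):

```python
import urllib.request
import urllib.error

def conditional_headers(last_modified=None, etag=None):
    """Build the validator headers for a conditional GET from
    whatever we cached off the previous response."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers

def fetch_robots(url, last_modified=None, etag=None):
    """Fetch /robots.txt conditionally.

    Returns (body, last_modified, etag); body is None on a 304,
    meaning the cached copy is still good and nothing was transferred.
    """
    req = urllib.request.Request(
        url, headers=conditional_headers(last_modified, etag))
    try:
        with urllib.request.urlopen(req) as resp:
            return (resp.read().decode("utf-8", "replace"),
                    resp.headers.get("Last-Modified"),
                    resp.headers.get("ETag"))
    except urllib.error.HTTPError as e:
        if e.code == 304:
            # Not modified: reuse the cached body and validators.
            return None, last_modified, etag
        raise
```

The point being: cache the `Last-Modified` and `ETag` values from the first fetch, send them back on the next one, and an unchanged robots.txt costs the server almost nothing.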
requesting /robots.txt too often
The biggest offender is VoilaBot, checking /robots.txt every 5 minutes, all day, every day. You gotta be kidding me. Google and Yahoo are not much better; you'd think they'd have figured out a way by now to share the state of /robots.txt across their different crawlers. Other bots fare better by virtue of being less desperate.
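The fix is trivial: one shared per-host cache with a sane TTL, consulted by every crawler before hitting the network. A sketch of the idea (class and parameter names are mine, not any search engine's actual design):

```python
import time

class RobotsCache:
    """Per-host robots.txt cache with a TTL, shareable across crawler
    workers so no host gets hit for /robots.txt every five minutes."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._store = {}            # host -> (fetched_at, body)

    def get(self, host):
        """Return the cached body if still fresh, else None
        (meaning: go refetch, conditionally of course)."""
        entry = self._store.get(host)
        if entry and self.clock() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, host, body):
        self._store[host] = (self.clock(), body)
```

A once-a-day TTL here is my assumption of a reasonable default; the exact number matters less than the fact that all crawlers for one operator consult the same cache.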
Problems like this are economic opportunities.