
Yahoo! Slurp on the Loose?

Threads at WebmasterWorld and the Search Engine Watch Forums are both reporting issues with Yahoo! Slurp (Yahoo!'s crawler) indexing pages it should not, and in quantities that may be harmful.

It appears that only specific bots are ignoring the robots.txt file and crawling pages at rates that could cause server issues.
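
Before blaming a crawler, it can help to confirm what a robots.txt file actually permits. The sketch below uses Python's standard-library parser to test a couple of paths. The sample rules, the paths, and the helper name `allowed` are made up for illustration, and the result reflects Python's reading of the file, which may differ from how Slurp itself interprets the same syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/

User-agent: *
Disallow:
"""

def allowed(robots_txt, user_agent, url):
    """Return True if this parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for path in ("/index.html", "/private/page.html"):
        print(path, allowed(ROBOTS_TXT, "Slurp", path))
```

Pass the short crawler token ("Slurp") rather than the full User-Agent header, since this parser matches on the token.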

The specific IP addresses appear to be in the 74.6.x block. They do reverse DNS to inktomi, which is correct.
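
For anyone who wants to repeat that reverse DNS check on addresses from their own logs, here is a minimal sketch in Python. It does a reverse lookup, checks the hostname suffix, and then confirms that the hostname resolves back to the same IP. The suffix list is an assumption for illustration (the reports only say the addresses resolve to "inktomi"), and `is_trusted_crawler` is a hypothetical helper name.

```python
import socket

# Assumption: hostname suffixes accepted as the crawler's own.
# Adjust to whatever the crawler operator documents.
TRUSTED_SUFFIXES = (".inktomisearch.com",)

def is_trusted_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                       reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a trusted host
    and that host resolves back to the same ip."""
    try:
        hostname = reverse(ip)[0].lower()
    except OSError:
        # No PTR record or lookup failure.
        return False
    if not hostname.endswith(tuple(suffixes)):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # Forward confirmation guards against spoofed PTR records.
    return ip in addresses
```

The `reverse` and `forward` parameters default to the real resolver calls and exist only so the function can be exercised without network access. A User-Agent string alone proves nothing, since anyone can send one.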

Forum discussion at WebmasterWorld and Search Engine Watch Forums.




Posted by rustybrick in Yahoo! Search Optimization on January 17, 2007 at 7:57 AM. Comments (5)

Comments

My blog is the Maytag man of the blogosphere. Yet Yahoo! (Inktomi) crawls it every day faithfully.
Just me and Inktomi but I am thankful they notice, LOL.

 

This was discussed on the LED Digest last week - the original post is in #2321: http://www.led-digest.com/content/view/1701/55/ with responses in the next 3-4 issues. As far as I know the OP never resolved this, but he did offer a piece of advice:

"Feature request for SE spiders: Provide a referrer. Please. It would make me and I expect other site owners feel grateful when odd URL requests are noticed. If more than one referrer, then just any one -- the last one, the first one, doesn't matter which. Referrer information could save people a lot of time, and let them keep their hair a while longer."

Hope this info helps...

 

Thanks Adam, sorry for missing it.

 

I answered the specific question on WebmasterWorld. It does not seem that there is an issue with the crawler in this instance but an incorrect interpretation of the robots.txt syntax by the publisher.
Tim

 

Is Yahoo's bot based on Wget? I ask because the following line appeared in my log file:

2007-01-17 17:44:24 W3SVC105 NT-110 XX.XX.XX.XX GET / - 80 - 66.228.165.49 HTTP/1.0 Wget/1.8.2 - - www.mydomain.com 200 0 0 11155 111 578

The IP Reverse DNSes to i18ndev23.yst.corp.yahoo.com.

 
