MSN / Live bot spamming with fake referrals
Looks like Microsoft’s web spider is crawling sites while reporting fake (spam) referrals that claim to come from search.live.com. These referrals are completely bogus, and if you look at the keywords, it’s quite obvious.
I’m quite sure a lot of analytics users who host blogs, wikis or other high-visibility sites have noticed this as well. A quick search turns up similar posts. [1, 2]
A quick grep of my logs reveals about 1800 such requests with fake referrals. This can make your referral statistics quite biased and/or completely useless, at least on small sites.
The first time I saw these requests was in June 2008, and they’ve continued all the way until now. So it’s definitely not a mistake, whatever Microsoft likes to claim.
I’ve gotten these requests from 514 different IPs (some of them real users, I guess), most commonly from the 65.52.0.0/14 range (allocated to Redmond, not surprisingly). More specifically, all requests in that block have come from 65.55.0.0/16, but to be on the safe side I’d just recommend blocking the whole /14.
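To make the range concrete, here is a minimal sketch of checking whether a client address falls inside 65.52.0.0/14, using Python’s standard `ipaddress` module. The function name `is_blocked` is just an illustrative choice, and how you actually enforce the block (firewall, web server config, application code) is up to you.

```python
import ipaddress

# Range taken from the post: the whole /14, which spans
# 65.52.0.0 through 65.55.255.255 and so includes 65.55.0.0/16.
BLOCKED_NETWORK = ipaddress.ip_network("65.52.0.0/14")


def is_blocked(ip: str) -> bool:
    """Return True if the given IPv4 address lies inside the blocked range."""
    return ipaddress.ip_address(ip) in BLOCKED_NETWORK


if __name__ == "__main__":
    # The first two addresses come from the log examples in this post;
    # 192.0.2.10 is a documentation address used as a negative example.
    for ip in ("65.55.109.87", "65.55.110.184", "192.0.2.10"):
        print(ip, is_blocked(ip))
```

A /14 covers four consecutive /16 blocks, which is why 65.55.x.x addresses match even though the range is written as 65.52.0.0.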
So what do the requests look like? Some examples here:
65.55.109.87 - - [06/Mar/2009:10:26:33 +0200] "GET /XXX HTTP/1.0" 200 18020 "http://search.live.com/results.aspx?q=index" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"
65.55.110.184 - - [06/Mar/2009:10:47:55 +0200] "GET /about/ HTTP/1.0" 200 21070 "http://search.live.com/results.aspx?q=about" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"
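If you want to pick such lines apart programmatically, something like the following sketch might do. It assumes the log is in the usual Apache combined format with plain ASCII quotes and hyphens; the names `parse_line` and `live_search_query` are made up for this example.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed layout: Apache "combined" format, i.e.
# ip ident user [time] "request" status size "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of the log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def live_search_query(referrer):
    """Return the q= keyword of a search.live.com referrer, else None."""
    parsed = urlparse(referrer)
    if parsed.hostname != "search.live.com":
        return None
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


if __name__ == "__main__":
    sample = (
        '65.55.110.184 - - [06/Mar/2009:10:47:55 +0200] '
        '"GET /about/ HTTP/1.0" 200 21070 '
        '"http://search.live.com/results.aspx?q=about" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; '
        '.NET CLR 1.1.4322)"'
    )
    entry = parse_line(sample)
    print(entry["ip"], entry["request"], live_search_query(entry["referrer"]))
```

Note how in both examples the claimed search keyword simply mirrors the requested path, which is a strong hint that no human typed it.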
If you look at the sent referrals, most contain very common keywords, like:
35 http://search.live.com/results.aspx?q=project
36 http://search.live.com/results.aspx?q=gmail
36 http://search.live.com/results.aspx?q=index&mrt=en-us&FORM=LIVSOP
40 http://search.live.com/results.aspx?q=internet
51 http://search.live.com/results.aspx?q=google
114 http://search.live.com/results.aspx?q=index&form=QBHP
522 http://search.live.com/results.aspx?q=index
For the referral keywords, the bot seems to use words (blog tags, for example) found on the crawled pages and/or in their URLs. The “&form=QBHP” request parameter appeared in the older requests, but it seems to have disappeared since then.
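A tally of referrers like the one above can be produced in many ways. As a rough sketch (not necessarily how the numbers in this post were generated), the following Python counts search.live.com referrers in an iterable of combined-format log lines. The helper names `count_live_referrers` and `format_tally`, and the `access.log` path, are placeholders for illustration.

```python
import re
from collections import Counter

# The last two quoted fields of a combined-format line are the
# referrer and the user agent (assumes plain ASCII double quotes).
TAIL = re.compile(r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$')


def count_live_referrers(lines):
    """Count how often each search.live.com referrer appears."""
    counts = Counter()
    for line in lines:
        match = TAIL.search(line)
        if match and match.group("referrer").startswith("http://search.live.com/"):
            counts[match.group("referrer")] += 1
    return counts


def format_tally(counts):
    """Format counts in ascending order, one 'count referrer' string each."""
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return [f"{n:7d} {referrer}" for referrer, n in ordered]


if __name__ == "__main__":
    # "access.log" is a placeholder; point this at your own log file.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for row in format_tally(count_live_referrers(handle)):
            print(row)
```

Counting the full referrer URL rather than just the keyword keeps variants such as the “&form=QBHP” ones separate, which is what makes their disappearance over time visible.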
As for the user-agent, the most used ones are:
180 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
1620 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)
It was about time I banned this spammer.