Update 11/25/2007: I am now blocking the Litefinder bot with this PHP script:
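// Reject any request whose User-Agent contains "litefinder" (case-insensitive).
if (isset($_SERVER['HTTP_USER_AGENT']) && stristr($_SERVER['HTTP_USER_AGENT'], "litefinder") !== false) {
    die("You are not welcome! Contact cocaman_at_gmail.com if you think this is an error");
}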
I regularly check my logs for referrers and for the links readers use to reach my blog. Today I had an unusually high number of page views. After checking my logs I found out that a bot had crawled (and indexed?) almost every page of my blog:
(This is just an excerpt)
What is LiteFinder Network Crawler?
LiteFinder Network Crawler is a research project started by a group of Indian candidates from the cities of Bangalore, Patna and Jaipur. The project serves as a testing ground for information search technologies and programs, developed by a group of young scientists. LiteFinder Network Crawler was started as the means of simplifying the search of specific information at the sites that can be found with the help of general-purpose search engines. The project was developed due to the grants provided by several foreign companies specializing in research in the field of information search.
LiteFinder Network Crawler search engine is a set of program components launched at a productive computer cluster with a high-speed Internet access. Initial relevant resources are taken from general-purpose search engines such as Google, MSN and Yahoo. After that the robot surfs through these resources creating the database of detected information. This surf is followed by the launch of the next component that creates the glossary and the component that searches through the glossary entries in the database finding the entry that corresponds with the initial inquiry as much as possible.
Of course I was curious to see the “search engine” behind that bot. After reading their about page I did a search with it. First, their front page is just … em … stupid.
And no matter what, any search gives you just one result:
LiteFinder.net – Just another stupid approach to make some dirty and quick bucks.
I just also ran into LiteFinder, and I thought I recognised a pattern; earlier this year, I got plundered by something called IDBot. From http://id-search.org/bot.html :
“ID-Search.org is a research project started by a group of Russian candidates from the cities of Saint-Petersburg, Nizhnii Novgorod and Novosibirsk. The project serves as a testing ground for information search technologies…”
From http://www.litefinder.net/about.html :
“LiteFinder Network Crawler is a research project started by a group of Indian candidates from the cities of Bangalore, Patna and Jaipur. The project serves as a testing ground for information search…”
Neither will reveal their robots’ IPs because of “company policy”.
‘Nuff said, methinks. Blocked.
@Roel
Thanks for your input.
Yeah, the texts look almost identical. Glad I am not the only one having a problem with their “service”.
I have just included your small PHP script on my website – thanks 🙂
The following IP addresses were identified as originating from LiteFinder:
60.190.240.73, 67.19.114.226, 70.84.212.114, 70.85.113.242, 74.53.249.34, 74.86.14.10, 74.86.209.74, 75.125.47.162, 208.101.44.3, 216.40.222.50, 216.40.222.98
Martin.
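For anyone who wants to block by address as well, here is a minimal sketch in PHP built from the list above. It assumes PHP 7 or later; the helper name isBlockedIp is made up for illustration, and the list is only a snapshot of what was seen, so the bot may well use other addresses too.

```php
<?php
// Addresses copied from the list in this comment; a snapshot, not a complete list.
$blockedIps = [
    '60.190.240.73', '67.19.114.226', '70.84.212.114', '70.85.113.242',
    '74.53.249.34', '74.86.14.10', '74.86.209.74', '75.125.47.162',
    '208.101.44.3', '216.40.222.50', '216.40.222.98',
];

// Hypothetical helper: true when the given address is on the block list.
function isBlockedIp(string $ip, array $blocked): bool
{
    return in_array($ip, $blocked, true);
}

// REMOTE_ADDR is not set when the script runs from the command line.
$clientIp = $_SERVER['REMOTE_ADDR'] ?? '';
if (isBlockedIp($clientIp, $blockedIps)) {
    header('HTTP/1.1 403 Forbidden');
    die('Access denied.');
}
```

Matching exact addresses keeps the check simple; it will not catch neighbouring addresses in the same range.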
Hi,
Here is another one following the same pattern:
“gigamega.net is a research project started by a group of Russian candidates from the cities of Saint-Petersburg, Nizhnii Novgorod and Novosibirsk. The project serves as a testing ground for information search technologies and programs, developed by a group of young scientists.”
They don’t want to reveal their IP addresses:
“Can I learn the IP addresses, which Gigamega-Bot comes from?
Unfortunately, You can’t since it is against the rules of our company.”
I highly recommend blocking their abusive bot. Its User-Agent is:
Mozilla/5.0 (compatible; Gigamega.bot/1.0; +h*t*t*p://*w*w*w*.gigamega.net/bot.html)
(Asterisks* in URLs added by me, I don’t want this transformed into a link)
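A rough sketch of how both bots could be rejected in one place, assuming PHP 7 or later. The helper name isBlockedAgent is hypothetical, and the two substrings are simply taken from the User-Agent strings quoted in this thread.

```php
<?php
// Substrings taken from the User-Agent strings mentioned in this thread.
$blockedAgents = ['litefinder', 'gigamega.bot'];

// Hypothetical helper: case-insensitive substring match against the list.
function isBlockedAgent(string $userAgent, array $needles): bool
{
    foreach ($needles as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Some clients send no User-Agent header at all, so fall back to an empty string.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (isBlockedAgent($userAgent, $blockedAgents)) {
    header('HTTP/1.1 403 Forbidden');
    die('You are not welcome!');
}
```

This only stops bots that identify themselves honestly; a crawler that fakes a browser User-Agent will pass straight through.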
Hi Alphane Moon
Thanks for the info. Haven’t seen that one yet. But I think it is only a matter of time before they show up here.
“litefinder.net” is a harvester. He collects email addresses from web sites and adds them to spam mailing lists.
He visited our web page http://www.abx.de/error.php on 06th November from 74.86.209.74. We showed him an email address generated only for him. On 11th December we received the first spam for this email address.
He visited us on 12th November from 216.40.222.82. First spam on 18th December.
Visit on 24th November 70.85.113.242. First spam on 15th December.
Regards
Andreas Gabler.
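The trap described above (one email address per visitor, logged together with the visitor's IP) could be sketched roughly like this. This is not the actual script behind error.php; the function name makeTrapAddress, the "trap-" prefix and the domain example.com are placeholders, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: builds a one-off address and the log line that ties it
// to the visitor. "example.com" is a placeholder domain.
function makeTrapAddress(string $ip, string $userAgent, int $timestamp, string $domain = 'example.com'): array
{
    // Random bytes are mixed in so that no two visits share an address.
    $seed = $ip . '|' . $timestamp . '|' . bin2hex(random_bytes(8));
    $token = substr(hash('sha256', $seed), 0, 12);
    $address = 'trap-' . $token . '@' . $domain;
    // Tab-separated record: time, IP, trap address, User-Agent.
    $logLine = sprintf("%s\t%s\t%s\t%s", date('c', $timestamp), $ip, $address, $userAgent);
    return [$address, $logLine];
}

[$address, $logLine] = makeTrapAddress(
    $_SERVER['REMOTE_ADDR'] ?? 'unknown',
    $_SERVER['HTTP_USER_AGENT'] ?? 'unknown',
    time()
);

// Record the visit in PHP's configured error log, then show the address on the page.
error_log($logLine);
echo '<a href="mailto:' . htmlspecialchars($address) . '">' . htmlspecialchars($address) . '</a>';
```

When spam later arrives at one of these addresses, searching the log for that address shows which visit harvested it.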
“Gigamega.bot” is another harvester.
He visited our web page
http://www.abx-radeberg.de/error.php
on 25th December 2007 at 02:13 CET.
He identified himself as
“Mozilla/5.0 (compatible; Gigamega.bot/1.0;
+http://www.gigamega.net/bot.html)”.
His IP address: 74.86.209.74.
We showed him an email address generated only for him.
We received the first spam for this email address on
27th December 2007.
The email sender was 195.234.132.37.
Regards
Andreas Gabler.