We all use search engines several times a day, left, right and centre. Most of us stick to our favourite engine, most likely Google, and are happy with the results it fetches. But the fact remains that, particularly considering the disconnectedness of the web, we are missing out on several results. Popular things (people, companies, search engines) always have to pay a price. Just as most viruses are written for Windows, most search engine hacks are targeted at Google.
As a simple example, a search for miserable failure (or just failure) leads you to the Biography of President George W. Bush and the Biography of Jimmy Carter! This is a typical example of a Google bomb:
“A Google bomb or Google washer is a certain attempt to influence the ranking of a given page in results returned by the Google search engine. Due to the way that Google’s PageRank algorithm works, a page will be ranked higher if the sites that link to that page all use consistent anchor text. A Google bomb is created if a large number of sites link to the page in this manner. Google bomb is used both as a verb and a noun.”
—Wikipedia entry on Google Bomb.
Wikipedia has an entry on miserable failure, as well.
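To make the mechanism in the quote concrete, here is a rough Python sketch. It is a toy, not Google's actual ranking: the function `rank_by_anchor_text`, the link tuples and all the `.example` addresses are made up for illustration. It simply counts how many inbound links use the query phrase as anchor text, which is enough to show how a crowd of sites agreeing on one phrase can push a page to the top.

```python
from collections import Counter


def rank_by_anchor_text(links, query):
    """Rank target pages by how many inbound links mention `query` in their anchor text.

    `links` is a list of (source_site, anchor_text, target_url) tuples.
    This is a toy signal for illustration, not a real search engine's formula.
    """
    votes = Counter()
    query = query.lower()
    for _source, anchor, target in links:
        if query in anchor.lower():
            votes[target] += 1
    # Highest vote count first.
    return votes.most_common()


# Hypothetical link data: fifty blogs agree on the same anchor phrase.
links = [
    (f"blog{i}.example", "miserable failure", "bio.example/president")
    for i in range(50)
]
links += [
    ("news.example", "essay on failure", "essays.example/failure"),
    ("dict.example", "failure", "dictionary.example/failure"),
]

ranking = rank_by_anchor_text(links, "failure")
print(ranking[0])
```

The bombed page wins for plain failure too, because every "miserable failure" anchor also contains that word.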
It’s no big deal that a search for failure pops up such a result, since you are quite unlikely to google for such terms. But it clearly shows that, despite spam filters and the like, you may still end up seeing results that do not matter to you in the context of your search. Note the word context.
That brings me to clustering search engines. Considering how big Yahoo! (powered by Inktomi) was, it seemed foolhardy to venture to build another search engine, but that’s what great technology can do, and we ended up with the legendary Google. The next burning question is who will challenge the hegemony of Google. Every newcomer probably now builds on the Google experience of an extremely uncluttered start page. So that’s good. Next, they try various other things, one of them being clustering. Clustering is certainly useful for disambiguation (you’ve seen this term often if you regularly use Wikipedia!). Searching for ‘Cricket’ would give you entries on the game as well as the insect, certainly more of the former, and you would be hard-pressed to find entries on the latter! Cricket is perhaps a poor example, in that the entries on the game are so many that they bulldoze the insect out of the results!
Vivisimo is a clustering search engine that does a decent job.
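I don't know how Vivisimo actually does its clustering, so take the following as a minimal sketch of the general idea only. It groups result snippets greedily by word overlap (Jaccard similarity), ignoring the query word itself. The snippets, the stopword list and the 0.2 threshold are all assumptions picked for the example.

```python
import re

# A tiny, assumed stopword list; a real engine would use a far larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "is", "to", "its"}


def tokens(text):
    """Lowercase word set of `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_results(snippets, query, threshold=0.2):
    """Greedily put each snippet into the first cluster it overlaps enough with."""
    query_terms = tokens(query)
    clusters = []  # each cluster: {"terms": set of words, "items": list of snippets}
    for snippet in snippets:
        terms = tokens(snippet) - query_terms
        for cluster in clusters:
            if jaccard(terms, cluster["terms"]) >= threshold:
                cluster["items"].append(snippet)
                cluster["terms"] |= terms
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append({"terms": set(terms), "items": [snippet]})
    return [c["items"] for c in clusters]


# Made-up snippets for the query 'cricket'.
snippets = [
    "Cricket match scores: India bat first in the test",
    "Live cricket scores and test match commentary",
    "The cricket is an insect known for its chirping song",
    "Why a cricket insect rubs its wings: chirping explained",
]

groups = cluster_results(snippets, "cricket")
for number, group in enumerate(groups, start=1):
    print(f"Cluster {number}: {group}")
```

With these snippets the game and the insect land in separate groups, so the insect is no longer buried under the match reports.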
Previewseek (“the world’s most advanced search engine” is their slogan) is a very good-looking search engine that throws up quite interesting results. For example, a search for cricket directly gives you several useful links. It says right at the top that “Previewseek know this about cricket …”, which would be of great use if you are looking to understand a term or search (I almost said google!) for information about something. The Wikipedia entry for cricket is also on the first page. It does use some bandwidth by displaying screenshots for the pages (of the parent page of the page you wish to see, I feel), but for broadband users that’s nothing to worry about; you can preview a page before you jump into it. It looks like an AJAX-type interface, which is cool and lets you add search terms based on the results (a plus next to India in the cricket results, when clicked, takes you to cricket india results). Google does not give the Wikipedia entry for cricket in its top ten. In fact, it’s at a rather bad 34 on the results list!
While I am on the topic of result ranks, I must mention this website, synerge, which compares the ranks of the results between Yahoo! and Google.
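The idea of comparing ranks across two engines is easy to try yourself. The sketch below is my own illustration and not how synerge works internally: `compare_ranks` is a hypothetical helper, and the two result lists are invented. It takes the ordered results from each engine and reports, for every URL both engines returned, its position in each list and the shift between them.

```python
def compare_ranks(results_a, results_b):
    """Return {url: (rank_a, rank_b, rank_b - rank_a)} for URLs present in both lists.

    Ranks are 1-based positions in each engine's result list.
    Entries are ordered by their rank in the first list.
    """
    rank_a = {url: i for i, url in enumerate(results_a, start=1)}
    rank_b = {url: i for i, url in enumerate(results_b, start=1)}
    shared = rank_a.keys() & rank_b.keys()
    return {
        url: (rank_a[url], rank_b[url], rank_b[url] - rank_a[url])
        for url in sorted(shared, key=rank_a.get)
    }


# Invented result lists for the same query on two engines.
engine_a = ["cricinfo.example", "wiki.example/cricket", "insects.example/cricket"]
engine_b = ["wiki.example/cricket", "shop.example/bats", "cricinfo.example"]

comparison = compare_ranks(engine_a, engine_b)
for url, (a, b, shift) in comparison.items():
    print(f"{url}: engine A #{a}, engine B #{b}, shift {shift:+d}")
```

A positive shift means the page sits lower on the second engine; pages that only one engine returned are left out.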
PageRank is good, no doubt, but spamming and the like have their effect on it. I am certain that any algorithm for search, by definition, can be tricked. It’s just like encryption: no key is unbreakable (I’ll ignore quantum cryptography for the moment!). However, 1024-bit encryption does offer practically foolproof security. We must simply continue to strive for a search algorithm that is likewise practically foolproof.
There’s another search engine, Kosmix, and this is what they have to say about themselves:
“At Kosmix, we’re passionate about building a world class search engine that lets people search less, and discover more great stuff. There are billions of pages on the web that are useful, but never see the light of day through a standard search engine. We want to help you find those great pages, and make it easy and fun to do in the process.
Right now we’re in the early stages of Kosmix, and at this point only cover a handful of categories. Our list is growing fast, so check back with us to see what’s new.”
While Google ranks pages based on popularity, Kosmix promises to work differently, with categorization based on content. With Kosmix, users will be asked to define a search category, and the search engine will then find web pages that are closely associated in meaning with the search term. Kosmix, which has already started testing a health search on its website, will launch several other search categories over the next year.
The purpose of this article is just to make sure you keep an eye on other search engines, including MSN, and do not blindly run behind Google, however difficult that may be. None of them is way behind the others, except in popularity! Of course, I’d use Google Scholar ahead of any other search engine, but even there, there’s Entrez PubMed (or HubMed), Scirus and the like, which are quite good.
The race is well and truly on, and we, as consumers, are in for a treat, for the problem of plenty is good to have!