Comparing Search Engine Performance: How does Cuil Stack Up to Google, Yahoo!, Live & Ask

Posted by randfish

This week marked the arrival of Cuil on the search engine scene. As a huge fan of search technology and how search engines work in general, I've been spending some time playing around with the new service and thought it would be valuable to share my data on how the classic market leaders - Google, Yahoo!, Live & Ask - compare to the newcomer.

When judging the value and performance of a major web search engine, there are a number of items I consider critical to the judging process. In order, these are: relevancy, coverage, freshness, diversity and user experience. First, let's take a quick look at the overall performance of the 5 engines, then dive deeper into the methodology used and the specific criteria.

Overall Performance

Interesting Notes from the Data:

Methodology: For each of the criteria, I've run a number of searches spread across different types of query strings. This is an area where understanding how search engine query demand works is vital to judging an engine's performance. Some engines are excellent at returning great results for the most popular queries their users run, but provide very little value in the "tail" of the demand curve. To be a great engine, you must be able to answer both.

Search Query Demand Curve

In most instances, I've used search terms and phrases that mark different points along the query-demand scale, from very popular search queries (like "Barack Obama" and "Photography") to long-tail query strings (like "pacific islands polytheistic cultures" and "chemical compounds formed with baking soda") and everything in between. You can see a full list of the queries I've used below each section. During the testing, I used the following scale to rate the engines' quality:

Rating Scale for Comparing Search Quality

Now let's dive into the lengthy data collection process...

Relevancy
--------------------
Relevancy is defined by the core quality of the results - the more on-topic and valuable they are in fulfilling the searcher's goals and expectations, the higher the relevancy. Measuring quality is always subjective but, in my experience, even a small number of queries provides insight into the relative value of an engine's results. To score relevancy, I simply judged the degree to which the top results resolved my query, weighting engines that provided the best answers in the first few positions higher than those whose better results appeared further down.
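If you wanted to formalize that position weighting, a rough sketch might look like the following - my actual judging was done by hand in the spreadsheet, so the per-result values and the reciprocal-position discount below are purely illustrative:

```python
# Rough illustration of position-weighted relevancy scoring.
# judgments: quality scores for a single query's results, in ranked order
# (e.g. 0 = off-topic, 1 = somewhat useful, 2 = fully answers the query).
# These values are hypothetical - my spreadsheet held per-query ratings,
# not per-result scores.

def relevancy_score(judgments):
    """Weight each result's judgment by the reciprocal of its position,
    so great answers in the first few spots count far more than the same
    answers buried further down."""
    weighted = sum(score / (position + 1)
                   for position, score in enumerate(judgments))
    # Normalize against a perfect score for the same number of results.
    best = sum(2 / (position + 1) for position in range(len(judgments)))
    return weighted / best if best else 0.0

# A strong answer at #1 scores much better than the same answer at #5.
print(relevancy_score([2, 1, 0, 0, 0]))  # ~0.55
print(relevancy_score([0, 0, 0, 0, 2]))  # ~0.09
```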

Relevancy

The following are the queries I used to judge each of the engines on performance:

Coverage
--------------------
Coverage points to a search engine's index size and crawl speed - the bigger the index and the faster the engine crawls, the more pages it can return that are relevant to each query. To judge this metric, I focused on the coverage of individual sites (both large and small) as well as queries in the tail of the demand curve.
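As a simple illustration of how that can be boiled down to a number (the sample data below is made up, not my actual ratings), you could track the share of long-tail queries for which an engine returned at least one on-topic page:

```python
# Hypothetical per-query coverage judgments: True if the engine returned
# at least one on-topic page for a long-tail query, False otherwise.
# Engine names are real, but these values are illustrative only.
tail_coverage = {
    "Google": [True, True, True, False, True],
    "Cuil":   [True, False, False, False, True],
}

def coverage_rate(hits):
    """Fraction of tail queries with at least one relevant result."""
    return sum(hits) / len(hits)

for engine, hits in tail_coverage.items():
    print(f"{engine}: {coverage_rate(hits):.0%} of tail queries covered")
```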

Coverage

Queries used for evaluation:

Freshness
--------------------
Although coverage can help to indicate crawl speed and depth, freshness shows a keen effort by the engine to place relevant, valuable news items and other trending topics atop the results. I used a number of queries related to recent events, both popular and long tail (including new pages from relatively small domains), to test the freshness of each engine's index.
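If you wanted to quantify this rather than judge it by eye as I did, one option is to look at what share of the top results were published within the last few days - the dates below are invented for the sake of the example:

```python
# Illustrative only: given publish dates for the top results of a
# news-flavored query, measure how recent they are. Dates are made up.
from datetime import date

top_result_dates = [date(2008, 7, 28), date(2008, 7, 27), date(2007, 3, 2)]

def freshness(dates, as_of, window_days=7):
    """Share of top results published within the last `window_days` days."""
    recent = sum(1 for d in dates if (as_of - d).days <= window_days)
    return recent / len(dates)

print(freshness(top_result_dates, as_of=date(2008, 7, 29)))  # ~0.67
```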

Freshness

Queries used for evaluation:

Diversity
--------------------
When search queries become ambiguous, lesser engines often struggle to provide high quality results, while those on the cutting edge can serve up much higher value by providing diversity in their results or even active suggestions about the query intent.
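A crude way to put a number on diversity, if you wanted to automate it, is to count how many distinct interpretations of an ambiguous query appear in the top results - the query and labels below are hypothetical rather than anything I actually collected:

```python
# Hypothetical interpretation labels for the top results of an ambiguous
# query like "jaguar" (the car maker, the animal, the OS, and so on).
# My actual judging was done by eye; these labels are illustrative.
top_result_intents = ["car", "car", "animal", "car", "os", "animal"]

def diversity(intents):
    """Number of distinct query interpretations represented in the results."""
    return len(set(intents))

print(diversity(top_result_intents))  # 3 distinct interpretations
```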

Diversity

Queries used for evaluation (I've only used 3 queries per level here, as more ambiguous query strings are very challenging to identify):

User Experience
--------------------
The design, interface, features, speed and inclusion of vertical results all play into the user experience. An engine that offers a unique display may rank well or poorly here, depending on the quality of the results delivered and whether the additional data provides real value. Rather than running separate queries, I've judged each of the engines on their offerings in this area (using both the data from the previous sets and my own past knowledge & experience).

User Experience

User experience was based on each of the following:


For those who'd like to provide their own input about how to judge a search engine, Slate.com is running a reader contest asking "How do we know if a new search engine is any good?" - I'd strongly encourage participation, as I know the audience here can contribute some excellent insight :-)

If you're interested, here's a screenshot of the Google Docs spreadsheet I created to conduct this research (and I've published the doc online here):

Screenshot of Spreadsheet used for Ratings
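If you'd like to re-run the roll-up on your own ratings, the math is just averaging the per-query ratings within each criterion and then across criteria for each engine. Here's a quick sketch assuming a hypothetical (engine, criterion, query, rating) row layout - the actual spreadsheet is organized a bit differently:

```python
# Minimal sketch of the rating roll-up; the row shape and rating values
# here are placeholders, not figures from the real spreadsheet.
from collections import defaultdict

rows = [
    ("Google", "Relevancy", "barack obama", 5),
    ("Google", "Coverage",  "pacific islands polytheistic cultures", 4),
    ("Cuil",   "Relevancy", "barack obama", 3),
    ("Cuil",   "Coverage",  "pacific islands polytheistic cultures", 2),
]

scores = defaultdict(lambda: defaultdict(list))
for engine, criterion, _query, rating in rows:
    scores[engine][criterion].append(rating)

for engine, by_criterion in scores.items():
    per_criterion = {c: sum(r) / len(r) for c, r in by_criterion.items()}
    overall = sum(per_criterion.values()) / len(per_criterion)
    print(engine, per_criterion, f"overall: {overall:.2f}")
```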

This kind of thing is a lot of work, and although it isn't scientifically or statistically rigorous and is clearly biased (as I'm the only one who did the judging), I think the results are fairly useful and accurate. It would be fascinating to run public studies like this on a defensible sample size.

p.s. Want to use any of the images or content from this post? Go for it - just please provide a link back :-)

