Recent Posts

Comparing Search Engine Performance: How does Cuil Stack Up to Google, Yahoo!, Live & Ask

Posted by randfish

This week marked the arrival of Cuil on the search engine scene. Being a huge fan of search technology and how search engines work in general, I've been spending some time playing around with the new service and thought it would be valuable to share my data on how the classic market leaders - Google, Yahoo!, Live & Ask - compare to the newcomer.

When judging the value and performance of a major web search engine, there are a number of items I consider critical to the judging process. In order, these are relevancy, coverage, freshness, diversity and user experience. First, let's take a quick look at the overall performance of the 5 engines, then dive deeper into the methodology used and the specific criteria.

Overall Performance

Interesting Notes from the Data:

  • I'm not that surprised to see Yahoo! come out slightly ahead. Although their performance on long tail queries isn't spectacular, when you weight all of the items equally, Yahoo!'s right up there with Google. There's a reason why people haven't entirely switched over to Google, despite the far stronger "brand" they've created in search.
  • Google is good across the board - again, not surprising. They're the most consistent of the engines and perform admirably in nearly every test. To my mind, despite Yahoo! eking out a win in the numbers here, Google is still the gold standard in search.
  • Ask has some clear advantages when it comes to diversity and user experience, thanks to their 3D interface, which IMO does provide some truly excellent results, particularly in the head of the demand curve.
  • When it comes to index size, Yahoo! appears to have the win, but I think my test is actually a bit misleading. Although Yahoo! clearly keeps more pages on many of those domains indexed, I suspect that Google is actually both faster and broader; they simply choose to keep fewer pages in their main index (and that may actually help their relevancy results). Google's also excellent at canonicalization, an area where Yahoo! and the others all struggle in comparison.
  • The biggest surprise to me? Microsoft's Live Search. I'm stunned that the quality and relevancy of Live Search are so comparatively high. I haven't done a study of this scale since 2006 or so, but the few dozen searches I run on Live each month have always produced far worse results than what I got this time around. Clearly, they're making an impact and getting better. Their biggest problem is still spam and manipulative links (which their link analysis algorithms don't seem to catch). If they fix that, I think they're on their way to top-notch relevancy.
  • Cuil doesn't permit a wide variety of very standard "power" search options like site:, inurl:, intitle:, negative keywords, etc., making it nearly impossible to measure them at all on index size (though the lack of any results at all returned for terms & phrases where the other engines had hundreds or thousands speaks volumes). It also put their technical and advanced search scores in the doldrums - none of the "technorati" are likely to start using this engine, and that's an essential component of building buzz on the web that Cuil's missed out on.
  • Cuil was foolish to launch now. Given the buzz they had and the potential to take market share (even a fraction of a percent is worth millions), they should have had lots of people like me running lots of tests like this, showing them how far they lagged behind the major engines. You only get one chance to make a first impression, and theirs was spoiled. I won't predict their demise yet, but I will predict that it will be a long time before Michael Arrington or anyone in the tech or mainstream media believes their claims again without extremely compelling evidence. Their index, from what I can see, is smaller than any of the major engines and their relevancy is consistently dismal. I feel really bad for them, personally, as I had incredibly high hopes that someone could challenge Google and make search a more interesting marketplace. Oh well... Maybe next time (assuming VCs are willing to keep throwing 30+ million at the problem).
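For anyone who wants to rerun the equal-weight comparison mentioned in the notes above, here's a rough sketch of how an overall score could be computed. The function name and the ratings in the example are placeholders I've made up for illustration; they are not the numbers from this study.

```python
from statistics import mean

# The five criteria named in this post, each weighted equally.
CRITERIA = ["relevancy", "coverage", "freshness", "diversity", "user_experience"]


def overall_score(ratings):
    """Average one engine's per-criterion ratings with equal weight.

    `ratings` maps each criterion name to a numeric rating.
    """
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return mean(ratings[c] for c in CRITERIA)


# Placeholder ratings for a hypothetical engine, NOT data from the study.
example = {
    "relevancy": 4,
    "coverage": 3,
    "freshness": 5,
    "diversity": 2,
    "user_experience": 1,
}
example_score = overall_score(example)
print(example_score)
```

If you believe relevancy matters more than, say, diversity, swapping the plain mean for a weighted one is the obvious next step, and it could easily change which engine comes out on top.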

Methodology: For each of the inputs, I've run a number of searches, spread across different types of query strings. This is an area where understanding how search engine query demand works is vital to judging an engine's performance. Some engines are excellent at returning great results for the most popular queries their users run, but provide very little value in the "tail" of the demand curve. To be a great engine, you must be able to answer both.

Search Query Demand Curve

In most instances, I've used search terms and phrases that mark different points along the query-demand scale, from the very popular search queries (like "Barack Obama" and "Photography") to long-tail query strings (like "pacific islands polytheistic cultures" and "chemical compounds formed with baking soda") and everything in between. You can see a full list of the queries I've used below each section. During the testing, I used the following scale to rate the engines' quality:

Rating Scale for Comparing Search Quality

Now let's dive into the lengthy data collection process...

Relevancy
--------------------
Relevancy is defined by the core quality of the results - the more on-topic and valuable they are in fulfilling the searcher's goals and expectations, the higher the relevancy. Measuring quality is always subjective but, in my experience, even a small number of queries provides insight into the relative value of the engine's results. To collect relevancy, I simply judged the degree to which the top results resolved my inquiry, and weighted those that provided the best answers in the first few positions higher than those that had better results further down.

Relevancy

The following are the queries I used to judge each of the engines on performance:

  • Top Buzz: gas prices, iphone, facebook, dark knight, barack obama
  • Popular: laptops, photography, rental cars, scholarship, house plans
  • Mid-Range: fire prevention, calendar software, snow tires, economic stimulus payment, nintendo wii games
  • Long Tail: pacific islands polytheistic cultures, chemical compounds formed with baking soda, genuine buddy 50 scooter reviews, google toolbar pagerank formula, getting a novel published
  • Technical: metalworking inurl:blog, cricket -site:.co.uk -site:.com.au, dark crystal site:imdb.com, top * ways, definition sycophant
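The idea of weighting good answers in the first few positions more heavily can be sketched in a few lines. This is only an illustration: the 1/rank weighting and the 0-to-1 judgment values are my assumptions for the example, not the exact scale used for the ratings in this post.

```python
def position_weighted_score(judgments):
    """Score a results page from per-result usefulness judgments.

    `judgments` is ordered by rank (top result first); each value runs
    from 0.0 (useless) to 1.0 (fully answers the query). Results are
    weighted by 1/rank (an assumed weighting), so good answers near the
    top count for more than good answers further down.
    """
    if not judgments:
        return 0.0
    weights = [1.0 / rank for rank in range(1, len(judgments) + 1)]
    weighted_total = sum(w * j for w, j in zip(weights, judgments))
    return weighted_total / sum(weights)


# Same number of good results, placed differently on the page.
top_heavy = position_weighted_score([1.0, 1.0, 0.0, 0.0])
bottom_heavy = position_weighted_score([0.0, 0.0, 1.0, 1.0])
print(top_heavy > bottom_heavy)
```

Both example pages contain two good results, but the one that puts them first scores higher, which is the behaviour described above.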

Coverage
--------------------
Coverage points towards a search engine's index size and crawl speed - the bigger the index and faster the engine crawls, the more pages it can return that have relevance to each query. To judge this metric, I focused on the coverage of individual sites (both large and small) as well as queries in the tail of the demand curve.

Coverage

Queries used for evaluation:

  • Large Sites: site:government.hp.com, site:research.ibm.com/leem, welsh rugby site:bbc.co.uk, search engine optimization site:w3.org, tango tapas seattle site:nytimes.com
  • Mid-Size Sites: site:seomoz.org/blog, site:news.ycombinator.com, site:education.com/magazine, bumbershoot site:thestranger.com, snowboards site:evogear.com
  • Small Sites: site:downtownartwalk.com, site:amphl.org/, site:totebo.com, dockboard site:loadingdocksupply.com, site:microsites.audi.com/audia5/

Freshness
--------------------
Although coverage can help to indicate crawl speed and depth, freshness in results shows a keen effort by the engine to place relevant, valuable news items and other trending topics atop the results. I used a number of queries related to recent events both popular and long tail (including new pages from relatively small domains) to test the quality of freshness offered by the engine's index.

Freshness

Queries used for evaluation:

  • Top Buzz: los angeles earthquake, obama germany, gas prices, ted stevens, beijing olympics
  • Popular Queries: new york city weather, dow jones average, seattle mariners schedule, cuil launch, nasa news
  • Mid-Range Queries: warp speed engine, unesco world heritage, movie times 98115, comic con 2008, most charitable us cities
  • Long Tail Queries: melinda van wingen, over the hedge comic 7/28, seomoz give it up blog, scrabulous facebook, internet startups that failed miserably

Diversity
--------------------
When search queries become ambiguous, lesser engines often struggle to provide high quality results, while those on the cutting edge can serve up much higher value by providing diversity in their results or even active suggestions about the query intent.

Diversity

Queries used for evaluation (I've only used 3 queries per level here, as more ambiguous query strings are very challenging to identify):

  • Highly Ambiguous: mouse, ruby, drivers
  • Moderate Ambiguity: comics, shipping, earth
  • Relative Clarity: ibm, harry potter, graphic design
  • Obvious Intent: seattle children's hospital map, color wheel diagram, great gatsby amazon

User Experience
--------------------
The design, interface, features, speed and inclusion of vertical results all play into the user experience. An engine that offers a unique display may rank well or poorly here, depending on the quality of the results delivered and whether the additional data provides real value. Rather than separate queries, I've judged each of the engines based on their offerings in this field (using both the data from the previous sets and my own past knowledge & experience).

User Experience

User experience was based on each of the following:

  • Query Speed - the average time from hitting the search button to having a fully-loaded results page
  • Results Layout - including the organization of results, ads, query options, search bar, navigation, etc.
  • Vertical Inclusion - the inclusion of valuable vertical or "instant answer" style results where useful
  • Query Assistance - the use of disambiguation, expansion, and similar/related queries
  • Advanced Features - the ability to conduct site specific searches, search for terms only in specific URLs or titles, and narrow by website type, a given folder on a domain, etc.  

For those who'd like to provide their own input about how to judge a search engine, Slate.com is running a reader contest to ask How do we know if a new search engine is any good? - I'd strongly encourage participation, as I know the audience here can contribute some excellent insight :-)

If you're interested, here's a screenshot of the Google Docs spreadsheet I created to conduct this research (and I've published the doc online here):

Screenshot of Spreadsheet used for Ratings

This kind of thing is a lot of work, and although this isn't scientifically or statistically significant, and is clearly biased (as I'm the only one who did the judging), I think the results are actually fairly useful and accurate, though it would be fascinating to run public studies like this on a defensible sample size.

p.s. Want to use any of the images or content from this post? Go for it - just please provide a link back :-)



Google AdWords

A business can succeed only when the products or services it sells are backed by the marketing and advertising they need. When it comes to advertising a product or service online, nothing can be more appropriate than AdWords. Google launched AdWords in 2000, and by 2007 it was the company's main source of revenue. Initially, advertisers paid a monthly amount to advertise their product or service, but later, to make room for small businesses and advertisers who wanted to manage their own promotion, Google launched the AdWords self-service portal. Google’s aim with AdWords has been to provide the most efficient advertising to businesses irrespective of their size.

SearchCloud Weights Keywords To Improve Search Relevance

SearchCloud, a new search engine that launched on July 17, has a new take on search refinement that it hopes will make it a useful alternative to the likes of Yahoo and Google. Instead of simply entering multiple keywords, users can rank how important each term is to the search. Each term is ...


7 Reasons To Go for Internet Marketing Rather Than Using Conventional SEO Tools

When we think of the new, upcoming economy under the Indian economic ambiance, we think of either setting up a new business or expanding our current business. From both perspectives of the market, we need customers. Industries today do not go for door-to-door conventionalizing but would instead prefer ...

Information Architecture – a site review is nothing without it.

Posted by Duncan Morris

Will and I have a recurring argument about what should and shouldn't be in a site review. My argument has been, and remains, that before you can do a proper site review you need to do keyword research, in order to validate that the site architecture is correct. Will's argument is that you can split a "site review" into two parts, a technical review and a keyword targeting review, which could be separate deliverables, and only the second of which needs keyword research.

I have been doing a fair few site reviews recently and one thing has stuck out. Yes, almost every site I've ever looked at has technical issues that should be fixed. Yes, people are still using font tags in a deliberate attempt not to pass semantic information to the search engines, and yes, some people still insist on creating a hideous Flash monstrosity. However, the biggest issue (ignoring the hideous Flash monstrosity, which deserves everything it gets) is not something that can be fixed by tweaking a template here, or adding a mod_rewrite rule there.

I'd love to stir this up into a big issue, but unfortunately it really isn't. You see, Will knows I'm right but never likes to admit he's wrong. Will wants us to first do a technical site review and send it to the client. After that, he argues, we can look at the keyphrase research and information architecture. I'm a firm believer that step one should be keyphrase research, which can then feed into a site review that looks not only at whether there is an h1 tag on the page, but at whether the keywords in the h1 tag are the right keywords.

I see a correlation between big sites and fairly few or fairly small technical issues. However, the opposite is true of site architecture, where the bigger the site the more site architecture problems there tend to be.

Site, or information, architecture issues fall into a number of camps: duplicate content, keyword cannibalisation, and a distinct lack of keyword targeting. All of these are, in my opinion, a bigger hurdle to ranking than most of the issues that are picked up in a technical site review.

As an example, I was looking at a site the other day that is one of a number of trusted Cisco partners, providing Cisco training. Technically the site was ok (ish), but whoever wrote the content of the site certainly didn't have the search engines in mind. Come to think of it, I'm not sure they had anyone in mind.

They had a page linked from the homepage of the site talking about the training they offered. The title tag of the page was Company Name | Training. The header of the page was Training with Company Name, and the page didn't mention Cisco once.

<not very subtle jibe>Obviously Will understands, just as well as you all do, that updating a header (coded in a font tag) to an h1 tag won't make the slightest bit of difference if the keyword isn't in the header.</not very subtle jibe>

With this issue in mind, I'd like to propose the following methodology for a full site review, and see what you guys think.

Step 1 - Keyphrase research. I think it's vital to get this done as early as possible in any process. Keywords drive SEO, so you want to know them as early as you can. I'm as guilty as anyone of thinking I can get by without keyword research. Keywords are obvious right up until the point that someone points out you are wrong.

At this stage if you can end up with more than just a list with search volumes you are on to a winner. Try to spot patterns in the way people search. You want to start with short tail keywords and find a hierarchy leading you to your pages.

Step 2 - Site Architecture. This step is, in my opinion, where the big bucks are earnt. Coming up with a site architecture can be very tricky. At this stage you need to look at your keyword research and the existing site (in order to make as few changes as possible). You can think of this in terms of your site map. You need a hierarchy that leads you to each of your "money pages" (i.e. those pages where conversions are most likely to occur). Obviously, a good site hierarchy allows the parents of your money pages to rank for relevant keywords (which are likely to be shorter tail).

Most products have an obvious hierarchy they fit into, but when you start talking in terms of anything that naturally has multiple hierarchies it gets incredibly tricky. The trickiest hierarchies, in my opinion, occur when there is a location involved. In London alone there are London boroughs, metropolitan boroughs, tube stations and postcodes. For you fact junkies out there, London even has a city ("The City of London") within it.

In an ideal world you will end up with a single hierarchy that is natural to your users, and gives the closest mapping to your keywords. Whenever there are multiple ways that people search for the same product, it makes coming up with a hierarchy that much harder. Rand touched on this (relating to blogs) when he was talking about solving indexation problems.

Step 3 - Keyword mapping. Once you have both a list of keywords and a list of pages, spending the time mapping one to another is well worth it. It suddenly becomes a very easy job to spot pages that aren't targeting a keyword and, arguably more importantly, keywords that don't have a page.

It's worth pointing out that between step 2 and step 3 you will remove any wasted pages. Rand covers exactly this problem in his 2nd Headsmacking tip: how to come up with top level navigation naming conventions.

If this stage is causing you issues, I suggest you revisit step 2. Your site architecture should lead naturally to a mapping that is both easy to use and, importantly for the search engines, includes your keyphrases.
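The Step 3 check can be roughed out as a small script once the keyword list and page list exist. Everything here is a sketch: the function name, the URLs and the keywords are hypothetical, and the mapping is assumed to be a simple page-to-keywords dictionary. It also flags keywords targeted by more than one page, since keyword cannibalisation was one of the architecture issues listed earlier.

```python
from collections import defaultdict


def audit_keyword_mapping(keywords, pages, mapping):
    """Find gaps in a page -> target-keywords mapping.

    Returns three things: pages with no target keyword, keywords with
    no page, and keywords targeted by more than one page.
    """
    pages_by_keyword = defaultdict(list)
    for page, targets in mapping.items():
        for keyword in targets:
            pages_by_keyword[keyword].append(page)

    pages_without_keyword = sorted(p for p in pages if not mapping.get(p))
    keywords_without_page = sorted(k for k in keywords if k not in pages_by_keyword)
    cannibalised = {
        k: sorted(v) for k, v in pages_by_keyword.items() if len(v) > 1
    }
    return pages_without_keyword, keywords_without_page, cannibalised


# Hypothetical site and keyword research, for illustration only.
keywords = ["cisco training", "cisco certification", "ccna course"]
pages = ["/", "/training/", "/training/ccna/", "/about/"]
mapping = {
    "/": ["cisco training"],
    "/training/": ["cisco training"],
    "/training/ccna/": ["ccna course"],
}

no_keyword, no_page, clashes = audit_keyword_mapping(keywords, pages, mapping)
print(no_keyword)  # pages that target nothing
print(no_page)     # keywords nobody targets
print(clashes)     # keywords claimed by several pages
```

A spreadsheet does the same job perfectly well; the point is simply that both gaps fall out almost for free once the mapping is written down.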

Step 4 - Site review. Once you are armed with your keyword mapping, a site review becomes a lot easier. Take a look at Tom in whiteboard studios who talks you through a site review process. Now when you are looking at title tags and headings, you can refer back to your keyword mapping and see not only whether the heading is in an h1 tag, but also whether it includes the right keywords.
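As a rough illustration of that two-part heading check, here's a minimal sketch using Python's standard html.parser module. The helper names and the sample markup are hypothetical (the markup is modelled on the Cisco training example earlier), and a real review would of course cover title tags and much more.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect the text content of every h1 element on a page."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.h1_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.h1_texts.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1_texts[-1] += data


def review_heading(html, target_keyword):
    """Return (has_h1, keyword_in_h1) for one page and its mapped keyword."""
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    has_h1 = bool(parser.h1_texts)
    keyword_in_h1 = any(
        target_keyword.lower() in text.lower() for text in parser.h1_texts
    )
    return has_h1, keyword_in_h1


# Hypothetical markup: header in a font tag, in an h1, and in an h1 with the keyword.
font_header = '<font size="5">Training with Company Name</font>'
h1_header = "<h1>Training with Company Name</h1>"
fixed_header = "<h1>Cisco Training with Company Name</h1>"

print(review_heading(font_header, "cisco training"))
print(review_heading(h1_header, "cisco training"))
print(review_heading(fixed_header, "cisco training"))
```

The middle case is the one the technical-only review misses: the h1 tag is there, but the keyword isn't.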

So, to help finish my debate with Will, I'd love to hear your thoughts on how you go about a site review. Do you prefer to send through one document with everything included, or would you rather send multiple documents over time, but with the first technical site review being delivered earlier?


Beware the Hype for Software as a Service (SaaS)

Time to dispel a few popular myths. SUVs are not cool. They never were. You Hummer guys were drawing snickers a few years ago. Now, with the price of gas nearing $5 a gallon, we're laughing out loud. And Microsoft's Vista is not a failure. To date, the software company has sold more than 150 million units. Vista has made Microsoft a ton of money. Yes, yes -- it's preloaded on every new computer.


Open Web Foundation to Play Freedom Cop for Net Specs

The Open Web Foundation introduced itself to the world last week at OSCON, the Open Source Convention, held in Portland, Ore. The consortium of individuals and Internet companies is an effort to build a home for community-driven specifications on the Web. The organization follows open source models already seen in the Apache Software Foundation.
