Search me! Searching the Internet more effectively

We all know how to search the Internet, but how often do you get optimal results? Could you be using a better search engine, or better technique?

Internet search has been synonymous with Google for so long that there might seem little else to say. As far as visitors to this blog are concerned, it is the only search engine in town: 96% of visits which come through a search engine arrive from Google, with around 1% each from Yahoo Search, Bing, and DuckDuckGo.

However, it is a common complaint that searches do not return the desired results, either missing them altogether, or overwhelming with a vast number of hits. Searching for non-verbal content such as images, movies, and audio can be a shot in the dark, yet success can save much time and effort.

How they work

The principles behind Internet searching are essentially the same as those used in local Spotlight search. When you enter a text string, the search engine looks it up in its compiled index, and returns an ordered list of pages (or, for Spotlight, files) containing the search string.

Indexes are compiled by a worm-like ‘crawler’ fetching content, which is then digested into a vast system of indexes. Different search engines compile their indexes in different (proprietary) ways, search them differently, and use different systems for determining the order in which hits are listed.
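As a rough illustration of the index-and-lookup idea described above, here is a minimal sketch in Python. The page URLs and text are invented, and the ranking (a simple count of matching words) is only an assumption for the example; real engines use their own proprietary methods.

```python
from collections import defaultdict

# Hypothetical 'crawled' pages: URL -> text content (invented for this sketch).
PAGES = {
    "example.org/alum-bay": "Alum Bay on the Isle of Wight is famous for coloured sand",
    "example.org/poet": "Sorley MacLean was a Scottish Gaelic poet",
    "example.org/wight": "The Isle of Wight has many bays and beaches",
}

def build_index(pages):
    """Digest pages into an index: word -> {url: number of occurrences}."""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index

def search(index, query):
    """Return URLs containing every query word, best match first."""
    words = query.lower().split()
    if not words:
        return []
    # A page is a hit only if it contains all of the query words.
    candidate_sets = [set(index.get(w, {})) for w in words]
    hits = set.intersection(*candidate_sets)

    def score(url):
        # Naive ranking assumption: total occurrences of the query words.
        return sum(index[w][url] for w in words)

    return sorted(hits, key=lambda url: (-score(url), url))

index = build_index(PAGES)
print(search(index, "alum bay"))  # ['example.org/alum-bay']
```

The point is only that the lookup is made against a precompiled index rather than the pages themselves, which is why results arrive so quickly.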

General search

There are currently four major generalist web search services, based on three different search engines:

  • Google, the most famous, born in the late 1990s from a Stanford University research project and by far the largest, with nearly 70% of the market;
  • Bing, Microsoft’s relative newcomer since 2009, which grew to around 30% of market share but has now fallen back to 12%;
  • Yahoo, developed from an older web directory service in 2004, prior to which it used Google as its search engine, now using the Bing engine and in steady decline;
  • DuckDuckGo, which grew slowly after its launch in 2008, but has recently become more popular because of its efforts to protect user privacy and to be ‘family safe’.

To compare like with like, I ran a series of searches across these, search aggregators like Dogpile, and specialist services. These were based on two terms: “alum bay wight”, and “sorley maclean” (the Scottish Gaelic poet, also known by his Gaelic name of Somhairle MacGill-Eain). These were chosen to generate analysable results, and I know the key sites and images which should have featured high in the hits.

Inevitably, when asked to perform text searches, Google consistently returned the largest claimed number of hits, attaining 113,000 for tiny Alum Bay, and 55,800 for our obscure (but wonderful) poet. Interestingly, when tested four years ago, Google returned far more: 1,800,000 and 97,500 respectively. Since then, Google seems to be returning far fewer duplicates, or multiple hits from the same site, so although in sheer hit numbers it may not appear as good, quality has improved a lot.

Bing now comes close, though, with 66,100 and 19,000 hits respectively (down from 1,620,000 and 37,400 four years ago), and Yahoo appeared almost identical to Bing at 66,100 and 18,900 (down from 263,000 and 23,300). As with Google, Bing now returns few duplicates or multiple hits from the same site, and Yahoo looks the same.

I did not test DuckDuckGo four years ago, and at present it returns only a very small fraction of the hits achieved by the other engines: no more than 100 for each of the two terms.

Many other generalist and specialist search engines have had a glorious past, or aspire to an exhilarating future, such as those listed on Wikipedia. Four years ago I checked AltaVista, now owned by Yahoo and redirected to it, and Ask, which seems to have dropped out of this race altogether.

Search aggregators, another sector in the market, farm out your requests to multiple engines at the same time, collating the results. Dogpile did not reveal how many hits had resulted, and topped its list with several ads. Metacrawler now redirects to Excite, which is in turn powered by Dogpile, and returned the same as Dogpile.
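The collating step can be sketched as follows. This is only an illustration: the result lists are invented, and the round-robin merge is an assumption, not a description of how Dogpile or any other aggregator actually combines its sources.

```python
from itertools import zip_longest

def collate(*result_lists):
    """Interleave ranked result lists round-robin, dropping duplicate URLs."""
    seen = set()
    merged = []
    # Take the first hit from each engine, then the second from each, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical ranked results returned by two engines for the same query.
engine_a = ["a.example/1", "b.example/2", "c.example/3"]
engine_b = ["b.example/2", "d.example/4"]

print(collate(engine_a, engine_b))
# ['a.example/1', 'b.example/2', 'd.example/4', 'c.example/3']
```

A page found by both engines appears only once in the merged list, which is much of the appeal of an aggregator.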

Image search

Searching for images is harder to assess now than it was four years ago. Then each engine was proud to report the number of hits, but now they do not. The previous gap between Google and Bing seems to have reduced here, but without figures for the number of hits, this is impossible to assess. There also seems now to be little difference between the images returned.

Searching across the web for images is a problem, because there is no universal system for tagging images with meaningful metadata, forcing search engines to rely on the titles of images, and the text in their immediate context. For example, were this page to be indexed, a search engine would probably associate Alum Bay with Sorley Maclean, leaving you wondering what contorted logic might be behind this.

It is much more common for image searches to return significant proportions of misses, sometimes quite comical. Try searching for images using the term “skean dhu” and you will be surprised to see photos of hotel rooms, unless you know that this is also the name of at least one famous hotel.

Improving your hits

Choosing which engine to use is only the first step in searching, though. Each engine offers more advanced features that can be used to limit results, for instance to a particular period in time, which can help you turn millions of results into more manageable lists. When you can, use Google Advanced Search – although advanced search is an option which has been dropped from Yahoo now that it uses the Bing engine.

Each service has its own quirks which you need to learn in order to increase the specificity of your searches. For example, searching on Google for “alum” returned over 48 million hits, with pages about Alum Bay not appearing until well down the result pages. Add the word “bay” to the term, and hits fall to over 1 million, with far better representation for the location on the Isle of Wight. Go really specific with the terms “alum bay wight sand bracklesham” and you will get just 10,500 hits, almost all of which are right on target.
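The narrowing effect of adding terms can be reproduced on a toy scale. The four ‘pages’ below are invented, and the matching rule (a page is a hit only if it contains every term) is a simplifying assumption; real engines are more forgiving.

```python
# Hypothetical page texts, invented for this sketch.
PAGES = [
    "alum is a chemical compound used in dyeing",
    "alum bay is on the isle of wight",
    "the coloured sand at alum bay wight comes from bracklesham beds",
    "university alum networks and reunions",
]

def count_hits(pages, query):
    """Count pages that contain every word of the query."""
    terms = query.lower().split()
    return sum(all(t in page.lower().split() for t in terms) for page in pages)

for query in ("alum", "alum bay", "alum bay wight sand bracklesham"):
    print(query, "->", count_hits(PAGES, query))
# alum -> 4
# alum bay -> 2
# alum bay wight sand bracklesham -> 1
```

Each extra term can only keep or reduce the number of hits, so one term too many will exclude a page you wanted.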

One powerful strategy for optimising search is to know enough about the pages that you seek to imagine the exact words that will appear on them, and to enter a sequence of those specific words as the search term. Care is needed, though, to ensure that you do not overspecify, and so eliminate pages that contain new material that you would not anticipate. Details of using Google Advanced Search features are here.

Experiment and explore, and develop your search skills.

Tools: Text Specialists

Of the unconventional and specialist search engines, Wolfram Alpha is the most unusual and potentially the most useful for many different requirements. Feeding it Sorley Maclean’s name, it identified him as a poet, gave his full name (in English, not Gaelic!), dates of birth and death, a photo of a sculpture of his head, and showed his life timeline. It also gave his brother’s name, a short summary of his Wikipedia entry, and a graphical record of the Wikipedia page hits history.

Those working with maths, whether numeric or using algebraic symbols, should already be familiar with Alpha’s unique features in those areas, and it is an excellent portal to many large databases of statistical and other data.

NNDB is an unusual online international ‘Who’s Who’ with many bibliographic links for those famous people included, which has an innovative ability to display linkage between the many individuals that it lists. Other historical and cultural sites often include powerful local search engines, in many cases supplied by Google.

Anyone involved in research or the sciences should be familiar with Google Scholar, which performs a metasearch of many other more specialist search engines covering everything from agriculture to surgery and zoology, and can encompass patent databases if you wish. Where research papers are available for direct download, it also offers direct links to the paper, which can save a lot of effort.

It has not supplanted the many individual subject-specific search engines, usually run by learned societies and national libraries. For example, the standard medical resource is PubMed, run by the US National Library of Medicine, whilst psychological sciences are catered for by PsycINFO among others. Many such services require memberships or subscriptions that can be very expensive, although the trend is towards open, free access.

Tools: Image Specialists

Given the difficulties inherent in searching for still images and movies using text, you need a range of different services to get the best results. Beyond Google Images or Bing, try a specialist such as Picsearch, which has advanced features allowing you to limit results by colour, size, and so on.

Commercial sources such as Getty can provide richer results, and have invaluable tools for homing in on just the right hits too. Although mainly geared to deliver commercial content, Getty includes royalty-free products, and vector graphic illustrations. Unfortunately it drew a blank on Sorley Maclean, but many of Getty’s images are stunningly beautiful high-resolution photographs.

If you need images that will be free to use, Creative Commons Search covers flickr, YouTube, Pixabay, Wikimedia Commons, and more.

Some search engines try to analyse an uploaded image to find similar or related images. TinEye performs this very strictly, and could be useful if you have a small or poor photo and seek a better original. It has now indexed a total of 13.5 billion images.

Additional Search Sites:

DMOZ Web Directory
Internet Archive, including the WayBack Machine

Lists of Search Engines:

Wikipedia
The Search Engine List
Phil Bradley’s list

Updated from the original, which was first published in MacUser volume 27 issue 15, 2011.