Monday, June 20, 2011

Advanced Search in the New Age

I've struggled with this subject all day, and it's hard to pin down why. I enjoy running a great search, and I'm good at it. I think it must be because many of the tips and tools I'm highlighting are as natural to me as breathing, so it's tough not to trip over my own feet when laboriously laying out all the steps. The problem these days, on Google at least, is not the absence of results. The problem is too many results from a simple search.

When the internet was new, my girlfriend showed off her Google search for "superman". Her son was a comic book buff, and the two of them marvelled at the speed of the return: four hits. When she demonstrated for me a couple of months later, we found twenty sites. And goggled. My, how the internet was growing by leaps and bounds. Today, a Google search on the same term gave me 168 million hits. My mind boggles at that number. In truth, I won't look past the first couple of pages. The likelihood that I would find a significant result any deeper is just too small.

Making sense of this mass of information at our fingertips has turned search into an art. The trick is to find a term significant and unique enough to bring back the result I need, but not so narrow that it filters out the gold. One way to develop this fine touch is to start with the narrowest search you can think of. Try enclosing your Google phrase in quotation marks. If you get no results, broaden your search ever so slightly, and repeat. After a while, the judgment becomes second nature. Here are two Google searches I conducted recently that took several tries to find me what I wanted:

  • There is an archaeological dig on the shores of Galilee, profiled by the Naked Archaeologist. There's evidence of a fishing industry and of early Christian activity. What was the name of the dig? I'd forgotten. I searched, filtering for Naked Archaeologist results only, and found the name of the fishing village. I then broadened the search to Bethsaida. Google corrected my spelling, of course. And there it was, in satisfying detail: the results of a dig briefly profiled on Naked.
  • A student mentioned BlueCielo as an Enterprise Content Management (ECM) tool that manages engineering drawings in Computer Aided Design (CAD) format. After checking out the official site, I wondered what the community was saying. I used advanced search to limit the results to "Discussion:". Google found me what I wanted, but the discussions were empty. What is it with the community? Do they sit around the water-cooler to chat? Is there no Twitter feed, no chatter, no casual trail for me to follow? I remind myself that this is not all bad. People talking. In person.

Before I go any further, I'll briefly discuss the differences between a corporate electronic file search and a search of the World Wide Web. Most of the time when conducting an internal search, you are looking for something you know exists. You either put it there yourself, or it is a manual, report or document that you have referred to in the past. You resort to search because you've forgotten where in the webonious structure you last laid it. If it is an Explorer search, a panting dog may wag his way through to help you.

Failure to find the document you are looking for will likely lead to a few hours of frustration because, unlike a Google search, where any good result will do, an internal search must turn up the one document of your recollection. The average information worker spends 8.8 hours a week searching for information. (Ref. The Importance of Enterprise Search, slide 13, IDC Hidden Costs of Information Work (2005).) It is therefore critical that the electronic information management system you select is capable of masterful (and swift) searches.

Similarly, in an e-discovery (may you never be blessed), search results must be consistent and complete. Correspondence has an annoying habit of referencing past correspondence. Does the search find both? Missing key documents will call into question the comprehensiveness of your records, and the reputation of your corporation.

Now that my little rabbit trail is done, I can go back to discussing advanced searching techniques. Most of these help you narrow your search. As I've mentioned before, a dearth of answers is not our problem. If you don't believe me, try running a search for "report" (5 billion hits on Google). Advanced techniques include wildcard searches (named after the Joker in our decks), Boolean searches (AND, OR and NOT), and a few more I found during my Google search today: fuzzy, proximity and range. Though Google calls these features by other names, you can practice wildcard and Boolean searching in its advanced search. Google has a great help page for advanced searchers.

A wildcard search replaces a character or range of characters with a symbol ("*" on most of the systems I looked at today). I would have found Bethsaida sooner if I had typed Bet*da. I'd mistakenly looked for it as Bethseda.
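
To make the wildcard idea concrete, here is a minimal sketch in Python. It does not show how Google or any particular search system implements wildcards; it borrows the shell-style matching in Python's standard fnmatch module, and the list of place names is a made-up stand-in for a search index.

```python
from fnmatch import fnmatchcase

# Hypothetical index of place names, invented for this illustration.
PLACES = ["Bethany", "Bethesda", "Bethlehem", "Bethsaida", "Capernaum"]


def wildcard_search(pattern, terms):
    """Return the terms matching a pattern in which * stands for any run of characters."""
    return [term for term in terms if fnmatchcase(term, pattern)]


# The misspelling finds nothing, but the wildcard pattern does.
print(wildcard_search("Bethseda", PLACES))  # []
print(wildcard_search("Bet*da", PLACES))    # ['Bethesda', 'Bethsaida']
```

Notice that the wildcard brings back Bethesda along with Bethsaida. A wildcard forgives a shaky spelling, at the price of a little extra noise in the results. (fnmatch also gives special meaning to "?" and square brackets, which this sketch does not use.)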

The Boolean link I've referenced is a great tutorial that graphically illustrates the different sorts of results you get. Google uses these same Boolean terms, so check out the results. AND and NOT give you a narrower result; OR gives you a broader one. If you care to check out my internet presence, try the Google search "jgnat -java" (jgnat NOT java).
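
The three Boolean operators map neatly onto set operations, which the following Python sketch tries to show. The three "documents" are invented for the example, and the tiny word index is only an assumption about how a search tool might organize its data, not a description of Google's internals.

```python
# Hypothetical mini-collection: document id -> text (invented for illustration).
DOCS = {
    1: "jgnat blog on records management",
    2: "jgnat java compiler notes",
    3: "java tutorial for beginners",
}


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def hits(word):
    """Return the ids of the documents containing the word (empty set if none)."""
    return INDEX.get(word.lower(), set())


both = hits("jgnat") & hits("java")     # jgnat AND java -> narrower
either = hits("jgnat") | hits("java")   # jgnat OR java  -> broader
without = hits("jgnat") - hits("java")  # jgnat NOT java -> narrower

print(both, either, without)  # {2} {1, 2, 3} {1}
```

AND and NOT each shrink the pile, while OR grows it, which is why the first two are the workhorses when the problem is too many results.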

I've begun reading up on fuzzy, proximity and range searches while going through the features of Apache Lucene, an open source search engine. I won't try to pretend to explain them fully here. Range can be very helpful to narrow to a period of time (e.g. Business Plans for the first three quarters of 2009), and tricky to get right. Fuzzy claims to bring back words that are spelled nearly like (but not exactly like) what you've asked for. This might also have helped me find Bethsaida.
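
For the curious, here is a rough Python sketch of the three ideas. It is not Lucene's code. Lucene's documentation describes fuzzy search in terms of edit distance, meaning the number of single-character changes separating two words; the sketch uses the plain Levenshtein version of that measure, a two-edit limit chosen for the example, and made-up place names and business plans.

```python
from datetime import date

# Hypothetical data, invented for this illustration.
PLACES = ["Bethany", "Bethesda", "Bethlehem", "Bethsaida", "Capernaum"]
PLANS = {
    "Business Plan Q1 2009": date(2009, 3, 31),
    "Business Plan Q2 2009": date(2009, 6, 30),
    "Business Plan Q3 2009": date(2009, 9, 30),
    "Business Plan Q4 2009": date(2009, 12, 31),
}


def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_search(word, terms, max_edits=2):
    """Fuzzy: return the terms within max_edits changes of the word."""
    return [t for t in terms if edit_distance(word, t) <= max_edits]


def within(text, first, second, max_gap):
    """Proximity: True if the two words occur within max_gap word positions of each other."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == first]
    pos_b = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def in_range(docs, start, end):
    """Range: return titles dated from start to end, both ends included."""
    return [title for title, d in docs.items() if start <= d <= end]


print(fuzzy_search("Bethseda", PLACES))  # ['Bethesda', 'Bethsaida']
print(within("business plan for the third quarter", "business", "quarter", 5))  # True
print(in_range(PLANS, date(2009, 1, 1), date(2009, 9, 30)))  # Q1, Q2 and Q3 only
```

The range function hints at why ranges are tricky: whether the end date is included decides whether the Q3 plan, dated September 30, makes the cut.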

It is very worthwhile for us as information professionals to master these techniques. Information workers need all the help they can get to find their information swiftly and consistently. If we become the experts, we will demonstrate our worth to the organization many times over.