Posted on July 29, 2005.
Three things were clear from the many content aggregators at the SLA Conference 2005: the interfaces are beautiful to look at, taxonomies and metadata underpin all the sophisticated tools, and clustering is being adopted as a search aid. All of that is happening in the specialized information products. On the Web it’s a different matter, at least judging from Google. There could be a battle building between the sophistication of specialty tools and the sometimes deceptive simplicity of Google.
Perhaps usability really is improving, or we’re getting used to certain designs. As Marshall McLuhan wrote, “We shape our tools and thereafter our tools shape us.” The search interfaces of all the vendors’ products seemed well laid out and straightforward to use.
ABI Inform, as one example, has a new design that uses tabs for topics and publications and dropdown boxes for search options, and it seems cleaner and simpler. Tabbed navigation is also employed for browsing dissertations by subject and location. (www.proquest.com/division/pqnext/previews/UserInterfacePreview/)
A more complex but very effective interface is seen at Scopus, the abstract and indexing database of scientific, technical, medical and social sciences literature from Elsevier. Scopus was a multi-year project in which 300 scientists and 21 research institutions were involved in testing and evaluation. Testimonials from the University of Toronto, Oxford University and several others praised the interface as well as the connections to full-text holdings and the use of Scirus for web searching. Its success is due in large part to the easy-to-use options for refining the search by source title, author name, year, document type, and subject area. One testimonial states, “Nothing is superfluous, everything is useful, has its place. When you arrive on the site, you are not lost, you know exactly where to look and search, you know what you want and you can express it easily.”
There is a short demo at the Scopus site showing how easy it is to make choices and refine results. (www.scopus.com/scopus/home.url)
Taxonomies and Metadata
The search functionality of these tools is built upon an extensive structure of metadata to describe the records fully, taxonomies to guide the user through the topic, fielded search on the main descriptive elements, and controlled vocabulary used in related terms and indexing.
All of this apparatus is in play with the Compendia from CAB International for research and reference into forestry, crop protection, and animal health and production. For example, you can use the diagnostic search to find pests of plants by host, country, pest type, stage affected and symptoms. The datasheets have extensive profiles with text, images, statistics, and geographic mapping. There are complete taxonomic trees for locating datasheets on species. See these features at work in one of the downloadable demos from Compendia. (www.cabi.org/compendia.asp)
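A diagnostic search of this kind amounts to filtering records on several descriptive fields at once. The following is a minimal sketch in Python, assuming a simplified record layout; the `Datasheet` class, its field names, and the sample data are invented for illustration and are not CABI's actual schema or content.

```python
from dataclasses import dataclass, field


@dataclass
class Datasheet:
    """Hypothetical pest record; field names are illustrative only."""
    name: str
    hosts: set
    countries: set
    pest_type: str
    symptoms: set = field(default_factory=set)


def diagnostic_search(records, host=None, country=None, pest_type=None, symptom=None):
    """Return the names of records matching every criterion that was supplied."""
    matches = []
    for r in records:
        if host is not None and host not in r.hosts:
            continue
        if country is not None and country not in r.countries:
            continue
        if pest_type is not None and r.pest_type != pest_type:
            continue
        if symptom is not None and symptom not in r.symptoms:
            continue
        matches.append(r.name)
    return matches


# Invented sample data, for illustration only
RECORDS = [
    Datasheet("pest A", {"maize"}, {"Kenya", "Brazil"}, "insect", {"leaf holes"}),
    Datasheet("pest B", {"maize", "wheat"}, {"Canada"}, "fungus", {"leaf spots"}),
    Datasheet("pest C", {"wheat"}, {"Kenya"}, "insect", {"wilting"}),
]

print(diagnostic_search(RECORDS, country="Kenya", pest_type="insect"))
```

Each criterion the user adds narrows the result set further, which is the same refine-as-you-go behaviour the fielded interfaces above offer through dropdowns and tabs.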
Taxonomies as a structured approach to viewing content are gaining ground. LexisNexis announced during the conference a taxonomy program in which LexisNexis would work with companies to set up a system to organize unstructured content using a combination of customized rule sets and the LexisNexis taxonomies. These are taxonomies LexisNexis has developed over the last 15 years that cover business subjects, industries, geographic locations, companies, organizations, and people. (www.lexisnexis.com/about/releases/0802.asp)
Some specialty information resources are also parsing the text of search results to find commonalities and cluster them into groups that are usually topically inclined.
The Life Science Research Center at Infotrieve does aggregated searches of a range of sources that cover patents, protocols, and papers on scientific subjects related to biotechnology and pharmacology. Again, this is a tool that uses categories and field searches for author, title, year, and patent number, and lets you narrow by gene, organism, disease and anatomy. But LSRC also extracts terms on the fly and clusters results according to linguistic use and statistical occurrence using the clustering technology product Clusterizer from CyberTavern.
On the web, you can see Clusterizer at work at iBoogie (www.iboogie.com), a clustering metasearch engine that accesses many Web search engines including Yahoo, MSN, and FindArticles.
Vivisimo (www.vivisimo.com) has been the acknowledged leader in clustering technology. Vivisimo had a booth in the exhibit hall hoping to reach some enterprise customers. It has published several white papers on the value of clustering and goes so far as to say: “Categorization of all this data has been the obvious solution to enable users to better deal with this “information overload”. Traditional approaches to categorization involving taxonomies, however, are too expensive, time-consuming and complex for most organizations. Clustering allows CIOs and IT departments to offer their knowledge workers and researchers the ability to view categorized results without the cost and complexity of taxonomy building.”
HighWire Press (highwire.stanford.edu), a division of Stanford University, has been a platform for not-for-profit publishers to put content online where the abstracts and often the full text are accessible for free. It offers two topical views, one through subject classification and the other through on-the-fly clustering. The second is provided by Vivisimo, which produces an “instant index” for search results, whether for a general search or a search of a specific journal. Vivisimo clusters the results by common words or phrases and shows the number of results in each. This co-exists with the earlier and very effective classification done with Inxight software and presented in the topical hyperbolic tree.
Thomson may be delivering the nth degree of search functionality in its products. Mike Gray from the Thomson division for Legal and Regulatory spoke in a session moderated by Mary Ellen Bates on the Future of Search Engines. Gray spoke mainly about the extensive search functionality of Novus, the integrated delivery platform for combining content from various online services. Novus is being used for the legal products, NewsEdge and NewsRoom and will be leveraged across the remaining product lines. From the description we can conclude that it is rich with such value-added components as editorial summarization, categorization, field markup, entity resolution, and document versioning. Search supports all the forms that information professionals love – boolean, proximity right down to the sentence level, stemming, synonyms, field searching, customized ranking, and also results by category. There are alerts and clipping services. Much of this is seen in Westlaw’s Smart Tools, which promises “desired results without building an expert query”.
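To make the idea of clustering by common words concrete, here is a deliberately naive sketch in Python. It is not Vivisimo's or Clusterizer's algorithm, which are proprietary and far more sophisticated; the function name, stopword list, and sample titles are all assumptions made for illustration. It simply groups result titles under each significant word they share and reports the size of each group.

```python
import re
from collections import defaultdict

# Small illustrative stopword list; a real system would use a much larger one
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def cluster_by_common_words(titles, query_terms=(), min_size=2):
    """Group titles under each non-stopword they share.

    Returns a list of (label, [titles]) pairs, largest cluster first,
    with ties broken alphabetically by label. Query terms are skipped
    because every result would fall under them.
    """
    skip = STOPWORDS | {t.lower() for t in query_terms}
    groups = defaultdict(list)
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower())) - skip
        for word in words:
            groups[word].append(title)
    clusters = [(w, docs) for w, docs in groups.items() if len(docs) >= min_size]
    clusters.sort(key=lambda c: (-len(c[1]), c[0]))
    return clusters


# Invented result titles, for illustration only
TITLES = [
    "Gene therapy trials in mice",
    "Gene expression in yeast",
    "Therapy outcomes for mice",
    "Yeast gene mapping",
]

for label, docs in cluster_by_common_words(TITLES):
    print(f"{label} ({len(docs)})")
```

A title can land in more than one cluster here, which mirrors how clustered result lists let the same document appear under several labels.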
Smart Tools on Westlaw – west.thomson.com/store/product.asp?product%5Fid=Westlaw+Smart+Tools
You must see the online demo especially if you like a lively tenor saxophone.
The specialized tools can take your breath away. What does Google have? Cathy Gordon, Director of Business Development at Google, also spoke in the session on the Future of Search Engines. She sees a future where “search will become more sophisticated and more personal but users will still want simple”. Google seems to be sticking with its formula of indexing everything it can (preferably with commercial benefit) and keeping a very simple interface. Ultimately it sees search as being ubiquitous.
Google has been expanding into premium content from publishers, most evident in Google Scholar, and has even added citation analysis. This is starting to encroach on the territory of the specialized information products. Gordon explained that Google Scholar had been the “first pass at working at content behind the firewall” (meaning the for-fee subscription).
It does have aspirations to serve a “global search community” by adding capabilities to handle translations and work well in multiple-language searches.
There have been some projects to make Google more “user-centric”. Search History is a kind of My Google: it records searches and web sites viewed, making it easier to track back later. There is also the Personal Homepage intended for US users. These are in beta and still rough around the edges.
But Google offers no aids to help in narrowing a search or to disambiguate meaning through any form of classification, whether powered by people or machine. It has sidelined its use of Open Directory Project, admittedly now the sick directory on the Net, and has never adopted any clustering of results using Vivisimo or other technology. When asked, Cathy Gordon just said that clustering is very hard to do well.
There could be a contest developing between the high-powered but possibly complex search tools like ProQuest and Google’s plain and simple keyword search.
The Googlization of search was the subject of a debate last winter at the Pennsylvania Library Association Annual Conference – Googlizers vs Resistors. One comment by Steven Bell, director of the library at Philadelphia University, especially captured the conflict.
“Googlization sends our users a dangerous message. It suggests that we no longer believe in the advantages provided by traditional techniques like field search, or in the power of controlled vocabularies. If it all works like Google, why would these powerful search tools be necessary?… Consider a ProQuest Smartsearch: Even when people type in a bad first search it comes back with…some additional ways to find information on this topic. It will take a search that was typed in as a phrase and turn it into a Boolean search. It will provide you with subject headings from the controlled vocabulary. Google doesn’t care what happens.” www.libraryjournal.com/article/CA485756.html
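The fallback Bell describes, where a failed phrase search is retried as a Boolean search, can be sketched roughly as follows. This is a hypothetical Python illustration of the general technique only; it is not how ProQuest Smartsearch is implemented, and the function names and sample documents are invented.

```python
import re


def phrase_matches(phrase, text):
    """True if the exact phrase occurs in the text (case-insensitive)."""
    return phrase.lower() in text.lower()


def boolean_and_matches(phrase, text):
    """True if every word of the phrase occurs somewhere in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(term in words for term in phrase.lower().split())


def search_with_fallback(phrase, documents):
    """Try an exact phrase search first; if nothing matches, retry as Boolean AND."""
    exact = [d for d in documents if phrase_matches(phrase, d)]
    if exact:
        return "phrase", exact
    return "boolean", [d for d in documents if boolean_and_matches(phrase, d)]


# Invented documents, for illustration only
DOCS = [
    "Controlled vocabularies improve library search",
    "Search tools for the library",
    "Web directories",
]

print(search_with_fallback("search library", DOCS))
```

The point of the design is that a poorly formed first query still returns something useful instead of an empty result list.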
Bill Buxton, Principal of Buxton Design, in his keynote on June 7 noted “virtually all technology that is going to have an impact in the next 5 years has been around for 10 to 15 years.” My bet is on taxonomies – however that is extended to Web searching – and clustering.
Gwen Harris teaches web-based and classroom courses on Web searching and Current Awareness Services through the Professional Learning Centre at the Faculty of Information Studies, University of Toronto. Visit www.websearchguide.ca for information on courses and the Internet News weblog.
1 The Professional Learning Centre at the Faculty of Information Studies, University of Toronto, has an online resource for learning more about taxonomies. The Taxonomy Guide: Online Resource & Learning Tool will be available in Fall 2005.