Introduction: Users may begin their search for information on a topic with a known item, but be interested in expanding their search to locate related material. In a physical collection, they can do this by locating an item on the shelf and then browsing the titles in proximity to that item to look for other material of potential interest. In catalogs, retrieval systems, and search engines, other approaches are needed to help the user locate related material once the record describing the initially sought item has been found. In this assignment you will explore how efficient and effective various search tools are in leading you to material related to the item that is the starting point for your search. In this case you will be looking for two books: a (non-fiction) book by Lawrence Lessig entitled Remix: Making Art and Commerce Thrive in the Hybrid Economy and a (fiction) book by Richard Powers, Generosity: An Enhancement.
Search and Discovery: Different Methods for Finding Related Items
Three search tools, the UIUC Classic Catalog, Amazon.com and the VuFind Catalog, have disparate visual interfaces and very different mechanisms for locating related material, each with a different balance of precision and recall. Searches starting from a non-fiction work by Lawrence Lessig and a fiction title by Richard Powers do not support absolute conclusions about the tools’ effectiveness, but the library catalogs tended to show better precision, while Amazon.com had better recall, albeit with some cluttered results.
The Classic Catalog interface is the simplest of the three. It displays descriptive metadata including author, title, publishing information and location, and is almost exclusively plain text on a white background, with the exception of a small cover image.
The Amazon.com interface, on the other hand, immediately makes it clear that it is an exploratory search system with plenty of user interaction. Since it is commercial, price information is displayed most prominently, but there is also much more than just basic book details, including user-created lists and reviews.
Lastly, the VuFind Catalog interface attempts to combine the authority of a library catalog with the attractiveness and interactivity of Amazon.com. It includes most of the same metadata as Classic but also Web 2.0 features such as favorites, tagging and comments. None of these features appears to be frequently used, however.
The mechanisms for locating similar items in the three systems were fundamentally different. The Classic Catalog allowed the user to search by LCSH, browse the call number range and retrieve other works by the same author. The VuFind Catalog was similar but did not provide call number searching. It also featured a more precise LCSH search mechanism than Classic, in which the user could browse by either a broad or a specific heading to control the number of items retrieved. VuFind also supported finding related items by user-supplied tags but, as noted above, this functionality was very rarely used.
Amazon.com offered a completely different set of search mechanisms, many of which were based on community-negotiated information instead of expert-supplied controlled vocabulary. One mechanism searched for similar books by user-tagged keywords; it even included a way to control relevance by allowing users to vote for or against a particular tag. Two further mechanisms suggested related items by displaying the titles most frequently purchased with the original. Another very exciting feature, “Inside This Book”, most closely resembled full-text information retrieval. It allowed the user to browse for similar titles based on whether certain keywords appear in the author’s name, the title or the body text of a growing collection of books (currently over 120,000 titles). Finally, there was a “Search by Category” function.
For non-fiction, such as the Lessig book, the controlled vocabulary in the search mechanisms of the Classic and VuFind catalogs results in higher precision than Amazon.com offers. Library of Congress Subject Headings are painstakingly applied by professionals, and the Dewey Decimal Classification system is well organized into classes and detailed subcategories, so only very relevant titles are returned. Users’ actual information-seeking needs, however, do not always correspond to these subject headings or call number ranges. A related item that might be perfect for a user will not show up unless it has been catalogued with these precise headings or numbers. Therefore, while precision may be high, recall is not necessarily so.
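The trade-off described above can be made concrete with a small calculation. The following is only an illustrative sketch: the function name, the book identifiers and the result counts are all invented for the example and are not measurements from any of the three systems. Precision is taken as the share of retrieved items that are relevant, and recall as the share of relevant items that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical: ten titles that this user would actually find useful.
relevant = {f"book{i}" for i in range(1, 11)}

# Hypothetical narrow subject-heading search: 4 results, all of them useful.
narrow = {"book1", "book2", "book3", "book4"}

# Hypothetical broad recommendation list: 40 results, 8 of them useful.
broad = {f"book{i}" for i in range(1, 9)} | {f"other{i}" for i in range(1, 33)}

print(precision_recall(narrow, relevant))  # (1.0, 0.4)
print(precision_recall(broad, relevant))   # (0.2, 0.8)
```

In this made-up case the narrow search returns nothing irrelevant but misses six of the ten useful titles, while the broad list finds eight of them at the cost of thirty-two items the user must sift through.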
Fiction, however, is nearly impossible to classify in any detail using the same controlled vocabulary, which results in poorer precision and recall. For the Powers book, for example, the corresponding subject heading “Genetics – Research – Fiction” returns only 6 results. There is no guarantee of their relevance, however, since with fiction it is often elements such as narrative style, form and tone of voice that make an item similar and relevant for users, not extremely broad subject matter. The other subject heading given was “College Teachers – Fiction”, which returns 98 results that do not appear to be similar to one another or particularly relevant. In contrast, Amazon’s user tagging mechanism, with its system of tag peer-evaluation, could potentially improve precision for fiction titles, but it has not yet caught on enough to be effective. The Powers book, for example, has yet to be tagged.
On the other hand, Amazon.com’s related item mechanisms generally brought lower precision and higher recall. The “Frequently Bought Together” and “Items Customers Also Bought” lists are based solely on purchasing patterns and have nothing to do with the content of the book. While algorithms of this type are improving, at this stage they can be influenced by too many factors to be considered reliable, and as a result their precision is low. Because they lack a professionally applied controlled vocabulary such as LCSH, Amazon’s mechanisms are less precise in non-fiction searches than the library catalogs. “Inside This Book” could also do wonders for precision and recall in more specific searches, but its index will need to keep growing. Most of Amazon.com’s mechanisms return many more items than narrow LCSH searches, suggesting higher recall, but the user would likely be discouraged by the number of non-relevant items he or she would have to sift through in the recommended books and the extremely broad category searches. In addition, he or she would have to look all over the page to find the different mechanisms; the category search, for example, is buried near the bottom and barely noticeable.
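To illustrate why recommendations built purely on purchasing patterns can drift away from a book’s content, here is a minimal sketch of co-purchase counting. It is not Amazon.com’s actual algorithm, which is not described in this paper; the function `also_bought` and the order data are hypothetical, and the approach (counting how often other titles share an order with the starting title) is simply the most basic form such a mechanism could take.

```python
from collections import Counter


def also_bought(orders, start_title, top_n=3):
    """Rank titles by how often they share an order with start_title.

    Only purchase co-occurrence is counted; nothing about the
    content of the books is consulted.
    """
    counts = Counter()
    for order in orders:
        if start_title in order:
            # Count each other title once per order.
            counts.update(t for t in set(order) if t != start_title)
    return counts.most_common(top_n)


# Hypothetical orders, invented for illustration only.
orders = [
    ["Remix", "Free Culture", "Cookbook"],
    ["Remix", "Free Culture"],
    ["Remix", "Cookbook", "Free Culture"],
    ["Remix", "Code"],
    ["Cookbook", "Garden Guide"],
]

print(also_bought(orders, "Remix"))
# [('Free Culture', 3), ('Cookbook', 2), ('Code', 1)]
```

In this invented data an unrelated cookbook outranks a topically related title simply because it happened to be bought alongside the starting book more often, which is the kind of noise that lowers precision.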
In conclusion, the library catalogs showed fairly good precision but struggled to find related items for fiction. Amazon.com’s mechanisms had better recall but tended to clutter the screen. Some of its mechanisms have the potential to significantly improve both precision and recall for fiction and non-fiction alike, but they will need the support of a large community in order to do so.