Tool Review: Hathi Trust Research Center

Hathi Trust Research Center (HTRC) strives to make massive amounts of data accessible to researchers through a variety of digital tools. Its sources are accessible not only because of the visualizations and computational options the site provides, but also because it lets researchers work with data from texts digitized by Hathi Trust even when the full texts are unavailable under current copyright law. The digital tools on the site are easily customizable and can be adapted to a variety of research interests. HTRC offers detailed explanations of what each tool is meant to do, which makes many tools beginner-friendly. In addition to customizable input options, most tools give researchers a choice of several output formats. The downside of a site that tackles such a large amount of data is its speed, which can be slow at times on ordinary machines.

HTRC’s most accessible digital tools are its algorithms. One such tool is the InPho Topic Model Explorer, which attempts to show the topics that exist within a dataset. The input is simple, and the option of an HTML visual output makes the topic model easy to read and interpret. The user can toggle between different numbers of topic clusters, and the expanded view option improves readability. Output can be saved and viewed in a variety of file formats in addition to HTML, so it remains available outside of HTRC’s platform. Though the tool is useful for its visual elements and user-friendly interface, the topics themselves don’t appear to be entirely useful, and the colors that “correspond” with topics appear to be relatively arbitrary. I can imagine that the overlap of topics across clusters and the inability to supply a list of stop words could prove to be a problem for some researchers.
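To show what a stop word list would add, here is a minimal sketch in Python of filtering a topic's words after the output has been saved. It does not use any HTRC interface; the stop word set, the function name, and the sample topic are all hypothetical.

```python
# Hypothetical stop word list; a real one would be much longer
# and tailored to the collection being studied.
STOP_WORDS = {"the", "of", "and", "a", "to", "in"}


def filter_topic_words(topic_words, stop_words=STOP_WORDS):
    """Return the topic's words with stop words removed, keeping order."""
    return [word for word in topic_words if word.lower() not in stop_words]


# Made-up topic, standing in for one row of saved topic model output.
topic = ["the", "whale", "of", "ship", "sea", "and"]
filtered = filter_topic_words(topic)
print(filtered)
```

Filtering after the fact only tidies the display. It does not change how the model grouped the words, which is why a built-in stop list would still be the better fix.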

Another algorithm is the Named Entity Recognizer, which scans a dataset for named people, places, organizations, and so on. This could serve as a useful tool to see who the important players are in particular texts or where certain groups or people were located at different times. Unlike the previous tool, the Named Entity Recognizer allows the user to choose between languages, though it only allows one language to be used at a time. For the most part, the tool classifies things in a helpful way. However, some entries in the list are repeated, and some aren’t actually named entities, such as the word “Her,” which was listed as a person. The output of this tool is only available as data and could be slightly harder to read than a visualization.
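The repeated entries and stray pronouns can be cleaned up once the data is downloaded. The sketch below assumes the output can be read as (entity, type) pairs; that layout, the pronoun list, and the sample rows are assumptions for illustration, not HTRC's documented format.

```python
from collections import Counter

# Hypothetical list of pronouns that a recognizer may mislabel as people.
PRONOUNS = {"he", "her", "him", "his", "she", "they", "them", "it"}


def clean_entities(rows):
    """Count each (entity, type) pair once per occurrence, skipping pronouns."""
    counts = Counter()
    for entity, entity_type in rows:
        name = entity.strip()
        if not name or name.lower() in PRONOUNS:
            continue  # drop blanks and pronouns such as "Her"
        counts[(name, entity_type)] += 1
    return counts


# Made-up rows standing in for the tool's downloaded data.
rows = [
    ("Her", "PERSON"),
    ("Jane Austen", "PERSON"),
    ("Jane Austen", "PERSON"),
    ("London", "LOCATION"),
]
counts = clean_entities(rows)
for (name, entity_type), n in counts.most_common():
    print(name, entity_type, n)
```

Collapsing duplicates into counts also makes the data easier to read, since the most frequent entities rise to the top.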

Another tool is the Token Count and Tag Cloud Creator, which creates a word cloud based on the criteria that the user enters. This tool is straightforward, but it seems to stop producing an HTML visual output once more than a certain number of tokens is requested.
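For readers unfamiliar with what a token count involves, the idea can be sketched in a few lines of Python. This is not HTRC's implementation; the tokenizing rule, the function name, and the sample text are assumptions chosen to keep the example small.

```python
import re
from collections import Counter


def top_tokens(text, limit=10):
    """Return the `limit` most frequent lowercase word tokens with their counts."""
    # Simplified tokenizer: runs of letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(limit)


# Made-up sample text.
sample = "The sea, the ship, and the whale. The sea again."
top = top_tokens(sample, limit=2)
print(top)
```

A word cloud is a visual rendering of counts like these, with the size of each word scaled by its frequency.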

In addition to the algorithm tools, HTRC’s site has an “Explore” tab, which includes several tools that are still being tested and could become permanent features of the site. Some are more accessible than others, as some require prior coding knowledge. The Explore tab is useful because it gives the user insight into the changes HTRC is making to its digital tools and the new areas of research analysis that it may explore in the future.

HTRC offers a variety of tools that perform different tasks, which allows users to accomplish many of their research tasks on one platform rather than switching between several. This ease of access to tools is paired with the ease of accessing information through Hathi Trust’s digital archives. The site is simple enough that beginners could use some of the tools and capable enough to handle more difficult tasks if needed. Overall, HTRC seems to be a useful resource for researchers, and I can only imagine that it will get better as more time and effort are dedicated to expanding its reach.

Written on November 13, 2019