Text Mining with the HathiTrust: Empowering Librarians to Support Digital Scholarship Research - Northwestern University

By HathiTrust Research Center

Date and time

Wednesday, October 25, 2017 · 10am - 4pm CDT

Location

Northwestern University Seeley G. Mudd Library, Room 2210

2233 Tech Drive Evanston, IL 60208

Description

Northwestern University Libraries will host a text mining workshop for Chicago-area librarians on Wednesday, October 25, in Mudd Library, Room 2210, on the Northwestern University Evanston campus. The workshop is part of an IMLS grant titled Digging Deeper, Reaching Further: Libraries Empowering Users to Mine the HathiTrust Digital Library Resources. The grant is developing a workshop curriculum that introduces participants to text mining and related issues, leverages tools and data from the HathiTrust Research Center, and empowers librarians to become active research partners on digital projects at their institutions. Here are some of the things you can expect to learn about and become familiar with during this session:

  • Creating an HTRC workset and using it to run text analysis on a collection of works
  • Web scraping and data cleaning
  • Dirty OCR versus clean OCR
  • Using Python for text mining
  • Topic modeling


The workshop runs from 10:00 am to 4:00 pm, with a one-hour break for lunch (not included). In just a few hours, you can open up a whole new world of possibilities in scholarly research to share with library users. All are encouraged to attend, and no experience is necessary!

Updates and room directions will be sent to registrants. Attendees may bring a laptop or use the machines available in the classroom.

Contact htrc_workshop@library.illinois.edu if you have questions.

Funded by IMLS RE-00-15-0112-15.

Organized by

The HathiTrust Research Center (HTRC) enables computational access for nonprofit and educational users to published works in the public domain and, in the future, on limited terms to in-copyright works from the HathiTrust.

The HTRC is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Library. It helps researchers meet the technical challenges of working with massive amounts of digital text by developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to the growing digital record of human knowledge.

Leveraging data storage and computational infrastructure at Indiana University and the University of Illinois at Urbana-Champaign, the HTRC will provision a secure computational and data environment for scholars to perform research using the HathiTrust Digital Library. The center will break new ground in the areas of text mining and non-consumptive research, allowing scholars to fully utilize the content of the HathiTrust Library while preventing intellectual property misuse within the confines of current U.S. copyright law.
