
Analyzing Text Data

Finding Text Data in Library Databases

Figure: A flow chart showing whether you can scrape a library resource for text data. If there is no dedicated API or terms of service, contact a librarian.

The Library's databases do not allow web scraping due to license agreements with the publishers. However, you may be able to collect data from text sources within the databases, as long as you are using the data exclusively for academic purposes. Each publisher has its own terms, conditions, and copyright provisions, which should be followed at all times. Please contact us if you have questions about using the Library's databases as a data source for your research.

If you are considering collecting data from Library databases, please keep in mind:

  • Some publishers require you to use their own tools to mine their content, or will extract the data on your behalf. This lets them control which data is accessed and limit the load on their servers.

  • Downloading large amounts of data can trigger automatic lockouts that prevent other users from accessing the resource. The Library or the user can also be fined for unauthorized use of the databases. (Yes, publishers really do follow up on this!)


Definitions

  • Text mining is an umbrella term for using computer programs and algorithms to dig through large amounts of text, like books, articles, websites, or social media posts, to find valuable and hidden information. As such, it refers to the methods applied to a corpus of textual data, rather than to the methods of obtaining such data.

  • APIs (application programming interfaces), in this context, are a means of obtaining data (textual or otherwise) in an efficient and automated fashion, though they typically require some programming knowledge to use.

  • Bulk downloading refers to obtaining large quantities of data through the database's public user interface, either automatically through a feature the interface provides for that purpose, or manually (i.e., by downloading many separate batches of results). Databases vary in how much a user can download at one time, so please read the guidelines or consult a librarian before bulk downloading.

  • Web scraping refers to automating the extraction of data from a public-facing website. Unlike bulk downloading, web scraping requires some programming knowledge. It is not allowed in most library databases.
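As the definition above notes, text mining describes what you do with a corpus once you have lawfully obtained it, not how you obtain it. The short Python sketch below illustrates one of the simplest text-mining methods, counting term frequencies across a set of documents. The sample sentences and the stopword list are hypothetical placeholders, not data from any library database; in practice you would load your own permitted corpus.

```python
import re
from collections import Counter

# Hypothetical sample corpus (placeholder sentences, not library data).
corpus = [
    "Text mining finds patterns in large collections of text.",
    "Researchers apply text mining methods to articles, books, and posts.",
    "A corpus is a collection of texts gathered for analysis.",
]

# Hypothetical, deliberately small stopword list; real projects use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "to", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count how often each non-stopword term appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(token for token in tokenize(doc) if token not in STOPWORDS)
    return counts


freqs = term_frequencies(corpus)
print(freqs.most_common(2))  # [('text', 3), ('mining', 2)]
```

This sketch does no stemming, so "text" and "texts" are counted as separate terms; dedicated text-mining libraries typically add steps such as stemming or lemmatization on top of this basic counting.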


GW Libraries • 2130 H Street NW • Washington DC 20052 • 202.994.6558 • AskUs@gwu.edu