Research Guides: Analyzing Text Data: Library Databases

Finding Text Data in Library Databases

The Library's databases do not allow web scraping due to license agreements with the publishers. However, you may be able to collect data from text sources within the databases, as long as you are using the data exclusively for academic purposes. Each publisher has its own terms, conditions, and copyright provisions, which should be followed at all times. Please contact us if you have questions about using the Library's databases as a data source for your research.

If you are considering collecting data from Library databases, please keep in mind:

Some publishers will require you to use tools they provide to mine their content, or will do the research for you. In this way, they can manage the data being accessed and the impact on their servers.
Downloading large amounts of data can trigger automatic lockouts and prevent access to resources by other users. The Library or the user can also be fined for unauthorized use of the databases. (Yes, they really do follow up on this!)

Definitions

Text mining is an umbrella term for using computer programs and algorithms to dig through large amounts of text, like books, articles, websites, or social media posts, to find valuable and hidden information. As such, it refers to the methods applied to a corpus of textual data, rather than to the methods of obtaining such data.
APIs (application programming interfaces), in this context, are a means of obtaining data (textual or otherwise) in an efficient and automated fashion, though they typically require some programming knowledge to use.
Bulk downloading refers to a means of obtaining large quantities of data from the database's public user interface, either automatically through a feature of the interface specifically provided for that purpose, or manually (i.e., by downloading many separate batches of results). Individual databases vary in how much a user can download at one time, so please read the guidelines or consult with a librarian before bulk downloading.
Web scraping refers to automating the extraction of data from a public-facing website. Web scraping, unlike bulk downloading, requires some programming knowledge. Web scraping is not allowed in most library databases.
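To make the distinction between analyzing text and obtaining it more concrete, here is a minimal sketch of one very simple text mining method, counting term frequencies across a corpus. It assumes the documents have already been obtained in a way the publisher's terms allow. The three sample sentences, the short stop-word list, and the function names `tokenize` and `term_frequencies` are illustrative assumptions, not part of any Library database or tool.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for documents obtained with permission.
corpus = [
    "Text mining finds patterns in large amounts of text.",
    "Libraries license text from publishers.",
    "Publishers set the terms for mining licensed text.",
]

# A tiny illustrative stop-word list; real projects usually use a fuller one.
STOP_WORDS = {"the", "in", "of", "for", "from", "a", "an", "and"}


def tokenize(document):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", document.lower())


def term_frequencies(documents):
    """Count how often each non-stop-word appears across all documents."""
    counts = Counter()
    for document in documents:
        counts.update(t for t in tokenize(document) if t not in STOP_WORDS)
    return counts


frequencies = term_frequencies(corpus)
print(frequencies.most_common(3))
```

Nothing in this sketch retrieves data; it only processes text that is already in hand. That is the sense in which text mining describes the analysis rather than the collection of a corpus.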

Library Databases

Academic Search Complete
This academic multi-disciplinary database provides more than 8,500 full-text periodicals, including more than 7,300 peer-reviewed journals. In addition, it offers indexing and abstracts for more than 12,500 journals and a total of more than 13,200 publications including monographs, reports, conference proceedings, etc. Coverage spans virtually every area of academic study and offers information dating as far back as 1887.
HathiTrust
HathiTrust is a partnership of academic and research institutions, offering a collection of millions of titles digitized from libraries around the world. To log in, select The George Washington University as your institution, then log in with your UserID and regular GW password.
Nexis Uni
Formerly LexisNexis Academic. Access to major newspapers from around the world, as well as: industry and market news; company financial information; general medical topics; accounting, auditing, and tax information; legal news, law reviews, and case law; and the U.S. and state codes. Please use Google Chrome or IE browsers with this database.
Oxford English Dictionary
Access to the definitive dictionary of the English language. Revisions and additions ongoing.
ProQuest Central
Multidisciplinary database covering both scholarly sources and popular content. Includes the contents of ProQuest Research Library, ABI-Inform, and U.S. and International Newsstreams, as well as various ProQuest disciplinary collections.
TDM Studio
Text and data mining solution allowing TDM analysis of most ProQuest subscriptions. Access a wealth of content across disciplines including newspapers, dissertations and theses, journals, and primary sources. Support both teaching and research with a Python and R Jupyter coding interface as well as pre-configured visualizations. Researchers must create a free account with their GW email to access TDM Studio.
Web of Science (WOS)
Includes several citation indices covering sciences, social sciences, arts, and humanities. Search by a specific index, or across all indices. Citations to articles in more than 8,000 major research journals. Also permits cited reference searching (searching for articles that cite a particular author or work).

JSTOR Data for Research
As part of our mission to support new forms of scholarship, JSTOR’s Data for Research (DfR) program accommodates text analysis and digital humanities research by providing datasets for the journals, books, research reports, and pamphlets in the digital library.