
Analyzing Text Data

Language Corpora

Language corpora are a subset of text corpora, which are collections of texts stored electronically and used for statistical and computational analysis and for testing and training algorithms. The term "language corpus" can refer to any set of linguistic data, whether written, spoken, signed, or multimodal. It can also refer to a collection gathered for a particular purpose, such as characterizing languages.

GW Libraries • 2130 H Street NW • Washington DC 20052 • 202.994.6558 • AskUs@gwu.edu