
Data & Statistics Research Guide

Getting Started

This research guide is designed to help you strategize, find, cite, and work with data, statistics, and related software.


What are data?

Data are a set of values of qualitative or quantitative variables collected together for reference or analysis. Unprocessed data, or "raw data," often require some amount of cleaning before being analyzed. Researchers in many disciplines, from chemists to political scientists to digital humanists, draw conclusions based on the results of their data analysis.


In spreadsheet-style data, the variables are represented by columns and the records or cases are represented by rows. Interpreting or analyzing data requires proper documentation, which is often provided through codebooks and data dictionaries. These outline the codes and values used to label observations. For example, a dataset may contain a Gender variable where '1' represents Male and '2' represents Female. This information would be listed in a codebook so that we can read and interpret the data in the Gender variable. Additionally, columns may be named in shorthand; without documentation, no one will know the significance of "health1" or "health2" and so on.

An example of data in SPSS with column headers: id, gender, state, age, health1, health2, health3, health4, health5, health6

Source: NYU Data Services - Introduction to SPSS
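
To make the role of a codebook concrete, here is a minimal Python sketch, not taken from any of the sources above. The `CODEBOOK` dictionary, the sample `records`, and the `decode` helper are all hypothetical names and made-up values chosen for illustration; only the Gender coding ('1' for Male, '2' for Female) comes from the example in this guide.

```python
# Hypothetical codebook: for each variable, a description and its value labels.
# Only the gender coding follows the example in this guide; the rest is assumed.
CODEBOOK = {
    "gender": {
        "description": "Respondent gender",
        "values": {1: "Male", 2: "Female"},
    },
    "health1": {
        # The real meaning of "health1" would come from the dataset's documentation.
        "description": "Undocumented shorthand column",
        "values": {},
    },
}

# Made-up records in spreadsheet style: one dict per row, one key per column.
records = [
    {"id": 1, "gender": 1, "health1": 3},
    {"id": 2, "gender": 2, "health1": 5},
]


def decode(record, codebook):
    """Return a copy of the record with coded values replaced by their labels."""
    decoded = {}
    for variable, value in record.items():
        labels = codebook.get(variable, {}).get("values", {})
        # Fall back to the raw value when the codebook has no label for it.
        decoded[variable] = labels.get(value, value)
    return decoded


decoded_records = [decode(record, CODEBOOK) for record in records]
for row in decoded_records:
    print(row)
```

The gender codes become readable labels, while the `health1` values stay as raw numbers because the hypothetical codebook has no labels for them. That is the situation the paragraph above warns about when documentation is missing.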


Data come in many different file formats. Some files can only be opened by specific software while others can be opened by many programs. Selecting popular, open standards whenever possible (e.g. .CSV instead of .XLS) will give you more independence from expensive software and a greater likelihood of being able to access your file many years in the future. See this guide from Stanford if you want to learn more.
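
As a rough illustration of why an open format like CSV is so portable, the sketch below writes and re-reads a small table using only Python's standard library. The rows are made up, and an in-memory buffer stands in for a file on disk; this is one possible approach, not a recommended workflow from the guide's sources.

```python
import csv
import io

# Made-up rows; values are kept as strings because CSV stores plain text.
rows = [
    {"id": "1", "gender": "1", "state": "NY"},
    {"id": "2", "gender": "2", "state": "DC"},
]

# Write the rows as CSV text. An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "gender", "state"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# The result is human-readable text that any editor or spreadsheet can open.
print(csv_text)

# Read the same text back without any proprietary software.
round_trip = list(csv.DictReader(io.StringIO(csv_text)))
```

Because the file is plain text with a header row, the same data can be opened in a text editor, a spreadsheet program, or most statistical packages.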

 

Definition of Research Data from the Concordat on Open Research Data

Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical). These might be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, modelling, interview or other methods, or information derived from existing evidence. Data may be raw or primary (e.g. direct from measurement or collection) or derived from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set), or derived from existing sources where the rights may be held by others. Data may be defined as ‘relational’ or ‘functional’ components of research, thus signalling that their identification and value lies in whether and how researchers use them as evidence for claims. (See full text of the Concordat on Open Research Data.)

 

What are statistics?

Statistics are processed information obtained through mathematical calculations on the raw data. They are often quick facts and figures presented in tables and charts, without giving users much freedom to customize and calculate as they wish.

In other words, statistics summarize the data. Examples of statistics would be graphical representations (like charts and tables) commonly seen in articles and popular media.

Example of statistics.

Source: Netflix in Statista
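
The distinction between data and statistics can be sketched in a few lines of Python. The `ages` and `states` lists below are invented raw values for illustration only; the summary computed from them is the kind of processed figure the section above calls a statistic.

```python
import statistics
from collections import Counter

# Made-up raw data: one value per record.
ages = [23, 35, 41, 29, 35, 52]
states = ["NY", "DC", "NY", "VA", "DC", "NY"]

# Statistics summarize the raw values into a few figures.
summary = {
    "count": len(ages),
    "mean_age": statistics.mean(ages),
    "median_age": statistics.median(ages),
    "min_age": min(ages),
    "max_age": max(ages),
}

# A frequency table is another common summary for categorical variables.
state_counts = Counter(states)

print(summary)
print(state_counts.most_common())
```

With the raw lists in hand, you could compute any other summary you like. A reader who sees only the summary cannot recover the individual records, which is the trade-off described above.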


Sources

Information for this research guide was adapted, with thanks, from:

NYU Data & Statistics Guide

MSU Data & Statistics Research Guide

GW Libraries • 2130 H Street NW • Washington DC 20052 • 202.994.6558 • AskUs@gwu.edu