Data & Statistics Research Guide

Getting Started

This research guide is designed to help you find, cite, and work with data, statistics, and related software.

What are data?

Data are a set of values of qualitative or quantitative variables collected together for reference or analysis. Unprocessed data, or "raw data," often require some cleaning before they can be analyzed. Researchers in many disciplines, from chemists to political scientists to digital humanists, draw conclusions based on the results of their data analysis.

In spreadsheet-style data, the variables are represented by columns and the records or cases are represented by rows. Interpreting or analyzing data requires proper documentation, which is often provided through codebooks and data dictionaries. These outline the codes and values used to label observations. For example, there may be a variable for Gender in the dataset where '1' represents Male and '2' represents Female. This information would be listed in a codebook so that we can read and interpret the data in the Gender variable. Additionally, columns may be named in shorthand; without documentation, no one will know the significance of "health1" or "health2" and so on.
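To make the codebook idea concrete, here is a minimal Python sketch that decodes the Gender variable from the example above. The record values, the name `GENDER_CODEBOOK`, the function `decode_gender`, and the code 9 are all hypothetical and invented for illustration; they do not come from a real dataset.

```python
# Hypothetical codebook for the Gender variable: code -> label.
GENDER_CODEBOOK = {1: "Male", 2: "Female"}

# Hypothetical raw records (values invented for illustration only).
raw_records = [
    {"id": 1, "gender": 1},
    {"id": 2, "gender": 2},
    {"id": 3, "gender": 9},  # a code the codebook does not define
]


def decode_gender(code):
    """Return the codebook label for a code, or a marker for unknown codes."""
    return GENDER_CODEBOOK.get(code, "Undocumented code")


# Translate every stored code into its human-readable label.
decoded = [decode_gender(record["gender"]) for record in raw_records]
print(decoded)
```

The third record shows what happens without documentation: the stored value is still there, but nothing tells us what it means.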

An example of data in SPSS with column headers: id, gender, state, age, health1, health2, health3, health4, health5, health6

Source: NYU Data Services - Introduction to SPSS

Data come in many different file formats. Some files can only be opened by specific software, while others can be opened by many programs. Selecting popular, open standards whenever possible (e.g. .CSV instead of .XLS) will give you more independence from expensive software and a greater likelihood of being able to access your file many years in the future. See this guide from Stanford if you want to learn more.
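As a rough illustration of why an open format such as CSV is portable, the following Python sketch writes a few records as CSV text and reads them back using only the standard library. The column names follow the SPSS example above; the row values are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical records (values invented for illustration only).
rows = [
    {"id": 1, "gender": 1, "state": "NY", "age": 34},
    {"id": 2, "gender": 2, "state": "CA", "age": 27},
]

# Write the records as CSV into an in-memory text buffer.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["id", "gender", "state", "age"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# CSV is plain text, so any program that reads text can open it.
print(csv_text)

# Read the same text back; note that every value comes back as a string.
read_back = list(csv.DictReader(io.StringIO(csv_text)))
```

Because CSV stores only text, information such as variable types and value labels is not preserved, which is one more reason to keep a codebook alongside the file.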

What are statistics?

Statistics are processed information obtained by performing mathematical calculations on raw data. They are often quick facts and figures presented in tables and charts, without giving users much freedom to customize and calculate as they wish.

In other words, statistics summarize the data. Examples of statistics would be graphical representations (like charts and tables) commonly seen in articles and popular media.
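The difference between data and statistics can be sketched in a few lines of Python. The list `ages` below is hypothetical raw data invented for illustration, and `summary` holds statistics calculated from it with the standard library's `statistics` module.

```python
import statistics

# Hypothetical raw data: ages of six survey respondents (invented values).
ages = [23, 35, 41, 29, 35, 52]

# Statistics: a handful of numbers that summarize the raw values.
summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "minimum": min(ages),
    "maximum": max(ages),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

With the raw data in hand you could calculate any other measure you like; with only the summary, you are limited to the figures someone else chose to report.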

Example of statistics.

Source: Netflix in Statista

