Research Guides: Data & Statistics Research Guide: Getting Started

Getting Started

This research guide is designed to help you strategize, find, cite, and work with data, statistics, and related software.


What are data?

Data are a set of values of qualitative or quantitative variables collected together for reference or analysis. Unprocessed, or "raw," data often require some cleaning before they can be analyzed. Researchers in many disciplines, from chemists to political scientists to digital humanists, draw conclusions from the results of their data analysis.
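What "cleaning" involves depends on the dataset, but common steps include trimming stray spaces, standardizing missing-value markers, and converting numbers stored as text. The Python sketch below illustrates those three steps on a few hypothetical survey rows; the column names and the set of missing-value markers are assumptions for illustration, since real datasets document their own conventions.

```python
# Hypothetical raw survey responses: stray spaces, mixed missing-value markers,
# and numbers stored as text.
RAW_ROWS = [
    {"id": " 1 ", "age": "34", "income": "52000"},
    {"id": "2", "age": "NA", "income": " 48000 "},
    {"id": "3", "age": "29", "income": ""},
]

# Markers treated as "missing" (an assumption; check the dataset's documentation).
MISSING = {"", "na", "n/a", "."}


def clean_value(value):
    """Trim whitespace, map missing markers to None, and convert whole numbers to int."""
    text = value.strip()
    if text.lower() in MISSING:
        return None
    try:
        return int(text)
    except ValueError:
        # Not a whole number: keep the trimmed text as-is.
        return text


def clean_rows(rows):
    """Apply clean_value to every cell of every row."""
    return [{key: clean_value(val) for key, val in row.items()} for row in rows]


if __name__ == "__main__":
    for row in clean_rows(RAW_ROWS):
        print(row)
```

Note that cleaning decisions (such as which markers count as missing) should themselves be recorded, so others can reproduce the cleaned dataset.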

In spreadsheet-style data, variables are represented by columns and records (or cases) by rows. Interpreting or analyzing data requires proper documentation, which is often provided through codebooks and data dictionaries. These outline the codes and values used to label observations. For example, a dataset may contain a Gender variable in which '1' represents Male and '2' represents Female. This information would be listed in a codebook so that we can read and interpret the values in the Gender variable. Column names may also be written in shorthand; without documentation, no one will know the significance of "health1," "health2," and so on.

Source: NYU Data Services - Introduction to SPSS
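A codebook can be thought of as a lookup table. The Python sketch below encodes the Gender example above (1 = Male, 2 = Female) and uses it to translate coded values into labels and to look up what a shorthand column name means. The structure of the `CODEBOOK` dictionary and the description given for "health1" are hypothetical; real codebooks are usually PDF or text documents distributed with the data.

```python
# A tiny codebook modeled on the example above. "values" maps numeric codes
# to their meanings; "label" explains a shorthand column name.
CODEBOOK = {
    "gender": {
        "label": "Respondent gender",
        "values": {1: "Male", 2: "Female"},
    },
    "health1": {
        "label": "Self-rated health, 1-5 (hypothetical description)",
        "values": {},
    },
}


def decode(row, codebook):
    """Replace coded values with their labels; leave uncoded values unchanged."""
    decoded = {}
    for column, value in row.items():
        value_labels = codebook.get(column, {}).get("values", {})
        decoded[column] = value_labels.get(value, value)
    return decoded


def describe(column, codebook):
    """Return the documented meaning of a column name, if the codebook has one."""
    return codebook.get(column, {}).get("label", "undocumented")


if __name__ == "__main__":
    row = {"gender": 2, "health1": 4, "health2": 3}
    print(decode(row, CODEBOOK))          # {'gender': 'Female', 'health1': 4, 'health2': 3}
    print(describe("health2", CODEBOOK))  # undocumented
```

The "health2" column is deliberately left out of the codebook: its value passes through untranslated, and its meaning cannot be recovered, which is exactly the problem documentation is meant to prevent.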

Data come in many different file formats. Some files can only be opened by specific software, while others can be opened by many programs. Choosing widely used, open formats whenever possible gives you more independence from expensive software and makes it more likely that you will still be able to open your files many years from now (e.g., .CSV instead of .XLS). See this guide from Stanford if you want to learn more.
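Because a CSV file is plain text, it can be read without the program that created it. As a minimal illustration, the sketch below parses and rewrites a small CSV table using only Python's built-in `csv` module; the sample data are hypothetical, and `io.StringIO` stands in for a file on disk so the example is self-contained.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a .csv file.
CSV_TEXT = """id,gender,health1
1,1,4
2,2,3
"""


def read_csv(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def write_csv(rows, fieldnames):
    """Serialize a list of dictionaries back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = read_csv(CSV_TEXT)
    print(rows[0])  # {'id': '1', 'gender': '1', 'health1': '4'}
    print(write_csv(rows, ["id", "gender", "health1"]) == CSV_TEXT)  # True
```

One trade-off of this simplicity: CSV stores everything as text, so every value comes back as a string, and converting types is left to you (one reason documentation such as a codebook matters).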

Definition of Research Data from the Concordat on Open Research Data

Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical). These might be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, modelling, interview or other methods, or information derived from existing evidence. Data may be raw or primary (e.g. direct from measurement or collection) or derived from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set), or derived from existing sources where the rights may be held by others. Data may be defined as ‘relational’ or ‘functional’ components of research, thus signalling that their identification and value lies in whether and how researchers use them as evidence for claims. (See full text of the Concordat on Open Research Data.)

What are statistics?

Statistics are processed information obtained through mathematical calculations on the raw data. They are often presented as quick facts and figures in tables and charts, which give users little freedom to customize them or run their own calculations.

In other words, statistics summarize the data. Examples include the percentages, averages, and totals shown in the charts and tables commonly seen in articles and popular media.

Source: Netflix in Statista
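To make the distinction between data and statistics concrete, the Python sketch below starts from a hypothetical list of raw values (ten respondents' ages and gender codes) and reduces it to summary statistics using the standard-library `statistics` module and `collections.Counter`. The figures are invented for illustration only.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical raw data: one value per respondent.
ages = [23, 35, 31, 47, 52, 29, 38, 41, 26, 33]
genders = [1, 2, 2, 1, 2, 1, 2, 2, 1, 2]  # coded: 1 = Male, 2 = Female


def summarize(values):
    """Reduce a list of numbers to a few common summary statistics."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": round(stdev(values), 2),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def frequency_table(codes, labels):
    """Count each code and express it as a percentage of all responses."""
    counts = Counter(codes)
    total = len(codes)
    return {labels[code]: (count, 100 * count / total)
            for code, count in sorted(counts.items())}


if __name__ == "__main__":
    print(summarize(ages))
    print(frequency_table(genders, {1: "Male", 2: "Female"}))
    # {'Male': (4, 40.0), 'Female': (6, 60.0)}
```

The ten individual ages cannot be recovered from the summary, which is why statistics alone offer less flexibility than the underlying data.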

Open Educational Resources and Affordable Course Materials

Data Journalism Handbook
A book for improving your data literacy.
Open Textbook Collections
A guide to exploring open textbook options as well as other open education resources for GW faculty.
MIT OpenCourseWare
A web-based publication of virtually all MIT course content, OpenCourseWare is open, available globally, and a permanent MIT activity.
PhET Interactive Simulations
Free online interactive simulations and activities for teaching and learning science, technology, engineering, and math concepts.
School of Data
Free online courses to increase your data literacy.
StatTrek
Free online modules to teach yourself statistics.

Sources

Information for this research guide was adapted, with thanks, from:

NYU Data & Statistics Guide

MSU Data & Statistics Research Guide