This research guide is designed to help you strategize, find, cite, and work with data, statistics, and related software.
Data are a set of values of qualitative or quantitative variables collected together for reference or analysis. Unprocessed data, or "raw data," often require some cleaning before they can be analyzed. Researchers in many disciplines, from chemists to political scientists to digital humanists, draw conclusions based on the results of their data analysis.
In spreadsheet-style data, the variables are represented by columns and the records or cases are represented by rows. Interpreting or analyzing data requires proper documentation, which is often provided through codebooks and data dictionaries. These outline the codes and values used to label observations. For example, there may be a variable for Gender in the dataset where '1' represents Male and '2' represents Female. This information would be listed in a codebook so that we can read and interpret the data in the Gender variable. Additionally, the columns may be named in shorthand; without documentation, no one will know the significance of "health1" or "health2" and so on.
Source: NYU Data Services - Introduction to SPSS
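To illustrate how a codebook makes coded values readable, here is a minimal sketch in Python. The variable names, record values, and the `decode` helper are all hypothetical and invented for this example; only the Gender coding (1 = Male, 2 = Female) comes from the text above. Real codebooks are usually separate documents or files, so treat this as one possible way to apply the idea rather than a standard method.

```python
# Hypothetical codebook: maps each variable name to a description
# and the labels for its coded values.
codebook = {
    "gender": {
        "description": "Respondent gender",
        "values": {1: "Male", 2: "Female"},
    },
}

# Hypothetical records: each dict is one row (case);
# each key is a column (variable).
records = [
    {"id": 101, "gender": 1},
    {"id": 102, "gender": 2},
    {"id": 103, "gender": 1},
]


def decode(record, codebook):
    """Return a copy of the record with coded values replaced by labels."""
    decoded = dict(record)
    for variable, entry in codebook.items():
        if variable in decoded:
            code = decoded[variable]
            # Fall back to the raw code if the codebook has no label for it.
            decoded[variable] = entry["values"].get(code, code)
    return decoded


decoded_records = [decode(r, codebook) for r in records]
for row in decoded_records:
    print(row)
```

Keeping the codebook separate from the records means the raw codes stay untouched, and the labels can be corrected or extended without editing the data itself.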
Data come in many different file formats. Some files can only be opened by specific software, while others can be opened by many programs. Selecting popular, open standards whenever possible (e.g. .CSV instead of .XLS) will give you more independence from expensive software and a greater likelihood of being able to access your file many years in the future. See this guide from Stanford if you want to learn more.
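As a rough illustration of why an open format such as CSV is portable, the sketch below writes a small table to a CSV file and reads it back using only Python's standard library. The file name `survey.csv` and the sample rows are assumptions made up for this example. Note that CSV stores everything as plain text, so values come back as strings.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; values are strings because CSV is plain text.
rows = [
    {"id": "101", "health1": "3"},
    {"id": "102", "health1": "5"},
]

# Write to a temporary folder so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "survey.csv")

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "health1"])
    writer.writeheader()    # first line holds the column (variable) names
    writer.writerows(rows)  # one line per record

# Read the file back; no special software is needed to open it.
with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(loaded)
```

The same file can be opened in a spreadsheet program, a statistics package, or a plain text editor, which is the independence the paragraph above describes.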
Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical). These might be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, modelling, interview or other methods, or information derived from existing evidence. Data may be raw or primary (e.g. direct from measurement or collection) or derived from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set), or derived from existing sources where the rights may be held by others. Data may be defined as ‘relational’ or ‘functional’ components of research, thus signalling that their identification and value lies in whether and how researchers use them as evidence for claims. (See full text of the Concordat on Open Research Data.)
Statistics are processed information obtained through mathematical calculations on the raw data. They are often quick facts and figures presented in tables and charts, which gives users little freedom to customize them or run their own calculations.
In other words, statistics summarize the data. Examples of statistics would be graphical representations (like charts and tables) commonly seen in articles and popular media.
Source: Netflix in Statista
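To make the distinction between data and statistics concrete, here is a small sketch that reduces a list of raw values to a few summary figures using Python's standard `statistics` module. The `hours` values are hypothetical and invented for this example; they do not come from any dataset mentioned in this guide.

```python
import statistics

# Hypothetical raw data: one value per respondent.
hours = [2.0, 3.5, 1.0, 4.0, 3.5]

# Statistics: a handful of figures that summarize the raw values.
summary = {
    "count": len(hours),
    "mean": statistics.mean(hours),
    "median": statistics.median(hours),
    "minimum": min(hours),
    "maximum": max(hours),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

With the raw values in hand you could compute any other figure you need. With only the summary, you are limited to what the publisher chose to report.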
Information for this research guide was adapted, with thanks, from: