Data & Statistics Research Guide

Define a Research Question for Data

Start by defining your topic

Be specific about your topic so that you can narrow your search, but be flexible enough to tailor your needs to existing sources.

Identify the Unit of Analysis

This is what you should be able to define:

#1 - Who or What?

Social Unit: This is the population that you want to study.
It can be:

  • People: 
    • For example: individuals, couples, households
  • Organizations and Institutions:
    • For example: companies, political parties, nation states
  • Commodities and Things:
    • For example: crops, automobiles, arrests

#2 - When?

Time: This is the period of time you want to study.
Things to think about:

  • Point in time:
    • A "snapshot" or one-time study
  • Time Series:
    • Study changes over time
  • Current information:
    • Keep in mind that there is usually a time lag before data are published. The most current information available may be a couple of years old.
  • Historical information

#3 - Where?

Space: Geography or place.
There are two main types of geographic classifications:

  • Political boundaries 
    • For example: nation, state, county, school district, etc.
  • Statistical/census geography
    • For example: metropolitan statistical areas, tracts, block groups, etc.

Remember to define your topic with enough flexibility to adapt to available data!
Data are not available for every conceivable topic. Some data are hidden (behind a paywall, for example), uncollected, or otherwise unavailable. Be prepared to try alternative data.

Define a Research Question for Statistics

What do I need to know about my topic before I start looking for statistics?

No matter the subject, statistics are limited by both time frame and geography.

Time:  Are you looking for information about a single point in time?  Do you want to look at changes over time?  Do you need historical information?  Current information?

Be prepared: the most current statistics may actually be a year or more old! There can be multiple-year lags before some information is released, depending on how often the information is collected, the time it takes to process and analyze the numbers, and the public release schedule.

Geography:  Geographical areas can be defined by political boundaries (nations, states, counties, cities) or statistical boundaries (mainly Census geography such as metropolitan statistical areas, block groups, or tracts).

Remember to define your topic with enough flexibility to adapt to available information!

Considerations When Searching

Questions to ask yourself when getting started:

  • Who would want these data or statistics?
    • Collecting data and creating statistics costs money. Most data collection and statistics are paid for by private organizations, marketers, governments, and advocacy groups.
    • Scientific datasets produced by researchers are often shared in association with publications, so looking for relevant articles can be a good place to start. Additionally, consider reviewing discipline-specific data repositories.
  • Is this the most recent data/statistic?
    • It takes time to analyze, clean, and summarize data after it is collected, so most statistics are not real-time. Sometimes data and statistics can be a year or more old and still be the most recent available.
  • Is this the best source of this data/statistic?
    • If you are looking at data or statistics from a particular group, make sure to consider the source. For example, if the statistics come from an advocacy group, they may be biased toward the group's efforts. Look for the unprocessed data that support provided statistics, and when possible review the collection and processing methodology.
    • What is the purpose of the website providing access to the data? Many data science training sites exist for curious people to practice new skills, but the data sets on these sites are often inappropriate for research assignments. Look for data provided by trustworthy authorities on the subject area.
  • Am I reading this data/statistic right?
    • Make sure to read the data or statistic carefully. You don't want to misinterpret or accidentally misrepresent the data or statistic in your own research. If you have questions about the data, carefully read through any documentation or codebooks.

How to Search for Data and Statistics

Search strategies to try once you identify the focus of your analysis:

  • Identify producers/publishers of the data or statistics

    • Who might produce/capture these data/statistics? (e.g., businesses, governments, advocacy groups, academic institutions)

    • Many organizations provide online access to data they capture or produce, either as downloadable datasets or via APIs (application programming interfaces). If you need assistance using an API to download a dataset you have identified, check out our Programming & Software Development Consultation Services.

    • Nonprofit organizations, academic institutions, governments, etc. will publish data and statistics to inform the public about current trends, projections, new findings, etc. In the case of U.S. government data, you can usually find the data on the producing agency's website or via a data portal like data.gov.

    • Not all data producers will make their data publicly available. Remember, data are expensive to collect or may contain sensitive information. In these instances, there may be access restrictions or data use agreements placed on the data. If you have questions about getting access to a specific dataset, please contact the Data Services Librarians at libdata@gwu.edu.

  • Search through a research guide

    • Gelman Library creates and maintains a variety of useful research guides on many subjects. If you know the subject area you want to focus on, try looking through a research guide on your topic.

    • See also the subject list within this research guide for subject-specific data resources.

    • For Health Sciences data and statistics, please look at Himmelfarb Health Sciences Library's Research Guides.

  • Search in a data archive or data portal

    • Data archives often focus on a specific discipline and can be a particularly good place to start to learn about data trends and popular topics in the discipline. They handle access, documentation, and preservation of data. For example, some data is publicly available for download while other datasets may not be available without a data use agreement.

    • Data portals provide access to multiple archives, databases, publications, and websites. They are usually multi-disciplinary and allow you to search across all of these platforms at one time. A very popular example is re3data.org.

    • See also the Data by Type page of this research guide.

  • Search in a statistics reference resource

    • Statistics reference resources pull together statistics from multiple sources and allow you to search across them in one location. Consider starting with the list of statistical resources within this guide.

  • Follow the trail!

    • Read the literature in your discipline about your topic. Identify statistics and data referenced in these articles. Statistics come from data, so if you see a particular statistic that interests you, look for its source.

  • Ask for assistance

    • If you have any questions or need assistance, please contact us!
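
For the API route mentioned in the search strategies above, the following is a minimal Python sketch of a keyword search against a data portal. It rests on assumptions not stated in this guide: that the portal exposes a CKAN-style "package_search" endpoint at the URL shown, and that the response nests matching datasets under "result" and then "results". The function names are hypothetical, chosen only for illustration. Check the documentation of the portal you are using before relying on any of this.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal offers a CKAN-style search endpoint at this address.
BASE_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5, base_url=BASE_URL):
    """Build the search URL for a keyword query, limited to `rows` results."""
    return f"{base_url}?{urlencode({'q': query, 'rows': rows})}"


def extract_titles(payload):
    """Pull dataset titles out of a decoded JSON response.

    Assumes the CKAN-style layout {"result": {"results": [{"title": ...}]}}.
    Missing keys yield an empty list or a placeholder title instead of an error.
    """
    results = payload.get("result", {}).get("results", [])
    return [item.get("title", "(untitled)") for item in results]


def search_datasets(query, rows=5):
    """Send the query over the network and return the matching dataset titles."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        return extract_titles(json.load(response))
```

Calling search_datasets("school enrollment") would send one request and return a short list of titles to review. Splitting URL construction and response parsing into separate functions makes it possible to check each piece without network access.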

GW Libraries • 2130 H Street NW • Washington DC 20052 • 202.994.6558 • AskUs@gwu.edu