Data Management

Data management at George Washington University

Documentation

When managing your data, it is useful to keep detailed documentation in appropriate folders. A few examples of documentation are:

  • Questionnaires or other survey instruments
  • Codebooks or data dictionaries
  • Computer code
  • Methodology
  • Consent forms
  • Quality assurance or activity tracking log
  • Additional supplementary materials

You do not need to archive all of these materials. As the researcher, you decide which documents are essential for re-use and secondary analysis. If you keep these documents clearly labeled in their folders, you will be able to assess and archive them quickly at the end of the project. Check with your archive or repository to find out which documentation it requires.

File Formats

For long-term preservation, access, and reuse, consider formats that are open (non-proprietary), platform-independent, and widely used in your field of research (see Directory of Metadata Standards). Whenever possible, consult your selected repository for its preferred formats for archiving data and documentation.

Examples of commonly used formats:

  • Documentation files: XML, PDF/A
  • Geospatial data files: ESRI Shapefile (.shp, .shx, .dbf)
  • Image data files: TIFF, JPEG
  • Numeric data files: ASCII (.dat, .txt), tab-delimited (.tsv, .tab), comma separated (.csv), SAS (.sd2, .stc, .xpt), Stata (.dta)
  • Textual data files: ASCII (.txt, .md) 
  • Video data files: MPEG-4 (.mp4) 
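
The format list above can be turned into a quick check of a project folder. The following is a minimal sketch, not a repository requirement: the extension table is taken from the examples above, with `.xml`, `.pdf`, `.tif`/`.tiff`, and `.jpg`/`.jpeg` added as the usual extensions for XML, PDF/A, TIFF, and JPEG. The function names `classify` and `flag_unlisted` are hypothetical. Note that an extension alone cannot confirm that a `.pdf` file is actually PDF/A.

```python
from pathlib import Path

# Extensions drawn from the example list above. The category names and the
# extensions for XML, PDF/A, TIFF, and JPEG are illustrative assumptions.
PREFERRED = {
    "documentation": {".xml", ".pdf"},
    "geospatial": {".shp", ".shx", ".dbf"},
    "image": {".tif", ".tiff", ".jpg", ".jpeg"},
    "numeric": {".dat", ".txt", ".tsv", ".tab", ".csv",
                ".sd2", ".stc", ".xpt", ".dta"},
    "textual": {".txt", ".md"},
    "video": {".mp4"},
}


def classify(filename: str) -> list[str]:
    """Return the categories whose example formats include this file's extension."""
    ext = Path(filename).suffix.lower()
    return sorted(cat for cat, exts in PREFERRED.items() if ext in exts)


def flag_unlisted(filenames: list[str]) -> list[str]:
    """Return the files whose extension appears in none of the categories."""
    return [name for name in filenames if not classify(name)]
```

Files returned by `flag_unlisted` are not necessarily unsuitable; they are simply the ones worth checking against your repository's preferred formats.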

Folder and File Organization

At the beginning of a research project, it is best to determine where and how your files and folders will be organized on any project-related drives. You may create a basic README file or graphic to inform project team members about this organizational framework.
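
One way to set up such a framework is to create the folders and the README together, so the two never drift apart. The sketch below assumes a hypothetical layout; the folder names, their descriptions, and the function name `scaffold_project` are illustrative only and should be replaced with whatever your team agrees on.

```python
from pathlib import Path

# Hypothetical layout: these folder names and descriptions are assumptions
# for illustration, not a structure prescribed by this guide.
LAYOUT = {
    "data/raw": "Original, unmodified data files",
    "data/processed": "Cleaned or derived data files",
    "documentation": "Codebooks, questionnaires, consent forms, methodology",
    "code": "Scripts used to process and analyze the data",
    "logs": "Quality assurance and activity tracking logs",
}


def scaffold_project(root: Path, layout: dict[str, str] = LAYOUT) -> Path:
    """Create each folder in `layout` under `root` and write a README listing them."""
    root.mkdir(parents=True, exist_ok=True)
    lines = [f"Project folder structure for {root.name}", ""]
    for folder, purpose in layout.items():
        (root / folder).mkdir(parents=True, exist_ok=True)
        lines.append(f"{folder}/ : {purpose}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `scaffold_project(Path("ProjectName"))` would create the folders and a `README.txt` in a new `ProjectName` directory. Running it again leaves existing folders in place and rewrites only the README.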

It is also essential to determine which team members should have access to specific files or folders and to set those permission levels. You may restrict access to administrators only, or protect folders and files with passwords. This will keep unauthorized staff out of specific files or folders. If you need assistance with this, please contact the Division of Information Technology or your project's IT personnel.

 

[Figure: Example of Basic Folder and File Organization]

File Naming & Versioning

You can keep track of the research project's data files and documents by creating naming conventions and versioning policies. These policies ensure that each file name is unique, so no one has to wonder which “final_version” data file is actually the final version. Before the project starts, specify the naming conventions and versioning policies in a README file or spreadsheet. Inform all project team members of the policies and show them where the document is kept in case they need to refer to it later.

Naming conventions are simply a consistent way of labeling each document and data file. Different methods for labeling can include:

  • underscores (ex: Project_Name)
  • camel case (ex: ProjectName)
  • lowercase text with no spacing (ex: projectname)
  • or a combination of all three (ex: ProjectName_datafile1.dta)

Versioning policies specify how you document changes to data files or documentation. Versioning can be specified in many ways, such as:

  • version1, version2, version3
  • v1, v2, v3
  • version1_1, version1_2, version1_3

An example of a naming convention with versioning could be as simple as:

  • ProjectName_datafile1_version1.dta
  • ProjectName_datafile1_codebook_version1.pdf
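
A convention like the one above is easier to follow when file names are generated and checked by a small script instead of typed by hand. The following is a minimal sketch that assumes the pattern `<ProjectName>_<description>_version<N>.<extension>`; the pattern and the function names `build_name`, `parse_name`, and `next_version` are illustrative assumptions, not a required scheme.

```python
import re

# Assumed convention (illustrative): <ProjectName>_<description>_version<N>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)"
    r"_(?P<description>[A-Za-z0-9_]+)"
    r"_version(?P<version>\d+)"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project: str, description: str, version: int, ext: str) -> str:
    """Assemble a file name that follows the assumed convention."""
    return f"{project}_{description}_version{version}.{ext}"


def parse_name(filename: str):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def next_version(filename: str) -> str:
    """Return the same file name with its version number increased by one."""
    parts = parse_name(filename)
    if parts is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    return build_name(parts["project"], parts["description"],
                      int(parts["version"]) + 1, parts["ext"])
```

For example, `next_version("ProjectName_datafile1_version1.dta")` returns `"ProjectName_datafile1_version2.dta"`, while `parse_name("final_version.dta")` returns `None` because the name carries no project prefix or version number.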

Quality Assurance & Quality Control

Quality assurance looks different in different fields, but following good practices in your discipline will improve data management and save your team time.

Quality assurance may include access control procedures, activity logging, "training activities, instrument calibration and verification tests, double-blind data entry, and statistical and visualization approaches to error detection." - Ten Simple Rules for Creating a Good Data Management Plan.

For example, it might be necessary to manually track who worked with specific data (electronic files or even physical samples), when they worked with it, and what they did with it. In this case, creating and updating a tracking sheet like the following may be necessary so that everyone can see the changes made to the data throughout the project’s lifetime:

Date       | Name       | File Manipulation
11/22/2017 | John Smith | Added missing value labels to variable A, B, and C data
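
A tracking sheet of this kind can also be kept as a CSV file that a script appends to, which keeps the columns and date format consistent. This is a minimal sketch: the column names and the month/day/year date format follow the example above, while the function name `log_change` and the idea of storing the log as a CSV file are assumptions made for illustration.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Optional

# Column names follow the example tracking sheet above.
FIELDS = ["Date", "Name", "File Manipulation"]


def log_change(log_path: Path, name: str, manipulation: str,
               when: Optional[date] = None) -> None:
    """Append one entry to the tracking log, writing the header if the file is new."""
    when = when or date.today()
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "Date": when.strftime("%m/%d/%Y"),
            "Name": name,
            "File Manipulation": manipulation,
        })
```

Because entries are only ever appended, earlier rows stay untouched and the file reads as a chronological history of the data.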

Metadata Standards

Metadata is documentation or information that describes the structure, content, and layout of a data file. It can be presented in the form of a codebook that includes the data type, column location, and coded values of each variable. It may also contain a frequency listing of variables, the questionnaire, and a description of the study design, methodology, sampling, data collection, and data quality. Robust metadata is essential for making raw data meaningful and reusable.
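
As a starting point for a codebook, the column locations, data types, and frequency listings described above can be drafted automatically from a data file. The sketch below assumes the data is a CSV file with a header row. The function names `infer_type` and `build_codebook`, the type labels, and the `max_levels` cutoff are illustrative assumptions, and the output is only a draft that still needs variable descriptions and coded-value labels written by the research team.

```python
import csv
from collections import Counter
from pathlib import Path


def infer_type(values: list[str]) -> str:
    """Guess a simple data type from the non-missing string values of one column."""
    non_missing = [v for v in values if v != ""]
    if not non_missing:
        return "empty"
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for value in non_missing:
                caster(value)
            return label
        except ValueError:
            continue
    return "text"


def build_codebook(csv_path: Path, max_levels: int = 10) -> list[dict]:
    """Draft one codebook entry per column of a CSV file that has a header row."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = reader.fieldnames or []
    codebook = []
    for position, column in enumerate(columns, start=1):
        # Treat fields missing from short rows the same as empty strings.
        values = [(row[column] or "") for row in rows]
        counts = Counter(values)
        entry = {
            "variable": column,
            "position": position,
            "type": infer_type(values),
            "missing": counts.get("", 0),
        }
        # Frequency listings are only useful for variables with few distinct values.
        if len(counts) <= max_levels:
            entry["frequencies"] = dict(counts)
        codebook.append(entry)
    return codebook
```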

A metadata standard is a metadata structure organized in a consistent, machine-readable format, which makes information about the data searchable and retrievable. Repositories usually specify the metadata standards they require for archived data documentation.

GW Libraries • 2130 H Street NW • Washington DC 20052 • 202.994.6558 • AskUs@gwu.edu