
Data Management

Data management at George Washington University

Documentation

When managing your data, it is useful to keep detailed documentation in appropriate folders. A few examples of documentation are:

  • Questionnaires or other survey instruments
  • Codebooks or data dictionaries
  • Computer code
  • Methodology
  • Consent forms
  • Quality assurance or activity tracking log
  • Additional supplementary materials

You do not need to archive all of these materials; as the researcher, you determine which documents are essential for re-use and secondary analysis. If you keep these documents clearly labeled in their folders, you will be able to assess and archive them quickly at the end of the project. Check with your archive or repository to find out which documentation it requires.

File Formats

For long-term preservation, access and reuse, consider formats that are non-proprietary, platform-independent, and widely used in the field of your research. It is also recommended that you consult your selected repository for preferred formats of data and documentation for archiving.  

Examples of commonly used formats:

  • Documentation files: XML, PDF/A
  • Geospatial data files: ESRI Shapefile (.shp, .shx, .dbf)
  • Image data files: TIFF, JPEG
  • Numeric data files: ASCII (.dat, .txt), comma-delimited (.csv), SAS (.sd2, .stc, .xpt), SPSS (.por, .sav), Stata (.dta)
  • Textual data files: ASCII (.txt) 
  • Video data files: MPEG-4 (.mp4) 
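
As a rough illustration, the format examples above could be turned into a small check that flags project files for review before archiving. This is only a sketch: the extension table is derived from the list above rather than from any repository's official requirements, and the function names categories_for and files_to_review are hypothetical.

```python
from pathlib import Path

# Extensions taken from the examples listed above. This is an assumption-based
# lookup table, not an exhaustive or official list; your repository's preferred
# formats take precedence.
COMMON_FORMATS = {
    "documentation": {".xml", ".pdf"},  # an extension cannot confirm PDF/A conformance
    "geospatial": {".shp", ".shx", ".dbf"},
    "image": {".tif", ".tiff", ".jpg", ".jpeg"},
    "numeric": {".dat", ".txt", ".csv", ".sd2", ".stc", ".xpt", ".por", ".sav", ".dta"},
    "text": {".txt"},
    "video": {".mp4"},
}


def categories_for(filename):
    """Return the sorted list of categories whose example formats match the extension."""
    extension = Path(filename).suffix.lower()
    return sorted(
        category
        for category, extensions in COMMON_FORMATS.items()
        if extension in extensions
    )


def files_to_review(filenames):
    """Return the files whose extension matches none of the example formats."""
    return [name for name in filenames if not categories_for(name)]
```

For example, files_to_review(["wave1.dta", "notes.docx"]) would single out notes.docx as a file to convert or to discuss with the repository.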

Folder and File Organization

At the beginning of a research project, it is best to determine where and how your files and folders will be organized on any project-related drives. You may create a basic README file or graphic to inform project team members about this organizational framework.

It is also essential to determine which team members should have access to specific files or folders and to set permission levels accordingly. You may restrict access to administrators, or lock folders and files behind passwords. This will keep unauthorized staff out of specific files or folders. If you need assistance with this, please contact the Division of Information Technology or your project's IT personnel.

 

Example of Folder and File Organization

[Figure: Example of Basic Folder and File Organization]
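
One way to set up the organizational framework described in this section is to script it, so that every project starts from the same folders and a README explaining them. The sketch below is a minimal example; the folder names in PROJECT_LAYOUT and the function name create_project_skeleton are illustrative assumptions, not a required GW layout.

```python
from pathlib import Path

# Hypothetical layout: folder names and descriptions are illustrative only.
PROJECT_LAYOUT = {
    "data/raw": "Original data files, never edited in place",
    "data/processed": "Cleaned or derived data files",
    "documentation": "Codebooks, questionnaires, consent forms, methodology",
    "code": "Scripts used to clean and analyze the data",
    "logs": "Quality assurance and activity tracking logs",
}


def create_project_skeleton(root, layout=PROJECT_LAYOUT):
    """Create each folder under root and write a README describing the layout."""
    root = Path(root)
    for relative_path in layout:
        # exist_ok=True makes the function safe to run more than once.
        (root / relative_path).mkdir(parents=True, exist_ok=True)

    lines = ["Folder organization for this project", ""]
    for relative_path, purpose in layout.items():
        lines.append(f"{relative_path}/ : {purpose}")

    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling create_project_skeleton("MyProject") would create the folders under MyProject and return the path to the generated README.txt. Access permissions still need to be set separately, since they depend on the drive or system your project uses.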

File Naming & Versioning

You can keep track of the research project data files and documents by creating naming conventions and versioning policies. These policies will make sure that each file is unique so no one has to wonder which “final_version” data file is actually the final version. Before the project starts, it would be best to clearly specify naming conventions and versioning policies in a README file or spreadsheet. Inform all research project team members of the policies and show them where the document is in case they need to refer to it in the future.

Naming conventions are simply a set way of labeling each document and data file. Labeling methods include:

  • underscores (ex: Project_Name)
  • camel-case (ex: ProjectName)
  • lowercase text with no spacing (ex: projectname)
  • or a combination of all three (ex: ProjectName_datafile1.dta)

Versioning policies specify how you document changes to data files or documentation. Versioning can be specified in many ways, such as:

  • version1, version2, version3
  • v1, v2, v3
  • version1_1, version1_2, version1_3

An example of a naming convention with versioning could be as simple as:

  • ProjectName_datafile1_version1.dta
  • ProjectName_datafile1_codebook_version1.pdf
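
A naming convention like the one above is easier to follow when file names are generated and checked by a script rather than typed by hand. The following sketch assumes the pattern ProjectName_description_versionN.ext taken from the two examples; the regular expression and the function names build_filename, parse_filename, and next_version are hypothetical and would need adjusting to your own policy.

```python
import re

# Assumed convention: <ProjectName>_<description>_version<N>.<ext>
# The project name contains no underscores; the description may.
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<description>\w+)"
    r"_version(?P<version>\d+)\.(?P<ext>\w+)$"
)


def build_filename(project, description, version, ext):
    """Assemble a file name that follows the assumed convention."""
    return f"{project}_{description}_version{version}.{ext}"


def parse_filename(filename):
    """Split a file name into its parts, or return None if it breaks the convention."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    return parts


def next_version(filename):
    """Return the name the next version of this file should have."""
    parts = parse_filename(filename)
    if parts is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    return build_filename(
        parts["project"], parts["description"], parts["version"] + 1, parts["ext"]
    )
```

With these helpers, a name such as final_version.dta is rejected by parse_filename, while next_version("ProjectName_datafile1_version1.dta") yields the version2 name without anyone having to guess it.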

Quality Assurance

It might be necessary to create a document noting who worked with a specific data file, when they worked with the file, and what changes they may have made. This can help with quality assurance and overall data collection, as everyone will be able to see the changes made to the data throughout the project’s lifetime.

An example of documenting file manipulations could be an extra spreadsheet listing:

Date | Name | File Manipulation
11/22/2014 | John Smith | Added missing value labels to variables A, B, and C
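
If the tracking spreadsheet is kept as a CSV file, entries can be appended by a small helper so that every row has the same three columns. This is a minimal sketch, not a prescribed tool: the function name log_file_change is hypothetical, and the MM/DD/YYYY date format is assumed from the example entry above.

```python
import csv
from datetime import date
from pathlib import Path

LOG_COLUMNS = ["Date", "Name", "File Manipulation"]


def log_file_change(log_path, name, manipulation, when=None):
    """Append one entry to the tracking log, writing the header row if the file is new."""
    log_path = Path(log_path)
    when = when or date.today()
    is_new = not log_path.exists()

    # newline="" lets the csv module control line endings itself.
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(LOG_COLUMNS)
        # Date format assumed from the example entry (MM/DD/YYYY).
        writer.writerow([when.strftime("%m/%d/%Y"), name, manipulation])
```

A call such as log_file_change("qa_log.csv", "John Smith", "Added missing value labels to variables A, B, and C") adds a row dated today; passing when=date(2014, 11, 22) records a specific date instead.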

Metadata Standards

Metadata is documentation or information that describes the structure, content and layout of a data file. It can be presented in the form of a codebook that includes the data type, column location and coded values of each variable. It may also contain a frequency listing of variables, the questionnaire, and a description of the study design, methodology, sampling, data collection and data quality. Robust metadata is essential for making raw data meaningful and reusable.
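
A first draft of such a codebook can be generated from the data file itself. The sketch below assumes the data is a comma-delimited file with a header row, and it reports each variable's column position, a rough data type, the number of missing values, and a frequency listing. The function names infer_type and build_codebook are hypothetical, and the type inference is deliberately simple; value labels and study-level descriptions still have to be written by the researcher.

```python
import csv
from collections import Counter


def infer_type(observed):
    """Classify non-missing string values as 'integer', 'decimal', 'text', or 'empty'."""
    def all_parse(cast):
        try:
            for value in observed:
                cast(value)
        except ValueError:
            return False
        return True

    if not observed:
        return "empty"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "text"


def build_codebook(csv_path, max_listed_values=10):
    """Summarize each column: position, inferred type, missing count, frequencies."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        values = {name: [] for name in columns}
        for row in reader:
            for name in columns:
                values[name].append(row[name])

    codebook = []
    for position, name in enumerate(columns, start=1):
        # Empty strings (and None for short rows) are treated as missing.
        observed = [v for v in values[name] if v not in ("", None)]
        codebook.append({
            "variable": name,
            "column": position,
            "type": infer_type(observed),
            "missing": len(values[name]) - len(observed),
            "frequencies": dict(Counter(observed).most_common(max_listed_values)),
        })
    return codebook
```

Each entry returned by build_codebook is a plain dictionary, so the result can be written out as a spreadsheet or pasted into the project's documentation folder and then annotated by hand.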

A metadata standard is a metadata structure organized in a consistent, machine-readable format, which makes information about the data searchable and retrievable. Repositories usually specify the metadata standards they require for archived data documentation.

GW Libraries • 2130 H Street NW • Washington DC 20052 • 202.994.6558 • AskUs@gwu.edu