Research Guides: Research Data Management: Recommended Practices

Documentation

When managing your data, it is useful to keep detailed documentation in appropriate folders. A few examples of documentation are:

Questionnaires or other survey instruments
Codebooks or data dictionaries
Computer code
Methodology
Consent forms
Quality assurance or activity tracking log
Additional supplementary materials

You do not need to archive all of these materials. As the researcher, you determine which documents are essential for reuse and secondary analysis. If you keep these documents clearly labeled in their folders, you will be able to assess and archive them quickly at the end of the project. Check with your archive or repository to find out which documentation it requires.

File Formats

For long-term preservation, access and reuse, consider formats that are open (non-proprietary), platform-independent, and widely used in the field of your research (see Directory of Metadata Standards). Whenever possible, consult your selected repository for its preferred formats for archiving data and documentation.

Examples of commonly used formats:

Documentation files: XML, PDF/A
Geospatial data files: ESRI Shapefile (.shp, .shx, .dbf)
Image data files: TIFF, JPEG
Numeric data files: ASCII (.dat, .txt), tab-delimited (.tsv, .tab), comma separated (.csv), SAS (.sd2, .stc, .xpt), Stata (.dta)
Textual data files: ASCII (.txt, .md)
Video data files: MPEG-4 (.mp4)

Folder and File Organization

At the beginning of a research project, it is best to determine where and how your files and folders will be organized on any project-related drives. You may create a basic README file or graphic to inform project team members about this organizational framework.

It is also essential to determine which team members should have access to specific files or folders and to set those permission levels. You may restrict access to users with admin status, or lock folders and files with password-only access. This will keep unauthorized staff out of specific files or folders. If you need assistance with this, please contact your project's or school's IT personnel. If you need help identifying whom to contact for support, contact a Data Services Librarian.

Example of Folder and File Organization
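One possible layout can be sketched in Python. This is a minimal sketch, not a prescribed structure: the folder names in PROJECT_LAYOUT and the function name create_project_folders are hypothetical and should be adapted to your project. The script creates the folders and writes a basic README file describing them.

```python
from pathlib import Path

# Hypothetical layout; these folder names are illustrative assumptions,
# not a structure required by any repository.
PROJECT_LAYOUT = {
    "data": ["raw", "processed"],
    "documentation": ["codebooks", "questionnaires", "consent_forms"],
    "code": [],
    "logs": [],
}


def create_project_folders(root, layout=PROJECT_LAYOUT):
    """Create the folder tree under root and write a README describing it."""
    root = Path(root)
    lines = ["Folder organization for this project:", ""]
    for top, subfolders in layout.items():
        (root / top).mkdir(parents=True, exist_ok=True)
        lines.append(f"{top}/")
        for sub in subfolders:
            (root / top / sub).mkdir(exist_ok=True)
            lines.append(f"    {sub}/")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling create_project_folders("MyProject") would create the folders under a MyProject directory. Access permissions are not handled here; they depend on your storage system and are best set with your IT personnel.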

File Naming & Versioning

You can keep track of the research project data files and documents by creating naming conventions and versioning policies. These policies will make sure that each file is unique so no one has to wonder which “final_version” data file is actually the final version. Before the project starts, it would be best to clearly specify naming conventions and versioning policies in a README file or spreadsheet. Inform all research project team members of the policies and show them where the document is in case they need to refer to it in the future.

Naming conventions are simply a set way for labeling each document and data file. Different methods for labeling can include:

underscores (ex: Project_Name)
camel-case (ex: ProjectName)
lowercase text with no spacing (ex: projectname)
or a combination of all three (ex: ProjectName_datafile1.dta)

Versioning policies specify how you document changes to data files or documentation. Versioning can be specified in many ways, such as:

version1, version2, version3
v1, v2, v3
version1_1, version1_2, version1_3

An example of a naming convention with versioning could be as simple as:

ProjectName_datafile1_version1.dta
ProjectName_datafile1_codebook_version1.pdf
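A convention like the one above can also be checked by a short script, so that file names are built and versioned the same way every time. The following is a minimal sketch in Python, assuming the pattern Project_item[_label]_versionN.ext taken from the two example names. The names FILENAME_PATTERN, build_filename and next_version are hypothetical helpers, not part of any tool.

```python
import re

# Matches names such as ProjectName_datafile1_version1.dta, with an
# optional label (e.g. "codebook") before the version number.
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<item>[A-Za-z0-9]+)"
    r"(?:_(?P<label>[A-Za-z0-9]+))?_version(?P<version>\d+)\.(?P<ext>[a-z0-9]+)$"
)


def build_filename(project, item, version, ext, label=None):
    """Assemble a file name that follows the convention."""
    parts = [project, item]
    if label:
        parts.append(label)
    parts.append(f"version{version}")
    return "_".join(parts) + "." + ext


def next_version(filename):
    """Return the name of the next version of a file, or raise ValueError."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    return build_filename(
        match["project"],
        match["item"],
        int(match["version"]) + 1,
        match["ext"],
        match["label"],
    )
```

With this sketch, a name such as final_version.dta is rejected because it has no version number, while ProjectName_datafile1_version1.dta is accepted and can be incremented to version2.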

Quality Assurance & Quality Control

Quality assurance looks different in different fields, but following good practices in your discipline will improve data management and save your team time.

Quality assurance may include access control procedures, activity logging, "training activities, instrument calibration and verification tests, double-blind data entry, and statistical and visualization approaches to error detection" (Ten Simple Rules for Creating a Good Data Management Plan).

For example, it might be necessary to manually track who worked with specific data (electronic or even physical samples), when they touched the data, and what they did with it. In this case, a tracking sheet like the one below, kept up to date, lets everyone see the changes made to the data throughout the project’s lifetime:

Date	Name	File Manipulation
11/22/2017	John Smith	Added missing value labels to variables A, B, and C
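If the tracking sheet is kept as a tab-delimited text file, entries can be appended with a small script instead of by hand. This is a minimal sketch using Python's standard csv module. The function name log_activity and the file name in the usage note are hypothetical, and the column names and date format simply mirror the sheet above.

```python
import csv
from datetime import date
from pathlib import Path

# Column headers copied from the example tracking sheet.
LOG_COLUMNS = ["Date", "Name", "File Manipulation"]


def log_activity(log_path, name, description, when=None):
    """Append one row to the tracking sheet, writing the header if the file is new."""
    log_path = Path(log_path)
    when = when or date.today()
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        if is_new:
            writer.writerow(LOG_COLUMNS)
        # MM/DD/YYYY matches the example sheet; another format could be used
        # as long as it is applied consistently.
        writer.writerow([when.strftime("%m/%d/%Y"), name, description])
```

For example, log_activity("tracking_log.tsv", "John Smith", "Added missing value labels to variables A, B, and C") would add a row dated today. Opening the file in append mode keeps earlier entries intact.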

Metadata Standards

Metadata is documentation or information that describes the structure, content and layout of a data file. It can be presented in the form of a codebook that includes the data types, column locations and coded values of each variable. It may also contain a frequency listing of variables, the questionnaire, and a description of the study design, methodology, sampling, data collection and data quality. Robust metadata is essential for making raw data meaningful and reusable.
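A first draft of such a codebook can be generated from the data file itself and then completed by hand. The following is a minimal sketch in Python, assuming the data is comma-separated with a header row. The function names build_codebook and infer_type, and the choice to list coded values only for variables with up to ten distinct values, are illustrative assumptions. It does not produce a repository-ready or standards-compliant codebook.

```python
import csv
import io


def infer_type(values):
    """Return 'numeric' if every non-empty value parses as a number, else 'text'."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "text"
    return "numeric"


def build_codebook(csv_text, max_codes=10):
    """Describe each variable: column location, data type, coded values, missing count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for position, name in enumerate(reader.fieldnames or [], start=1):
        values = [row[name] or "" for row in rows]
        distinct = sorted(set(v for v in values if v != ""))
        codebook.append({
            "variable": name,
            "column": position,
            "type": infer_type(values),
            # List the values only when there are few enough to look like codes.
            "values": distinct if len(distinct) <= max_codes else None,
            "missing": sum(1 for v in values if v == ""),
        })
    return codebook
```

The output still needs human input, such as variable labels and the meaning of each coded value, before it is useful to a secondary analyst.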

A metadata standard is a metadata structure organized in a consistent, machine-readable format, which makes information about the data searchable and retrievable. Repositories usually specify the metadata standards they use for archived data documentation.

Data Documentation Initiative (DDI)
DDI in XML format is a commonly used standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. The DDI metadata specification supports the entire research data life cycle.
Digital Curation Centre's Disciplinary Metadata
Search for the best metadata standard by discipline: Biology, Earth Science, Physical Science, Social Science, Humanities and General Research.
Metadata Directory from the Research Data Alliance
A directory of metadata standards, organized by discipline and managed by the Metadata Standards Directory Working Group of the Research Data Alliance.