
Research Data Management: Data Collection and Organization

Learn how to write a successful data management plan according to funding agency requirements.

Data Collection and Organization To Do List

 List and define variables

 Choose file naming conventions   

 Save raw data without processing

Establish Researcher Responsibilities

  • collecting data
  • entering data
  • validating data
  • analyzing data
  • ensuring data is securely stored   

Data Dictionary

Document the following information:

  • variable name
  • variable type
  • variable definition 
  • possible values for the variable
  • explanations of how variables might relate to other variables
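
One way to keep this information machine-readable is to store each variable as a row in a CSV file. The sketch below is only an illustration: the column names, the two example variables, and the missing-value code are assumptions, not part of any required standard.

```python
import csv
import io

# Column names are an assumed layout that mirrors the five items listed above.
FIELDS = [
    "variable_name",
    "variable_type",
    "definition",
    "possible_values",
    "related_variables",
]

# Hypothetical variables for illustration; not taken from a real study.
data_dictionary = [
    {
        "variable_name": "height_cm",
        "variable_type": "float",
        "definition": "Standing height measured without shoes, in centimeters",
        "possible_values": "50.0 to 250.0; -999 means missing",
        "related_variables": "Combined with weight_kg to derive body mass index",
    },
    {
        "variable_name": "weight_kg",
        "variable_type": "float",
        "definition": "Body weight in kilograms",
        "possible_values": "2.0 to 300.0; -999 means missing",
        "related_variables": "Combined with height_cm to derive body mass index",
    },
]


def write_data_dictionary(rows, handle):
    """Write one row per variable, with a header row, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here; a real project would open a .csv file instead.
buffer = io.StringIO()
write_data_dictionary(data_dictionary, buffer)
dictionary_csv = buffer.getvalue()
```

A spreadsheet with the same columns serves the same purpose; the point is that every variable gets one complete row.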

Data Collection Best Practices

Collecting variables in their rawest, most precise form provides the greatest flexibility when it comes time to analyze the data.

  • Determine whether there is a data standard for the topic of study

  • Clearly define variables to avoid confusion

  • Avoid categorizing variables -- collect them as precisely as possible

  • Avoid collecting compound variables -- collect data in its rawest form

  • Avoid ambiguous field names

  • Document all variables in a table or spreadsheet, known as a data dictionary
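
The advice above on avoiding categories and compound variables can be sketched in a few lines. In this hypothetical example, age is recorded in exact years and blood pressure is stored as two separate numbers rather than a single "118/76" string; the age groups are derived only at analysis time. All field names, records, and category boundaries are assumptions made for illustration.

```python
# Hypothetical raw records: precise values, one measurement per field.
raw_records = [
    {"participant_id": "P001", "age_years": 34, "systolic_mmhg": 118, "diastolic_mmhg": 76},
    {"participant_id": "P002", "age_years": 61, "systolic_mmhg": 135, "diastolic_mmhg": 88},
]


def age_group(age_years, boundaries=(18, 40, 65)):
    """Derive a category from the precise age.

    The boundaries are assumed values; because the raw age is kept,
    they can be changed later without collecting the data again.
    """
    labels = ["under_18", "18_to_39", "40_to_64", "65_plus"]
    for boundary, label in zip(boundaries, labels):
        if age_years < boundary:
            return label
    return labels[-1]


# Build analysis records as copies so the raw records stay untouched.
derived = [dict(record, age_group=age_group(record["age_years"])) for record in raw_records]
```

Had only the category been collected, a later analysis that needed different age bands would not be possible.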

Data Files Best Practices

  • Always include the same information in the same order for each file related to the same project

  • Avoid using special characters, e.g. $!@

  • Avoid using periods and spaces

  • File names should include letters, numbers, and underscores only

  • Reserve the file extension (typically three letters) for the code the system assigns to the file type, e.g. csv, tif

  • Use a directory hierarchy

  • Use YYYYMMDD format for date 

  • Use hhmmssTZD format for time (TZD = time zone designator, e.g. Z for UTC)

  • Use Last Name, First and Middle initial format for names

    • Example: Davis, J. P. 

  • For analyzed or processed data, include descriptions and references to software or code used

  • Save raw data with no processing in a separate file
  • Use README files
  • Create a data dictionary that explains terms used in files and folders
    • Example: units of measurements and how they are collected  
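
The naming rules above can be combined into a small helper. This is a minimal sketch under stated assumptions: the function name, the order of the name parts, and the example project are hypothetical, and times are converted to UTC so that the time zone designator is the letter Z and the name stays within letters, numbers, and underscores.

```python
import re
from datetime import datetime, timezone

# Letters, numbers, and underscores only, as recommended above.
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_]+")


def build_file_name(project, description, when, extension):
    """Return project_description_YYYYMMDD_hhmmssZ.extension.

    `when` must be a timezone-aware datetime; it is converted to UTC.
    The part order is an assumed convention: what matters is using
    the same order for every file in the project.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    utc = when.astimezone(timezone.utc)
    stem = "_".join(
        [project, description, utc.strftime("%Y%m%d"), utc.strftime("%H%M%S") + "Z"]
    )
    if not ALLOWED_STEM.fullmatch(stem):
        raise ValueError(f"file name contains disallowed characters: {stem!r}")
    # The only period in the name separates the stem from the extension.
    return f"{stem}.{extension}"


# Hypothetical project and timestamp for illustration.
example_name = build_file_name(
    "soilstudy",
    "ph_readings",
    datetime(2024, 3, 5, 14, 30, 0, tzinfo=timezone.utc),
    "csv",
)
# example_name == "soilstudy_ph_readings_20240305_143000Z.csv"
```

Because the date comes first in YYYYMMDD order, files named this way sort chronologically in an ordinary directory listing.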

README files: general project files describing the overall organization, responsible parties, instruments, etc. related to the research data

  • Save as TXT files or whatever long-term file format fits with the data
  • Include the following information as applicable:
    • Who is creating the record
      • If there are multiple people, explain each person’s role
    • Who owns the data
    • What was done (by each person)
    • When the work was done, clearly stating month, day, and year
    • Where it was done (use standard GIS formats if necessary)
    • Why it was done
    • What project the research is related to
    • How you did it (including the methodology)
    • What materials were used (e.g., reagents, surveys)
    • Links to locations of any related data
    • Coding conventions used, for example, characters used for missing data or null sets, categories, classifications, acronyms, and annotations
    • List of folders that relate to the project
    • What could be done next
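
A plain-text template can make it harder to forget one of the items above. The following sketch renders such a template; the short labels, the TODO placeholder wording, and the sample answers are assumptions chosen for illustration, not a required README layout.

```python
# Each pair is (short label, prompt); the prompts paraphrase the checklist above.
README_PROMPTS = [
    ("Created by", "Who is creating the record, and each person's role"),
    ("Data owner", "Who owns the data"),
    ("What", "What was done, by each person"),
    ("When", "When the work was done (month, day, and year)"),
    ("Where", "Where it was done"),
    ("Why", "Why it was done"),
    ("Project", "What project the research is related to"),
    ("How", "How it was done, including the methodology"),
    ("Materials", "What materials were used"),
    ("Related data", "Links to locations of any related data"),
    ("Coding conventions", "Codes for missing data, categories, acronyms, annotations"),
    ("Folders", "List of folders that relate to the project"),
    ("Next steps", "What could be done next"),
]


def render_readme(answers):
    """Return README text with one line per item.

    Items missing from `answers` are kept as TODO lines so gaps stay visible.
    """
    lines = []
    for label, prompt in README_PROMPTS:
        value = answers.get(label, f"TODO: {prompt}")
        lines.append(f"{label}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical answers; unanswered items remain as TODO lines.
readme_text = render_readme(
    {"Created by": "Davis, J. P. (data collection)", "When": "March 5, 2024"}
)
```

The resulting text can then be saved as a TXT file, for example with `pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8")`.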

Center for Open Science Registered Reports

Registered Reports is a publishing format that emphasizes the importance of the research question and the quality of methodology by conducting peer review prior to data collection. High-quality protocols are then provisionally accepted for publication if the authors follow through with the registered methodology.

By doing so, Registered Reports is designed to counter questionable research practices and related problems, such as low statistical power, selective reporting of results, and publication bias.
