
Comprehensive guide on how to better name and organize your data

Best practices in file naming

RDM file naming
Image: Laure Perrier, University of Toronto
  • Avoid using spaces, dots and special characters (& or ? or !)
  • Use hyphens (-), underscores (_), or capitalization (C) to separate elements in a file name
  • Include an abbreviation in the file name to identify
    • the instrument used
    • the phase (if research constitutes multiple phases)
    • the transformation phase (e.g. original, raw, compressed, digitized, recoded, restructured, cleaned)
    • the source of third-party data (data provider or principal investigator) (e.g. YinGiordanoAntarcticImagesSTCCensus2016)
    • the team (if working with multiple teams)
    • the language (if working with multiple languages)
  • Include  within file names as appropriate
     

Examples of good file names

  • MCIM_Proposal_0.9.doc
  • PressHouseUserManual-01.02.doc
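
The naming rules above can be sketched in code. This is a minimal illustration, not a prescribed tool: the function name `build_file_name` and the example elements (instrument, phase, transformation phase, language) are hypothetical. It strips spaces, dots and special characters from each element and joins the elements with underscores.

```python
import re


def build_file_name(elements, extension, separator="_"):
    """Join name elements into a file name free of spaces, dots and special characters."""
    cleaned = []
    for element in elements:
        # Keep only letters, digits and hyphens inside each element.
        part = re.sub(r"[^A-Za-z0-9-]", "", element)
        if part:
            cleaned.append(part)
    if not cleaned:
        raise ValueError("at least one non-empty element is required")
    return separator.join(cleaned) + "." + extension.lstrip(".")


# Hypothetical elements: instrument, phase, transformation phase, language.
print(build_file_name(["Survey", "Phase 2", "raw", "EN"], "csv"))
# prints: Survey_Phase2_raw_EN.csv
```

Any abbreviations used as elements still need to be spelled out somewhere, for example in a readme file.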
     

File versioning

Versioning is used to ensure you are not working on outdated versions of files (or documents or datasets or records, etc.), particularly in collaborative work.

  • Include version information both in the file name and in the document itself
  • Use sequential numbering (e.g. 0.1, 0.2, 0.3 …) for drafts until a final version is reached
  • Number the final version 1.0. If the final version is revised, number as 1.1, 1.2, 1.3 until a version 2.0 is completed. Continue in this fashion.
  • Use versioning software, such as ; or follow a  (PDF) and create a version control table that details the version number, the person responsible for the change, the purpose or nature of the change, and the date of the change
Version number | Author             | Purpose/Change          | Date
0.1            | S. Smith, Post-Doc | Initial draft           | 01/01/2016
0.2            | F. Hill, Post-Doc  | Changes to conclusion   | 01/07/2016
0.3            | G. Joe, PI         | Changes to introduction | 01/12/2016
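
The numbering scheme described above (0.1, 0.2, 0.3 for drafts, 1.0 for the final version, then 1.1, 1.2 for revisions) can be sketched as two small helpers. The function names are hypothetical. One assumption is made here that the guide does not state: the two parts of the number are treated as separate integers, so 0.9 is followed by 0.10 rather than 1.0.

```python
def parse_version(text):
    """Split a 'major.minor' string into two integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def next_revision(version):
    """Next draft or revision within the same major version."""
    major, minor = parse_version(version)
    return f"{major}.{minor + 1}"


def next_final(version):
    """Next completed version: the major number goes up, the minor resets to 0."""
    major, _ = parse_version(version)
    return f"{major + 1}.0"


print(next_revision("0.3"))  # prints: 0.4
print(next_final("0.3"))     # prints: 1.0
print(next_revision("1.0"))  # prints: 1.1
print(next_final("1.2"))     # prints: 2.0
print(next_revision("0.9"))  # prints: 0.10 (assumption: parts are integers)
```

Parsing the parts as integers rather than treating the version as a decimal number avoids floating-point rounding and keeps the sequence going past nine drafts.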

Best practices in folder structure

Folder organization
Image: Laure Perrier, University of Toronto
  • Restrict folder nesting to three or four levels deep
  • Consider limiting the number of folders within each folder to ten
  • Include a folder for "documentation" within the folder structure. This might include:
    • Project proposals/protocols
    • Consent and approval forms
    • Methodology documents
    • Data management plan
    • Code used for recodes, analysis, and outputs
    • Readme files with transformation information
    • Readme files with the full names or titles for any abbreviations used in file names
    • Codebooks or guides
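
The depth and breadth guidelines above can be checked automatically. The sketch below is one possible approach, with a hypothetical function name and with the limits (four levels, ten folders) taken from the list above as default parameters. It walks a project folder and returns a warning for each folder that exceeds either limit.

```python
import os


def check_folder_structure(root, max_depth=4, max_subfolders=10):
    """Return warnings for folders that exceed the depth or breadth guidelines."""
    warnings = []
    root = os.path.abspath(root)
    for current, subfolders, _files in os.walk(root):
        relative = os.path.relpath(current, root)
        # The root itself is depth 0; its direct subfolders are depth 1.
        depth = 0 if relative == "." else len(relative.split(os.sep))
        if depth > max_depth:
            warnings.append(
                f"{relative}: nested {depth} levels deep (limit {max_depth})"
            )
        if len(subfolders) > max_subfolders:
            warnings.append(
                f"{relative}: contains {len(subfolders)} folders (limit {max_subfolders})"
            )
    return warnings
```

Calling `check_folder_structure("my_project")` (a hypothetical path) returns an empty list when the structure stays within both limits.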

Preferred long-term file format qualities

File formats can help ensure long-term access and share-ability of your data. Once data analysis is complete using the software and formats most suited to the planned analysis, consider converting files into stable, open file formats for long-term storage.

  • Non-proprietary
  • Lossless
  • Standard for the field or in common usage by the research community
    • Include data labels (metadata)
  • Adheres to an open, documented standard
    • Interoperable among diverse platforms and applications
    • Fully published and available royalty-free
    • Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology
    • Developed and maintained by an open standards organization with a well-defined inclusive process for evolution of the standard.

Examples of preferred file formats

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV, POR
  • Geospatial: SHP (with SHX and DBF), DBF, GeoTIFF, NetCDF, e00
  • Moving Images: MKV, MPEG, AVI
  • Sounds: FLAC, WAVE, AIFF, MP3
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still Images: TIFF, PDF/A, PNG, GIF
  • Text: PDF/A, ASCII, TEI XML
  • Web Archive: WARC
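
As a rough aid when preparing files for long-term storage, file extensions can be compared against the formats listed above. The mapping from format names to extensions below is an assumption (the guide lists formats, not extensions), and the function name is hypothetical. PDF/A and ASCII are left out on purpose, because an extension such as .pdf or .txt does not show whether a file actually conforms to those formats.

```python
from pathlib import Path

# Assumed extensions for the formats listed above (not an official mapping).
PREFERRED_EXTENSIONS = {
    "tar", "gz", "zip",                 # containers
    "xml", "csv", "por",                # databases
    "shp", "shx", "dbf", "nc", "e00",   # geospatial
    "tif", "tiff", "png", "gif",        # still images, GeoTIFF
    "mkv", "mpeg", "mpg", "avi",        # moving images
    "flac", "wav", "aiff", "aif", "mp3",  # sounds
    "dta", "sav",                       # statistics
    "warc",                             # web archive
}


def uses_preferred_format(file_name):
    """Return True if the file's last extension is in the assumed preferred set."""
    extension = Path(file_name).suffix.lower().lstrip(".")
    return extension in PREFERRED_EXTENSIONS


print(uses_preferred_format("Interview_01.FLAC"))  # prints: True
print(uses_preferred_format("Notes.docx"))         # prints: False
```

An extension check only flags candidates for conversion; it says nothing about whether the file content is valid for that format.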
