How you choose to name your data files has a large impact on your ability to find and understand those files later on. File names should remain consistent, logical, and descriptive in order to maximize accessibility and findability. File names may contain information such as project acronym, study title, location, investigator, year(s) of study, data type, version number, and file type.
When choosing a file name, check for any database management limitations on file name length and use of special characters. Also, in general, lower-case names are less software and platform dependent. Avoid using spaces and special characters in file names, directory paths, and field names. Instead, consider using underscores ( _ ) or dashes ( - ) to separate meaningful parts of file names. Avoid $ % ^ & # | : and similar characters.
Example of Descriptive File Name:
Sevilleta_LTER_NM_2001_NPP.csv
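The naming advice above can be sketched in a few lines of Python. This is only an illustration, not part of the DataONE guidance: the function names `build_file_name` and `is_safe_file_name` are hypothetical, and the 255-character default is an assumed limit that you should replace with whatever your own system or repository allows.

```python
import re

# Anything other than letters, digits, underscore, or dash is treated as unsafe.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]+")


def build_file_name(parts, extension):
    """Join descriptive parts into one file name separated by underscores."""
    cleaned = []
    for part in parts:
        # Replace each run of spaces or special characters with one underscore.
        token = _UNSAFE.sub("_", str(part)).strip("_")
        if token:
            cleaned.append(token)
    if not cleaned:
        raise ValueError("at least one non-empty name part is required")
    return "_".join(cleaned) + "." + extension.lstrip(".")


def is_safe_file_name(name, max_length=255):
    """Check length and characters; max_length is an assumed limit, not a rule."""
    if len(name) > max_length:
        return False
    return re.fullmatch(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+", name) is not None


# Project, network, location, year, and data type, as in the example above.
print(build_file_name(["Sevilleta", "LTER", "NM", 2001, "NPP"], "csv"))
```

Running the sketch prints `Sevilleta_LTER_NM_2001_NPP.csv`. A name such as `my data$.csv` would fail the `is_safe_file_name` check because it contains a space and a special character.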
When organizing these data files together, name the top-level folder of the directory so that it includes the project title, a unique identifier, and the date (year).
Source: Adapted from DataONE
The file formats you choose now will affect your own ability to open the data in the future as well as others' ability to access the data.
Non-proprietary (open) file formats maximize access to the data and are more sustainable for the future. Consider migrating your data into an open format in addition to keeping a copy in the original software format. If it is necessary to use a proprietary file format, make sure to include the name and version of the software used to generate the file, as well as the company that made the software, in a readme.txt.
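If you do keep a proprietary file, the readme.txt can be created by hand or with a short script. The following is a minimal sketch: the function name `write_format_readme`, the field labels, and the software details in the demo (`ExampleStats`, `Example Corp`) are all hypothetical placeholders rather than a required layout.

```python
import tempfile
from pathlib import Path


def write_format_readme(folder, data_file, software, version, vendor):
    """Write a readme.txt recording the software behind a proprietary file."""
    lines = [
        f"File: {data_file}",
        f"Software: {software}",
        f"Software version: {version}",
        f"Vendor: {vendor}",
    ]
    readme_path = Path(folder) / "readme.txt"
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme_path


if __name__ == "__main__":
    # Demo in a temporary folder so no real project files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_format_readme(
            tmp, "survey.sav", "ExampleStats", "1.0", "Example Corp"
        )
        print(path.read_text(encoding="utf-8"))
```

Keeping the readme in the same folder as the data file means the software details travel with the file if the folder is copied or archived.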
File formats should also be:
Preferred file formats include:
Data repositories treat file formats differently so make sure to research what your chosen archive accepts. Note that not all repositories are able to migrate data files to new file formats for preservation.
Versioning refers to saving new copies of your files when you make changes so that you can go back and retrieve specific versions of your files later. This is especially useful in collaborations, so that researchers on different teams know when changes have been made. Versioning also allows you to return to an earlier version of the data rather than retracing your missteps.
One of the simplest ways to version is to manually save a new version each time you make significant changes. This method is best used when only one person is working on the files and few versions are needed.
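Manual versioning can also be scripted so that version numbers are never reused by accident. The sketch below assumes a `_v01`, `_v02`, ... suffix convention, which is one common choice and not a requirement; the function name `save_new_version` and the demo file name are hypothetical.

```python
import re
import shutil
import tempfile
from pathlib import Path


def save_new_version(path):
    """Copy path to the next free name of the form <stem>_vNN<suffix>."""
    source = Path(path)
    pattern = re.compile(
        re.escape(source.stem) + r"_v(\d+)" + re.escape(source.suffix)
    )
    # Collect the version numbers already present next to the working file.
    existing = [
        int(match.group(1))
        for candidate in source.parent.iterdir()
        if (match := pattern.fullmatch(candidate.name))
    ]
    next_version = max(existing, default=0) + 1
    target = source.with_name(
        f"{source.stem}_v{next_version:02d}{source.suffix}"
    )
    shutil.copy2(source, target)  # copy2 also tries to preserve timestamps
    return target


if __name__ == "__main__":
    # Demo in a temporary folder: edit the working file, then snapshot it.
    with tempfile.TemporaryDirectory() as tmp:
        working = Path(tmp) / "npp_2001.csv"
        working.write_text("plot,npp\n1,2.5\n")
        print(save_new_version(working).name)
        working.write_text("plot,npp\n1,2.7\n")
        print(save_new_version(working).name)
```

The working file keeps its plain name while each snapshot gets a numbered copy, so the most recent data is always in one predictable place.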
This file sharing software records version changes.
Drive's word processing, spreadsheet, and presentation software automatically create versions as you edit.
Git is a free and open source distributed version control system with more features than the above options. Unlike the previous software options, Git is designed specifically for version tracking.
This site is maintained by Pollak Library.
© California State University, Fullerton. All Rights Reserved.