    Preparing your datasets for deposit

    As part of the deposit process we will agree with you the format and structure of your data and a handover date. Ensuring that your data are correct and well-formatted will help to speed up the process.

    Format

    Data provided to the EIDC should normally be in a non-proprietary format (e.g. .csv files rather than an Excel workbook).

    We maintain a list of acceptable formats. The list is not exhaustive, however, and we will consider other formats on a case-by-case basis.

    Filenames

    • Try to keep filenames short
    • Do not use spaces or special characters (e.g. $*@%)
    • Whenever possible, filenames should be meaningful and reflect the content
    • If you have multiple related files, use a consistent and relevant naming convention

    Examples

    1486Xiuytr.csv
    This doesn't tell us anything about the data

    Site location data from the UK Butterfly Monitoring Scheme 2011.csv
    This is very long and contains spaces

    ukbmsLocationData2011.csv
    This is descriptive, short and contains no spaces or special characters
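
    The filename guidance above can be partly checked automatically. The following is a minimal sketch, not an EIDC tool: the function name `filename_problems`, the 40-character limit and the set of allowed characters are all assumptions chosen for illustration. Whether a name is meaningful still needs a human judgement, so the sketch cannot flag a name like 1486Xiuytr.csv.

```python
import re
from pathlib import PurePath

# Assumption: 40 characters is an arbitrary illustrative limit, not an EIDC rule.
MAX_STEM_LENGTH = 40
# Assumption: letters, digits, underscores and hyphens are treated as safe.
SAFE_STEM = re.compile(r"[A-Za-z0-9_-]+")


def filename_problems(filename: str) -> list[str]:
    """Return a list of problems found in a filename (empty if none)."""
    stem = PurePath(filename).stem  # the name without its final extension
    problems = []
    if " " in stem:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them when looking for
    # other special characters.
    if not SAFE_STEM.fullmatch(stem.replace(" ", "")):
        problems.append("contains special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"longer than {MAX_STEM_LENGTH} characters")
    return problems


for name in [
    "1486Xiuytr.csv",
    "Site location data from the UK Butterfly Monitoring Scheme 2011.csv",
    "ukbmsLocationData2011.csv",
]:
    print(name, "->", filename_problems(name) or "no problems found")
```

    Run on the three examples above, only the second name is reported (for its spaces and its length). The first passes the automated checks even though it is not descriptive.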

    Content

    • Variable names should be unique, short and (preferably) meaningful
    • If your data are tabular, the variable names should be in the first row (and only the first row)
    • Avoid spaces and special characters (e.g. $*@/, ) in variable names
      • Best practice is to use only alphanumeric characters, underscores (_) and hyphens (-)
    • Remove any variables that are not important for re-using the data (e.g. those created for admin or internal purposes)
    • Ensure that any missing values are handled consistently throughout the data
    • Using codes and abbreviations in the data is often very useful. However, if you DO use them:
      • ensure they are unique (within the dataset) and used consistently
      • ensure that there are no unexplained codes in the data - they should all be described in accompanying metadata
      • ensure that any explanations are applicable to the data (e.g. the metadata should not state that "t = trace" if t doesn't occur in the data)
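
    Several of the content checks above lend themselves to a short script. The sketch below is illustrative only: the function names, the sample data, the column name `rain_code` and the codebook are hypothetical, and the allowed character set is an assumption based on the best-practice bullet above. It checks that variable names are unique and use only safe characters, and that the codes used in a column match those explained in the metadata, in both directions.

```python
import csv
import io
import re

# Assumption: letters, digits, underscores and hyphens are acceptable.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def header_problems(header: list[str]) -> list[str]:
    """Report variable names that are duplicated or use unsafe characters."""
    problems = []
    seen = set()
    for name in header:
        if not SAFE_NAME.fullmatch(name):
            problems.append(f"unsafe variable name: {name!r}")
        if name in seen:
            problems.append(f"duplicate variable name: {name!r}")
        seen.add(name)
    return problems


def code_problems(rows: list[dict], column: str, codebook: dict) -> list[str]:
    """Compare the codes used in one column with those in the metadata."""
    used = {row[column] for row in rows}
    explained = set(codebook)
    problems = [
        f"code {code!r} is not explained in the metadata"
        for code in sorted(used - explained)
    ]
    problems += [
        f"code {code!r} is explained but never used"
        for code in sorted(explained - used)
    ]
    return problems


# Hypothetical sample data and codebook, for illustration only.
sample = "site_id,rain_code\nA1,t\nA2,n\nA3,x\n"
codebook = {"t": "trace", "n": "none", "m": "missing"}

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
print(header_problems(reader.fieldnames))
print(code_problems(rows, "rain_code", codebook))
```

    For this sample the header passes, while the code check reports that x is unexplained and that m is described but never used.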

     

    [Figure: csv_format_bad.png (example of a badly formatted CSV file)]

    [Figure: csv_format_good.png (example of a well-formatted CSV file)]

    Anonymity and data security

    • Ensure that data are anonymised where needed and cannot be linked to any identifiable person
    • Consider anonymising site location data where this is necessary for the safety of the site, equipment or future research
    • Where data are derived from existing data, check if permission needs to be obtained from the data owner

    Quality

    • When converting data for deposit, ensure that all data and metadata are correct after conversion
    • Confirm that the level of detail in the data is consistent with the stated access and licensing agreements
    • Complete all internal consistency checks BEFORE offering your data for deposit
    • Resolve any data issues and ensure data are complete BEFORE deposit, to minimise the risk of further deposit(s) being necessary
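
    One way to check data after conversion is to read the original and converted tables back in and compare them cell by cell. The sketch below assumes both versions can be read as CSV text; the function names and the sample values are hypothetical. Values are compared as strings on purpose, so that changes such as a lost trailing zero are reported.

```python
import csv
import io


def read_rows(text: str) -> list[list[str]]:
    """Parse CSV text into a list of rows, keeping every cell as a string."""
    return list(csv.reader(io.StringIO(text)))


def table_differences(original: list, converted: list) -> list[str]:
    """List the differences between two tables, row by row."""
    differences = []
    if len(original) != len(converted):
        differences.append(
            f"row count differs: {len(original)} vs {len(converted)}"
        )
    # Row numbers start at 1, so row 1 is the header row.
    for number, (before, after) in enumerate(zip(original, converted), start=1):
        if before != after:
            differences.append(f"row {number} differs: {before} vs {after}")
    return differences


# Hypothetical example: the conversion dropped a trailing zero.
original = read_rows("site,value\nA1,0.10\nA2,0.25\n")
converted = read_rows("site,value\nA1,0.1\nA2,0.25\n")
differences = table_differences(original, converted)
print(differences)
```

    An empty list means the two tables match exactly. Any reported row should be inspected by hand, since some differences (such as a changed date format) may or may not be acceptable.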

     

    If you have any queries or are unsure about the suitability of your dataset(s) for deposit, we'll be happy to discuss this with you. Please contact us.