    Suitable formats for data and supporting documentation

    Data

    The preferred format for deposit, long-term storage and accessibility via view and download services will depend on the original format of the data.

    For longevity, the preferred format for depositing tabular data is comma-separated values (.csv). Many commonly used file types can easily be saved in this format, although there are some considerations when doing so. More detailed information on converting data files to .csv is available below.
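    As a minimal sketch of one such consideration, the Python below uses the standard `csv` module to write a header row followed by data rows. A proper CSV writer quotes any field that contains a comma, a double quote or a line break, which a naive `",".join(...)` would not, so free-text columns are not split into extra columns. The function names `write_csv` and `read_csv` and the example data are hypothetical, and UTF-8 encoding is an assumption: this page does not specify an encoding.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write one header row followed by the data rows to a UTF-8 CSV file.

    csv.writer quotes any field containing a comma, a double quote or a
    line break, so free-text values stay in a single column.
    """
    # newline="" hands line-ending control to the csv module, as the
    # Python documentation recommends when writing CSV files.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV file back as a list of rows (every value is a string)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # Hypothetical example data; the comment field contains a comma.
    header = ["site_id", "date", "value", "comment"]
    rows = [["SITE-01", "2021-06-01", 12.5, "Dry, windy"]]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "example.csv"
        write_csv(out, header, rows)
        print(out.read_text(encoding="utf-8"))
        print(read_csv(out))
```

    CSV stores no type information: reading the file back returns every value, including numbers and dates, as a string.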

    Guidance look-up table for file formats:

    Note that this table is maintained and updated periodically. Formats not listed can be considered by the Data Centre on a case-by-case basis.
    Original format | Preferred EIDC format | Notes
    Comma separated values | .csv | File must open in commonly available software that reads CSV, e.g. MS Excel
    Text | .txt | File must open in commonly available software, e.g. Notepad
    MS Excel | .csv | File must open in commonly available software that reads CSV, e.g. MS Excel
    MS Access | .csv | File must open in commonly available software that reads CSV, e.g. MS Excel
    MS Word | .rtf | File must open in commonly available software, e.g. WordPad
    Oracle | .csv | Key datasets output as CSV; must open in commonly available software that reads CSV, e.g. MS Excel
    SQL Server | .csv | Key datasets output as CSV; must open in commonly available software that reads CSV, e.g. MS Excel
    NetCDF | No change | File must open in two netCDF-capable applications (e.g. FME, QGIS) without additional transformation
    Shapefile | No change | File must open in a common GIS, e.g. ArcGIS
    Personal geodatabase (.mdb) | No change | Must open in a common GIS, e.g. ArcGIS
    File geodatabase (.gdb folder) | No change | Preferred for spatial data. A zipped .gdb folder is required so that none of the contents are lost. File geodatabases are more efficient for storage. Must open in a common GIS, e.g. ArcGIS
    ArcInfo coverage | .e00 | ArcInfo coverages are an old, folder-based format, so ArcInfo export (.e00) is preferred. Must open in a common GIS, e.g. ArcGIS
    ArcInfo Export | No change | Must open in a common GIS, e.g. ArcGIS
    ArcGIS SDE database | .gdb | A zipped .gdb folder is required so that none of the contents are lost. File geodatabases are more efficient for storage. Must open in a common GIS, e.g. ArcGIS
    Raster data | No change | TIFF, JPG, PNG, GIF, etc. Must be accompanied by appropriate georeferencing information such as .jpw or .tfw files
    ArcInfo Grid | .ascii or .asc | ArcInfo Grid export file. Must open in a common GIS, e.g. ArcGIS
    SAS | .csv | File must open in commonly available software that reads CSV, e.g. MS Excel
    Minitab | .csv | File must open in commonly available software that reads CSV, e.g. MS Excel
    NASA Ames | .nc or .csv | File must open in commonly available software without additional transformation (e.g. MS Excel for .csv, or two applications such as FME and QGIS for .nc)
    MATLAB binary file | .csv | File must open in commonly available software that reads CSV, e.g. MS Excel
    STL | No change | File must open in commonly available mesh rendering software, e.g. MeshLab
    FASTA | No change | File must open in commonly available software, e.g. Notepad
    Portable Document Format | .rtf or .txt | Wherever practically possible, files should be supplied as .rtf or .txt and open in commonly available software, e.g. WordPad (.rtf) or Notepad (.txt)
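    Two of the spatial requirements in the table above are simple packaging steps: a .gdb folder must be supplied zipped, and raster images need georeferencing files such as .tfw or .jpw. The Python sketch below shows one way to do both with the standard library only. The function names are hypothetical, and the world-file layout follows the common six-line convention for an unrotated, north-up raster, which is an assumption about your data rather than a stated EIDC requirement.

```python
import shutil
import tempfile
from pathlib import Path


def zip_gdb_folder(gdb_dir):
    """Zip a file geodatabase folder, e.g. 'survey.gdb' -> 'survey.gdb.zip'.

    The .gdb folder itself becomes the top-level entry in the archive, so
    all of its internal files travel together and unpack into one folder.
    """
    gdb_dir = Path(gdb_dir).resolve()
    # make_archive appends ".zip" to base_name; root_dir and base_dir make
    # the stored paths start at "survey.gdb/..." rather than at the root.
    return Path(shutil.make_archive(
        base_name=str(gdb_dir),
        format="zip",
        root_dir=gdb_dir.parent,
        base_dir=gdb_dir.name,
    ))


def write_world_file(path, pixel_width, pixel_height, ul_centre_x, ul_centre_y):
    """Write a six-line world file for a north-up raster (no rotation).

    Line order follows the usual world-file convention:
      1. pixel size in x (map units per pixel)
      2. rotation term (0 for north-up)
      3. rotation term (0 for north-up)
      4. pixel size in y, negative because image rows run north to south
      5. x coordinate of the centre of the upper-left pixel
      6. y coordinate of the centre of the upper-left pixel
    """
    values = [pixel_width, 0.0, 0.0, -abs(pixel_height), ul_centre_x, ul_centre_y]
    Path(path).write_text("".join(f"{float(v)}\n" for v in values), encoding="ascii")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical geodatabase folder holding one placeholder file.
        gdb = Path(tmp) / "survey.gdb"
        gdb.mkdir()
        (gdb / "a00000001.gdbtable").write_bytes(b"placeholder")
        print(zip_gdb_folder(gdb).name)
        # Hypothetical 25 m raster; upper-left pixel centre at (400012.5, 300987.5).
        tfw = Path(tmp) / "elevation.tfw"
        write_world_file(tfw, 25, 25, 400012.5, 300987.5)
        print(tfw.read_text(encoding="ascii"))
```

    The world file should share the raster's base name (for example `elevation.tif` alongside `elevation.tfw`) so that GIS software can pair the two.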

    Why CSV?

    Comma-separated values (.csv) files are the preferred format for depositing tabular datasets because they have proven robust and future-proof: the data can be read and viewed with a wide variety of common software tools and converted to many other common formats. The format has been in use since the 1970s and is probably the most widely used format for exchanging tabular data.
    Saving your datasets in CSV format may take a few extra minutes, but it makes re-use easier and helps prevent the data becoming obsolete because of the format in which they are stored.
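    For data held in a relational database (MS Access, Oracle or SQL Server in the table above), saving to CSV usually means exporting each table to its own file. The sketch below uses Python's built-in `sqlite3` module as a stand-in, because connecting to Access, Oracle or SQL Server needs a separate driver that is outside the standard library. The function `export_tables_to_csv` and the example schema are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_tables_to_csv(conn, out_dir):
    """Write every user table in a SQLite database to <table>.csv in out_dir.

    Each file starts with a header row of column names, so it stays
    self-describing once separated from the database. Returns the paths
    written, in table-name order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # sqlite_master lists the schema; names starting "sqlite_" are internal.
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    written = []
    for table in tables:
        # Double any embedded quotes so the table name is a safe identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cursor.description]
        path = out_dir / f"{table}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            # csv.writer writes SQL NULL (None) as an empty field.
            writer.writerows(cursor)
        written.append(path)
    return written


if __name__ == "__main__":
    # Hypothetical two-table schema linked by site_id.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE site (site_id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE sample (sample_id INTEGER PRIMARY KEY,
                             site_id TEXT REFERENCES site(site_id),
                             value REAL);
        INSERT INTO site VALUES ('S1', 'Upper meadow');
        INSERT INTO sample VALUES (1, 'S1', 4.2), (2, 'S1', NULL);
    """)
    with tempfile.TemporaryDirectory() as tmp:
        for path in export_tables_to_csv(conn, tmp):
            print(path.name, path.read_text(encoding="utf-8").splitlines())
```

    Exporting one CSV per table keeps the relationships recoverable through the shared key columns (here `site_id`) rather than flattening everything into a single file.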

    Supporting documentation

    It is EIDC policy that supporting documentation is made available as a separate, linked online resource, accessible via the resource's catalogue record.

    One of the main reasons for separating supporting documentation from data is that the EIDC is committed to a rolling programme of reviewing and improving metadata, to make resources easier to find and re-use. The data, by contrast, must remain unchanged while in the custody of the Data Centre. Providing access to supporting documentation separately from the data also lets users decide whether a data resource meets their requirements before ordering or downloading it, which matters because some resources contain large volumes of data.

    With some data formats, the metadata is inextricably embedded with the data. In such situations the Data Centre may extract as much metadata as possible to a separate supporting document, which can then be added to, or otherwise enhanced.

    Guidance look-up table for supporting documentation file formats

    Original format | Examples | Preferred EIDC format | Notes
    Simple Tabular | Microsoft (xls, xlsx); csv | csv | csv provides high maintainability, longevity and ease of access.
    Plain Text | txt | txt | txt provides high maintainability, longevity and ease of access.
    Related tables | Microsoft (mdb) | Multiple csv files | If appropriate, relational metadata can be denormalised into a single table; however, a normalised set of tables is easier to maintain if errors or possible improvements are identified later.
    Rich Text | Microsoft (doc, docx); Portable Document Format (pdf); OpenDocument Text Document (odt) | rtf | Although a proprietary format, rtf is preferred for its simplicity, ease of maintenance, widespread acceptance and choice of available editing tools. Metadata in pdf format cannot easily be maintained; it may be accepted reluctantly if there are no other options.
    Slide Show | Microsoft (ppt, pps, ppsx); OpenDocument Presentation Format (odp) | png / pdf | Slide presentations are treated as essentially "non-maintainable" metadata. It is therefore important to capture meta-metadata such as the author and date of publication, so that future users can read the information with knowledge of its original context. Slideshow authors sometimes embed such information in the presentation itself, but not always.
    Hierarchic | Extensible Markup Language (xml) | xml | xml provides high maintainability, longevity and ease of access.
    Special | Network Common Data Form (NetCDF) (nc) | ncml | ncml provides an XML-encoded version of the embedded metadata.
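    For netCDF files, NcML is an XML encoding of the metadata that a .nc file embeds. The sketch below builds a minimal NcML document with Python's `xml.etree.ElementTree`, assuming the global attributes, dimensions and variable attributes have already been read from the file into plain dictionaries (reading a .nc file needs a netCDF library, which is outside this sketch). The function name, argument layout and example values are hypothetical; all attribute values are written as strings, whereas full NcML can also declare numeric attribute types.

```python
import xml.etree.ElementTree as ET

# Namespace URI for NcML version 2.2.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
# Register it as the default namespace so elements serialise without a prefix.
ET.register_namespace("", NCML_NS)


def _q(tag):
    """Return the namespace-qualified ElementTree name for an NcML tag."""
    return f"{{{NCML_NS}}}{tag}"


def build_ncml(location, global_attrs, dimensions, variables):
    """Return an NcML document (as a string) describing a netCDF file.

    global_attrs: {attribute name: value}
    dimensions:   {dimension name: length}
    variables:    {variable name: {"type": ..., "shape": ..., "attrs": {...}}}
    All attribute values are written as strings.
    """
    root = ET.Element(_q("netcdf"), location=location)
    for name, length in dimensions.items():
        ET.SubElement(root, _q("dimension"), name=name, length=str(length))
    for name, value in global_attrs.items():
        ET.SubElement(root, _q("attribute"), name=name, value=str(value))
    for name, spec in variables.items():
        var = ET.SubElement(
            root, _q("variable"), name=name, shape=spec["shape"], type=spec["type"]
        )
        for attr_name, attr_value in spec.get("attrs", {}).items():
            ET.SubElement(var, _q("attribute"), name=attr_name, value=str(attr_value))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical metadata, as if already extracted from a .nc file.
    print(build_ncml(
        "river_temperature.nc",
        global_attrs={"title": "Hypothetical river temperature series"},
        dimensions={"time": 365},
        variables={"water_temp": {"type": "float", "shape": "time",
                                  "attrs": {"units": "degC"}}},
    ))
```

    Because the result is plain XML text, it can be stored and improved as a separate supporting document while the .nc file itself stays unchanged.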

    If your supporting documentation is in a format other than those listed above, please contact us for advice.