On this page:
- Benefits of sharing data
- When not to share data
- Options for sharing data
- Where to deposit data
- Preparing data for deposit
- Citing data
Benefits of sharing data
Research data is a valuable resource that can often be put to significant use beyond its original purpose. Sharing data brings benefits to you as a researcher, and to the scholarly community and society.
The UK Data Service guide Why share data? lists the following benefits:
- increases the impact and visibility of research
- provides credit to the researcher as a research output in its own right
- leads to new collaborations between data users and data creators
- encourages scientific enquiry and debate
- promotes innovation and potential new data uses
- maximises transparency and accountability
- enables scrutiny of research findings
- encourages the improvement and validation of research methods
- provides great resources for education and training
- reduces the cost of duplicating data collection
Sharing data also helps you meet requirements from:
- research funders that increasingly view research data as a public good, which should be openly available with as few restrictions as possible. Many funders have adopted data sharing policies, and the DCC (Digital Curation Centre) offers an overview of funders’ data policies.
- publishers that may require data underpinning publications to be made available as a condition of publishing.
- requests arising under the Freedom of Information Act (FoI) and Environmental Information Regulations (EIR). For advice on FoI and EIR requests, contact the Records Management Office.
When not to share data
It may not be possible to share data due to:
- legal requirements – to comply with the Data Protection Act. Guidance is available from the University’s Data Protection website and data protection online course, and further advice is available from the Records Management Office.
- ethical concerns – if your data includes sensitive or confidential data where no consent for sharing has been given. For guidance, see the University’s research ethics pages.
- licence restrictions – if you are using data owned by third parties and don’t have the rights to share.
- commercial value – if your data has financial value or potential for patents. For guidance, contact UMIP (The University of Manchester Intellectual Property).
Nonetheless, the UK Data Service guide Legal and ethical issues notes that “Much research data – even sensitive data - can be shared ethically and legally if researchers employ strategies of informed consent, anonymisation and controlling access to data.”
Options for sharing data
- Deposit data with a data repository or specialist data centre (see 'where to deposit data' in this guidance)
- Submit data alongside an associated publication (e.g. journal article) or publish a data paper
- Make data available via a project or institutional website
- Share data with research collaborators, for example via:
  - ZendTo, which enables transfer of files of up to 20GB in and out of the University
  - shared areas for secure storage and sharing of files among University staff and postgraduate researchers
  - research collaboration services, which enable collaboration between researchers both internal and external to the University
Where to deposit data
To make your data accessible, you can deposit your data with a specialist data repository or data centre that provides a managed environment for preserving and sharing your data.
Some research funders expect data to be deposited in specific data centres. For example, ESRC and NERC support dedicated data centres that take in data from funded projects. Similarly, consider whether any collaborative agreements with your research partners include requirements for data deposit.
Discipline-specific data repositories are also available for a variety of subject disciplines e.g. the Archaeology Data Service, and GenBank for genetic sequences.
You can search for a suitable repository for your data using re3data.org, which is a registry of research data repositories. For example, re3data.org can help identify whether a repository:
- provides open, restricted or closed access to its data
- uses persistent identifiers such as a DOI (Digital Object Identifier) to make data persistent, unique and citable
- supports a repository standard or has been certified
If no discipline-specific repository is available for your research community, then consider general-purpose data repositories, which make data publication as easy as possible for authors by publishing data sets with minimal validation:
- The University's first recommendation is the Mendeley Data repository. When you deposit data in Mendeley Data make sure you select "University of Manchester", or your particular school/faculty in the institutions field.
- Other general-purpose data repositories are also available, e.g. Zenodo and Figshare.
When choosing where to deposit your data, bear in mind relevant funder policy requirements (the DCC provides an overview and summaries of funders’ data policies). You may also wish to consider factors such as:
- Are the repository’s terms and conditions acceptable?
- Does the repository enable you to license data in ways that satisfy your needs?
- Where will your data be stored? For example, should it be in a jurisdiction where legal safeguards provide at least the same level of protection as is available in the UK?
- Will your data be given a persistent and unique identifier, such as a DOI?
- Will the repository provide a landing page for each dataset, with metadata that helps others find it, understand what it is, and cite it?
- Where relevant, does the repository support legal requirements (e.g. for data protection) and ethical requirements (e.g. for controlling access to your data)?
- Is the repository established and well-funded so that you can rely on it to preserve your data into the future?
- Does the repository have a good reputation in your research community?
For more detailed advice on where to deposit data, see: Whyte, A. (2015). 'Where to keep research data: DCC checklist for evaluating data repositories' v.1 Edinburgh: Digital Curation Centre.
Preparing data for deposit
Thinking about data deposit as part of your data management planning from the beginning of a project can:
- significantly reduce project costs
- ensure that data is ready for deposit at an appropriate time
- improve the discoverability (and hence the potential impact) of your data
To illustrate, data centres may ask you to satisfy minimum quality standards to ensure that data can be understood and re-used by other researchers.
Before depositing data you typically need to select, prepare, organise, and document your data, check any legal or ethical issues, and decide what access you will give to your data. Whilst the University of Edinburgh Checklist for deposit was developed for a specific repository, the principles can help you identify relevant issues and prepare for data deposit.
The DCC guide How to license research data can help you decide how to apply a licence to your research data, and which licence would be most suitable.
Data access statements
A data access statement may be required by research funders, and is a requirement of the RCUK Policy on Open Access which states:
“[3.3] (ii) As part of supporting the drive for openness and transparency in research, and to ensure that researchers think about data access issues, the policy requires all research papers, if applicable, to include a statement on how underlying research materials, such as data, samples or models, can be accessed.”
Data access statements are used in publications to describe where supporting data can be found and under what conditions they can be accessed. The objective of the statement is to aid data discovery. Accordingly, data access statements should include a persistent identifier, such as a Digital Object Identifier (DOI), which links directly to the data or to supporting documentation that describes the data in detail, how it may be accessed and any constraints that may apply.
- If data are openly available, provide the name of the data repository together with any persistent identifiers (e.g. Digital Object Identifier (DOI)) for the data.
- If there are justifiable legal or ethical reasons why the data cannot be made available, then these should be noted.
- If the data are not openly available, then direct users to a permanent record that describes any access constraints or conditions that must be satisfied for access to be granted.
- If you did not collect the data yourself but instead used existing data from another source, then this source should be credited.
A simple direction to ‘contact the author’ would not normally be considered sufficient.
The exact format and placement of a data statement may be influenced by a publication’s house style.
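The guidance above can be turned into a rough pre-submission check. The following Python sketch is illustrative only: the function name `check_statement`, the warning texts and the DOI pattern are assumptions made for this example, not part of any funder's or publisher's requirements. The pattern is a pragmatic heuristic for spotting DOI-like strings, and a warning does not always mean a statement is wrong (for example, a review article with no new data may legitimately have no DOI).

```python
import re
from typing import List

# Assumption: a DOI looks like "10.<registrant>/<suffix>". This is a heuristic
# for spotting DOI-like strings, not an official validation rule.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")


def check_statement(statement: str) -> List[str]:
    """Return a list of warnings for a draft data access statement."""
    warnings = []
    has_doi = bool(DOI_PATTERN.search(statement))

    # Template text such as "[insert DOI]" should be filled in before publication.
    if "[insert" in statement:
        warnings.append("unfilled placeholder")

    # Statements should normally include a persistent identifier.
    if not has_doi:
        warnings.append("no DOI found")

    # A bare direction to contact the author is not normally sufficient.
    if not has_doi and re.search(
        r"contact the (corresponding )?author", statement, re.IGNORECASE
    ):
        warnings.append("'contact the author' alone is not normally sufficient")

    return warnings


if __name__ == "__main__":
    draft = "Supporting data are available on request; please contact the author."
    for warning in check_statement(draft):
        print("Warning:", warning)
```

A statement that names a repository and gives a DOI produces no warnings, while a draft that still contains template text or relies only on "contact the author" is flagged for review.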
DOIs are provided for datasets by many data centres and repositories. For guidance on where to deposit data, see 'Where to deposit data' in this guidance.
Licensing your data can help clarify the terms of its use. The DCC guide on How to License Research Data offers guidance.
Example data access statements
Depending on the nature of your data you may wish to combine information from different examples.
Openly available data
- All research data supporting this publication are directly available within this publication.
- Additional research data supporting this publication are available as supplementary information accompanying this publication at [insert DOI].
- Additional research data supporting this publication are available from the [insert repository name] repository at [insert DOI].
- Multiple datasets openly available at various data repositories were used to support these research findings. All the data used are referred to in the ‘References’ section of this publication.
Secondary analysis of existing data
- This study is a re-analysis of existing data that are publicly available from the [insert name of repository] repository at [insert DOI]. Further documentation about data processing is available from the [insert repository name] repository at [insert DOI].
- This study brought together existing research data obtained upon request and subject to licence restrictions from a number of different sources. Full details of how these data were obtained are available in the documentation available at the [insert repository name] repository at [insert DOI].
Ethical constraints
- Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available from the [insert repository name] repository at [insert DOI].
- Anonymised interview transcripts from participants who consented to data sharing, plus other supporting information, are available from the UK Data Service, subject to registration at [insert DOI].
- Supporting data are available to bona fide researchers, subject to registration, from the UK Data Service at [insert DOI].
- Due to the [insert term where appropriate: commercially, politically, ethically] sensitive nature of the research, no research subjects consented to their data being retained or shared. Additional details relating to other aspects of the data are available from the [insert repository name] repository at [insert DOI].
- Processed, qualitative data from this study is available from the [insert repository name] repository at [insert DOI]. Additional raw data related to this publication cannot be openly released; the raw data contains transcripts of interviews, but none of the interviewees consented to data sharing.
Commercial constraints
- Research data supporting this publication will be available from the [insert repository name] repository at [insert DOI] after a 6-month embargo from the date of publication to allow for commercialisation of research findings.
- Due to confidentiality agreements with research collaborators, research data supporting this publication can only be made available to bona fide researchers subject to a non-disclosure agreement. Details of the data and how to request access are available from the [insert name of repository] repository at [insert DOI].
Cost-effective sharing of data
- This publication is accompanied by a representative sample of research data from the experiment which is available from the [insert repository name] repository at [insert DOI]. Detailed procedures explaining how this representative sample was selected, and how this experiment can be repeated, are provided in the Materials and Methods section of this publication. Additional raw data underlying this publication contain [insert relevant number] additional sample images. These additional images are not shared online due to the size of the images ([insert relevant number]GB/image); public sharing of these images is not cost-effective, and the experiment can be easily reproduced.
Digital data in proprietary formats
- Research data supporting this publication is available from the [insert repository name] repository at [insert DOI]. Some of this data is only available in a proprietary file format [insert name of the file format], which can only be opened with [insert name of software] software.
Non-digital data
- Non-digital research data supporting this publication are stored at a safe location at [insert name of School or institution] and can be made available on request, subject to the requestor travelling to [insert location of the samples]. Further information about the data and how to request access are available from the [insert repository name] repository at [insert DOI].
No new data created
- This is a review article, and therefore all data underlying this study is cited in the references.
Citing data
Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to other scholarly resources. Effective data citation:
- provides the information necessary to identify and locate the data;
- enables the reuse and verification of data;
- allows the impact of data to be tracked;
- creates a scholarly structure that recognises and can reward data producers.
The exact format and placement of a data citation may be influenced by a publication’s house style.
To illustrate, you could provide a citation to your data using the following core elements.
- Creator (Publication Year): Title. Publisher. Identifier
In this example, the ‘identifier’ is a digital object identifier (DOI):
- Denhard, Michael (2009): dphase_mpeps: MicroPEPS LAF‐Ensemble run by DWD for the MAP D‐PHASE project. World Data Center for Climate. http://dx.doi.org/10.1594/WDCC/dphase_mpeps
Where appropriate, it may also be desirable to include information about two optional elements, Version and Resource Type. If so, the recommended form is as follows:
- Creator (Publication Year): Title. Version. Publisher. Resource Type. Identifier
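The two citation forms above can be assembled programmatically, which may help when generating citations for many datasets. The Python sketch below is a minimal illustration: the class name `DataCitation` and its field names are hypothetical, chosen for this example, and the output follows only the element order shown above. A publication's house style may require something different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Core and optional elements of a data citation (illustrative names)."""

    creator: str
    year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None        # optional element
    resource_type: Optional[str] = None  # optional element

    def format(self) -> str:
        # Core form:     Creator (Publication Year): Title. Publisher. Identifier
        # Extended form: Creator (Publication Year): Title. Version. Publisher.
        #                Resource Type. Identifier
        parts = [self.title]
        if self.version:
            parts.append(self.version)
        parts.append(self.publisher)
        if self.resource_type:
            parts.append(self.resource_type)
        body = ". ".join(parts)
        return f"{self.creator} ({self.year}): {body}. {self.identifier}"


if __name__ == "__main__":
    citation = DataCitation(
        creator="Denhard, Michael",
        year=2009,
        title="dphase_mpeps: MicroPEPS LAF-Ensemble run by DWD for the MAP D-PHASE project",
        publisher="World Data Center for Climate",
        identifier="http://dx.doi.org/10.1594/WDCC/dphase_mpeps",
    )
    print(citation.format())
```

Run as a script, this prints the example citation given earlier in this section. Setting `version` or `resource_type` places those elements in the positions shown in the extended form.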
For more information, see the DCC guide How to cite datasets and link to publications.