Managing open data with an industrial research partner

With funders trying to ensure maximum value for money, they are now demanding that research data be made widely available.  This can be good for researchers and for research in general – but what does it mean for working with industry, and are there principles and practices that make it easier?

Dan Crane, Research Support Librarian in the Engineering Department, led a seminar last week to discuss the context of open data, describe some principles for working effectively and signpost many resources.  There was also an interesting discussion about the implications for working with industrial partners.

Open Access is being driven by funders (including the Research Councils and the Wellcome Trust) as a condition of their funding and, importantly, will be a condition for publications to be submitted for the next REF.  The funders seek greater use, leverage and impact of the work they have supported and to minimise duplication of effort.  Open access has a number of practical benefits for the researcher, including potentially greater exposure and citation.  Managed well, open data will potentially also allow a rich seam of research associated with the data itself.

To get an overview of the area there is a very useful site (www.openaccess.cam.ac.uk) that guides you through the issues and helps you comply with funders’ requirements, so that your publication is eligible for the REF. It includes advice and guidance about your copyright options (https://www.openaccess.cam.ac.uk/what-do-i-need-to-do#section-your-copyright-options), support for liaison with publishers and the ways you can manage publications and open access.  Another valuable source of advice is the library itself; contact them at cued-library@cam.ac.uk.

It’s not only about the final publications but also about the management of the data from your research.  The first port of call should be the Research Data Management site at www.data.cam.ac.uk.  There is a summary of, and links to, the funders’ policies (www.data.cam.ac.uk/funders), of which EPSRC’s is the most stringent.  There is a specific page that advises how to comply with EPSRC’s requirements (www.data.cam.ac.uk/funders/epsrc-funded-researchers). Note that the EPSRC is also promising to follow up and check for compliance with its policies.

In essence, the requirement is that all publications should have a statement describing how to access the underlying data or a statement explaining why access to the data has been restricted.  The materials must be available for ten years and this is done most conveniently via long-term repositories which maintain Digital Object Identifiers.  Cambridge University’s repository (www.repository.cam.ac.uk)  is one such.

There are several accepted reasons why data may be restricted: for example, the data is personal; it is sensitive, for instance because it might affect national security; there are intellectual property or commercial confidentiality considerations; or it is not cost-effective to store all the data, perhaps because of its volume.  Where volume is the issue, a subset can usefully be stored.  The metadata, which describes the data and any restrictions, should be in the publications or with the data.  Note that the EPSRC regards the researcher themselves as the person best placed to decide on the data to be made available or to describe the reasons why it cannot be released.
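As a rough illustration of the statement requirement and the accepted restrictions described above, here is a minimal Python sketch that produces either a "where to find the data" sentence or a "why access is restricted" sentence. The class name, the restriction labels and the wording of the statements are all assumptions made for this example; they are not taken from EPSRC or University guidance, so check the funder's pages for the wording they expect.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels paraphrasing the accepted reasons listed above.
ACCEPTED_RESTRICTIONS = {
    "personal": "the data are personal",
    "sensitive": "the data are sensitive",
    "commercial": "of intellectual property or commercial confidentiality",
    "volume": "it is not cost-effective to store all the data",
}


@dataclass
class DatasetRecord:
    title: str
    location: Optional[str] = None      # e.g. a DOI or repository URL
    restriction: Optional[str] = None   # a key of ACCEPTED_RESTRICTIONS

    def access_statement(self) -> str:
        """Return the sentence to include in the publication."""
        if self.restriction is None:
            # Open data: the statement must say where the data can be found.
            if not self.location:
                raise ValueError("open data needs a location")
            return (
                "Data supporting this publication are available at "
                f"{self.location}."
            )
        if self.restriction not in ACCEPTED_RESTRICTIONS:
            raise ValueError(f"unknown restriction: {self.restriction}")
        reason = ACCEPTED_RESTRICTIONS[self.restriction]
        return (
            "Access to the data supporting this publication is restricted "
            f"because {reason}."
        )
```

Calling `DatasetRecord("Test rig results", restriction="commercial").access_statement()` gives the restricted-access sentence, while a record with a `location` and no restriction gives the open-access one. A record with neither raises an error, which mirrors the rule that every publication needs one statement or the other.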

So all this leads to the practice of disciplined Research Data Management as:

  • preparation for sharing and preserving data as a research outcome, and
  • underpinning working as efficiently as possible during the research process

The key to this is to address some important preliminary questions before you start, including:

  1. What type of data will you generate in your project?
  2. What will be the volume (size) of your data? Will you require financial support to share your data?
  3. File formats
  4. What are your proposed data management strategies?
  5. How will you describe your data?
  6. Secondary use
  7. Methods for data sharing
  8. Timeframes

(Dan provided a briefing document covering these questions – if you’re interested please contact the CUED library or me on cb683@cam.ac.uk )

Then you can go on to create a Management Plan covering topics such as:

  • Context
  • Data Collection
  • Documentation and Metadata
  • Ethics and Legal Compliance
  • Storage and Backup
  • Selection and Preservation
  • Data Sharing
  • Are any restrictions on data sharing required?
  • Responsibilities and Resources
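One light-weight way to keep track of a draft plan is to treat the topics above as a checklist. The sketch below is only an illustration: the function name and the dictionary layout are assumptions for this example, and it is not related to the format used by any particular planning tool.

```python
from typing import Dict, List

# The topics exactly as listed above.
DMP_TOPICS: List[str] = [
    "Context",
    "Data Collection",
    "Documentation and Metadata",
    "Ethics and Legal Compliance",
    "Storage and Backup",
    "Selection and Preservation",
    "Data Sharing",
    "Are any restrictions on data sharing required?",
    "Responsibilities and Resources",
]


def unanswered_topics(plan: Dict[str, str]) -> List[str]:
    """Return the topics that have no text yet, in checklist order."""
    return [t for t in DMP_TOPICS if not plan.get(t, "").strip()]


draft = {
    "Context": "EPSRC-funded project with one industrial partner.",
    "Storage and Backup": "   ",  # whitespace only, so still unanswered
}

for topic in unanswered_topics(draft):
    print("TODO:", topic)
```

Running this against a draft shows at a glance which topics still need to be discussed with the team or the industrial partner.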

 

 

Writing the plan is supported by the Digital Curation Centre, which offers an online template: https://dmponline.dcc.ac.uk/.  It’s best to do this as you put together your plans for research and as part of writing your grant application.  In this way all your thinking is integrated and you will remember to reserve the budget and facilities for data management.

So how does all this change for working with industrial partners?  Actually, it’s all about communication and starting early.

Dan’s advice covered:

  • Communication with an industrial partner to explain funders’ requirements for data sharing, and allowable exemptions, and plans for publishing.
  • Communication with the EPSRC to explain the kind of data used, and the extent to which it might be commercially sensitive.
  • Communication with the industrial partner to understand and negotiate acceptable transfer, storage and sharing.

 

Then write a Data Management Plan, and make sure all of the team knows its content and why it’s written that way.

You may also need to put in place security measures for data transfer and storage during the work, for example thinking about:

  • Receiving data from partner via secure transfer
  • Storing data on a secure group cluster, but thinking about whether it might also be processed on other computers.
  • Before using, check how secure other computers actually are (and beware of offers for ‘free compute time’!)
  • Is it necessary to use an NDA to collaborate and compute? How much time will that add to negotiating the relationship and starting up the work?
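As a small practical complement to the points above, it can help to confirm that a file received from a partner is byte-for-byte the file they sent. The sketch below assumes the partner supplies a SHA-256 checksum through a separate channel; the function names are hypothetical. Note that a checksum only checks integrity. It does not make the transfer confidential, so it sits alongside a secure transfer method rather than replacing one.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path: Path, expected_sha256: str) -> bool:
    """Compare the file's hash with the checksum supplied by the partner."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify_transfer(Path("delivery.zip"), checksum_from_partner)` returns `True` only if the file arrived unchanged; both the file name and the variable are placeholders.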

 

And when you’re ready to publish make sure you communicate again with your industrial partner. Are they happy for you to publish? And what about the data you’ve generated and wish to share?  Make sure you keep them up to date throughout the project, telling them immediately if things change or you encounter a problem.

During the seminar there was wide-ranging discussion of the issues associated with working with data, especially with industrial partners.

  • The funders are well aware of the commercial concerns of industrial partners and their requirements for open access and data do allow researchers to manage commercial confidentiality. But it may be necessary to talk this through in detail with the industrial partner because, unless the details are understood and managed, this could be a justifiable source of concern for the industrial collaborator.  Hence the importance of the research data management plan.
  • It was pointed out that a research data management plan might also be a considerable contribution to knowledge management, both in research groups and for some industrial collaborators. So here’s a way to derive an extra benefit from working with Cambridge researchers.
  • Data provenance has been a topic of research in the Computer Laboratory: how do you track the various transformations and manipulations of the data as it is prepared and analysed? They have developed a tool to track this, OPUS.  See http://www.data.cam.ac.uk/support/resources-and-support-cambridge/research-data-management-support#Opus and https://www.cl.cam.ac.uk/research/dtg/fresco/opus/ for more information.
  • This led to a discussion of electronic lab notebooks – see cam.ac.uk/research/news/electronic-laboratory-notebooks-for-academic-research for work in the Chemistry Department. The CUED Library plans to look at these in due course.
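To make the data provenance point above a little more concrete, here is a minimal sketch of the underlying idea: record a fingerprint of the data before and after each processing step, so that the chain of transformations can be checked later. This is not how OPUS works and it does not use any OPUS interface; the function names and the log layout are assumptions for this example only.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, List


def fingerprint(data: bytes) -> str:
    """Short-hand for the SHA-256 hex digest of some data."""
    return hashlib.sha256(data).hexdigest()


def record_step(log: List[Dict[str, str]], step: str,
                before: bytes, after: bytes) -> Dict[str, str]:
    """Append one transformation to the provenance log."""
    entry = {
        "step": step,
        "input_sha256": fingerprint(before),
        "output_sha256": fingerprint(after),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry


def chain_is_unbroken(log: List[Dict[str, str]]) -> bool:
    """True if each step's input is exactly the previous step's output."""
    return all(
        earlier["output_sha256"] == later["input_sha256"]
        for earlier, later in zip(log, log[1:])
    )


log: List[Dict[str, str]] = []
raw = b"a,b\n1,2\n"
cleaned = raw.replace(b",", b";")
record_step(log, "replace separators", raw, cleaned)
record_step(log, "upper-case headers", cleaned, cleaned.upper())
print(chain_is_unbroken(log))
```

A log like this can be stored with the dataset as part of its metadata, so that a later reader can see which steps produced the published data.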

Note the potential for added elapsed time as you negotiate your data management plans with your industrial partner.

For those interested in following all this up, you can sign up for Research Data Management workshops (12/4, 11/5, 14/9) at http://www.data.cam.ac.uk/events.

If you’d like a copy of Dan’s presentation then please contact me on cb683@cam.ac.uk.
