Closing the gap – connection points between DMPs and repositories

13 June, 2019

On Monday, we ran a workshop in Hamburg at the Open Repositories conference, where we discussed potential connection points between Data Management Plans (DMPs) and repository platforms. We had a great variety of attendees from different backgrounds (librarians, repository managers, developers, researchers) and different countries (South Africa, China, Japan, Australia, Canada, Germany, the UK and Norway, amongst others). As a result, our discussions were very rich and informed by a wide range of perspectives and viewpoints.


At the beginning we asked what people would like to do with DMPs. Responses focused on ensuring that DMPs are not standalone documents: they should support capacity planning, validate information when data is deposited into repositories, and be published as outputs in their own right.


“Automate the process by integrating with our CRIS.”

“Use content to prepopulate deposit and determine training needs”

“Validate data sharing and reuse permissions”

“Assist PIs to create and make available in IR”

“Publish them as documentation for a project”

“Include DMP in repository”

“Use and re-use of metadata on deposited items - that would include PI, formats of the data“

“Link to researchers, organisations, outputs”

“Keep it simple for the researcher”

“Develop DMP not for funder but facilitating research activity”

“Enhance reuse and interoperability”

“Reuse and connect”

Living document

DMPs can be seen as a conversation starter between the institution and the researchers. They help institutions talk to researchers about various aspects of managing their research data. For instance, at Manchester University they use the DMPonline API to track the creation of new DMPs and start a conversation with the researchers about their project, data planning and management. See more on this in Clare’s blog post.
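As a minimal sketch of this kind of tracking, the snippet below polls an institutional DMPonline instance for recently created plans so the research data management team can follow up. The base URL, filter parameter and response fields are assumptions for illustration; check your own instance's API documentation for the actual endpoints and authentication scheme.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL - replace with your institution's DMPonline instance.
BASE_URL = "https://dmponline.example.org/api/v0"


def plans_url(created_after):
    """Build the query URL for plans created after an ISO date.

    The `created_after` filter parameter is an assumption, not a
    documented DMPonline parameter.
    """
    query = urllib.parse.urlencode({"created_after": created_after})
    return f"{BASE_URL}/plans?{query}"


def fetch_recent_plans(token, created_after):
    """Fetch recently created plans so the RDM team can start a conversation."""
    req = urllib.request.Request(
        plans_url(created_after),
        headers={"Authorization": f"Token token={token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Field names on each plan are illustrative.
    for plan in fetch_recent_plans("your-api-token", "2019-06-01"):
        print(plan.get("title"), "-", plan.get("owner_email"))
```

A script like this could run weekly and feed a simple list of new plans to whoever handles researcher outreach.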


Another great suggestion at Open Repositories was including a repository profile in DMPs to note what each service would accept and help researchers select the right one. Suggestions could be made based on the data type, format, access restrictions and data volumes. Institutions might want to make recommendations and inform researchers of any limitations based on the funder, financial charges or agreements they might already have in place.


We are currently working on machine-actionable DMPs, and during our recent user groups we have started discussing conditional and trigger questions, which can be used to tailor what is asked and which specific pieces of information are pulled out. Having triggers in data management plans would help with repository planning. For instance, at the beginning of a project a researcher might expect to create 20 GB of data. However, they might revisit the DMP as plans change and increase this by an order of magnitude. Where this exceeds what a service can support, the change would be flagged for renegotiation.
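A trigger of this kind boils down to comparing a revised answer against agreed parameters. The sketch below is purely illustrative (the 10x growth rule and the service limit are made-up thresholds, not DMPonline behaviour):

```python
def check_volume_trigger(planned_gb, revised_gb, service_limit_gb):
    """Flag a DMP revision that may need renegotiation with the repository.

    Returns a list of human-readable flags; an empty list means no action.
    Both thresholds here are illustrative assumptions.
    """
    flags = []
    if revised_gb > service_limit_gb:
        flags.append(
            f"Revised volume {revised_gb} GB exceeds the "
            f"{service_limit_gb} GB the service can support"
        )
    if planned_gb and revised_gb >= 10 * planned_gb:
        flags.append("Volume grew by an order of magnitude; notify the RDM team")
    return flags


# The scenario from the text: 20 GB planned, later revised to 200 GB,
# checked against a hypothetical 100 GB service limit.
flags = check_volume_trigger(planned_gb=20, revised_gb=200, service_limit_gb=100)
for flag in flags:
    print(flag)
```

In a real integration the flags would feed a notification to the repository or RDM team rather than being printed.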


Another example: at the beginning of a project the researchers might decide that the data will be openly available; however, when depositing the data, another researcher may decide it has to be closed. It would therefore be good to have the DMP as a reference, to be able to validate what has changed and to know why. It would also be beneficial to have a 'gatekeeper view': a dashboard, potentially customisable, where a responsible person could see all the activity between the DMPs and repositories.


Two-way communication

Another point raised was ensuring a two-way stream of communication: in other words, understanding not only how we can connect DMPs to repositories, but also how to connect repositories and other systems back to DMPs. There are already a lot of systems in place; understanding what information each holds and how they overlap could smooth the communication between them.


It is important to pull information from DMPs to inform repository activities, but equally, DOIs and grant information could be fed back into DMPs. Recently, the DMPonline team added an integration with the OpenAIRE API to allow researchers writing H2020 DMPs to find and add their project details. See the video on how this works.
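For a sense of what such a lookup involves, here is a minimal sketch that builds a query against the public OpenAIRE search API to find a project by grant number. The parameter names follow OpenAIRE's published documentation, but verify them against the current API reference before relying on them; the grant number used is a placeholder.

```python
import urllib.parse

# Public OpenAIRE search endpoint for projects.
OPENAIRE_BASE = "https://api.openaire.eu/search/projects"


def project_query_url(grant_id, funder="EC"):
    """Build a lookup URL for a grant number (e.g. an H2020 grant),
    requesting JSON output."""
    params = urllib.parse.urlencode(
        {"grantID": grant_id, "funder": funder, "format": "json"}
    )
    return f"{OPENAIRE_BASE}?{params}"


# "123456" is a placeholder grant number, not a real project.
print(project_query_url("123456"))
```

A DMP tool can then parse the returned project record (title, acronym, funding stream) to prepopulate the plan's project details.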


Challenges

Different repositories require different information


It will be important for DMPs to have more granular fields so that information can be mapped to repositories and vice versa. We discussed the RDA Common Standard for DMPs and the potential to extend our data model to record specific datasets. This would help provide metadata, technical information, access permissions and licensing information at a more granular level, which could ease data deposit workflows. Workshop attendees mentioned specific pieces of information that could help. Although different repositories will require different information, the most commonly cited were: dataset metadata, ORCID / researcher details, data ownership, the size and format of the data, access rights, ethics, and any other information relating to the licence and privacy of the data.
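To make the dataset-level mapping concrete, the sketch below pulls deposit-relevant fields out of a machine-actionable DMP serialised along the lines of the RDA DMP Common Standard JSON structure. The field names follow the published schema, but treat them as assumptions and check the current version of the standard before building on them.

```python
# A toy machine-actionable DMP with one dataset and one distribution,
# loosely following the RDA DMP Common Standard JSON layout.
madmp = {
    "dmp": {
        "title": "Example project DMP",
        "dataset": [
            {
                "title": "Survey responses",
                "personal_data": "yes",
                "distribution": [
                    {
                        "format": ["text/csv"],
                        "byte_size": 20_000_000_000,  # ~20 GB
                        "data_access": "open",
                        "license": [
                            {"license_ref": "https://creativecommons.org/licenses/by/4.0/"}
                        ],
                    }
                ],
            }
        ],
    }
}


def deposit_summary(madmp):
    """Map each dataset distribution to the fields a repository
    deposit form typically needs."""
    rows = []
    for ds in madmp["dmp"].get("dataset", []):
        for dist in ds.get("distribution", []):
            rows.append({
                "dataset": ds.get("title"),
                "personal_data": ds.get("personal_data"),
                "formats": dist.get("format"),
                "size_gb": dist.get("byte_size", 0) / 1e9,
                "access": dist.get("data_access"),
                "licenses": [l.get("license_ref") for l in dist.get("license", [])],
            })
    return rows


for row in deposit_summary(madmp):
    print(row)
```

A repository could use a summary like this to prepopulate its deposit form, or to reject a deposit up front when the access level or volume does not match what the service supports.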


International collaboration projects


It is also important to bear in mind that data ownership, access and reuse will vary across contexts. Even within the same country, different universities might have different rules about who owns the data or how it is licensed. Contextual information therefore becomes very important for understanding any limitations. In large-scale international collaborations, different national laws will also apply to the data in terms of licensing, copyright and other aspects.


DOIs


Attendees felt it would be beneficial for DMPs to have DOIs to support discovery and to link up research outputs. If so, it will be important to consider how to manage the changing nature of DMPs: should there be a DOI for every change, or only for certain published versions? Another question raised was whether people should manage these changes or whether this should be machine-actionable.


So what’s next?


Since the main use cases to emerge from this workshop were capacity planning and validating the DMP against the activities which result, we will continue to prioritise conditional questions (#1772) and setting triggers. Conditional questions will allow researchers to skip sets of questions, for instance on security or ethics, when they are not applicable to their project. Triggers would flag responses which exceed agreed parameters, or notify universities of significant changes to ethics or data volumes, for example, so they could get in touch with the researcher.
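The conditional-question idea can be sketched as a simple rule table: an earlier answer determines which template sections are shown. Everything below (section names, rule format) is illustrative, not DMPonline's actual data model:

```python
# Each section is shown unconditionally, or only when a named earlier
# question has a particular answer.
TEMPLATE = {
    "ethics": {"show_if": ("involves_human_subjects", "yes")},
    "security": {"show_if": ("has_sensitive_data", "yes")},
    "storage": {"show_if": None},  # always asked
}


def visible_sections(answers):
    """Return the sections a researcher should see, given earlier answers."""
    shown = []
    for section, rule in TEMPLATE.items():
        cond = rule["show_if"]
        if cond is None or answers.get(cond[0]) == cond[1]:
            shown.append(section)
    return shown


# A project with no human subjects and no sensitive data skips the
# ethics and security sections entirely.
print(visible_sections({"involves_human_subjects": "no", "has_sensitive_data": "no"}))
```

The same rule table could double as the trigger list: when an answer changes in a way that flips a rule, the tool knows which parties to notify.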


We are also planning to work on plan versioning in the months to come; this is a bigger piece of feature development. We will keep you informed about the work as it progresses, and if you have any suggestions, please do get in touch with us.


If there is anything you wish to discuss regarding this, feel free to join our monthly GoToMeetings. We run these as drop-in sessions where you can chat with your peers, ask us any questions you have, or learn about new features. If there are points you wish to raise before the meeting, drop us an email. Our next drop-in meeting will be on 17th June 2019 at 12:00 BST.

• 17th June 2019 at 12:00 BST: DMPonline users June drop-in meeting (GoToMeeting).

As always, I am very keen to hear from you about how you use the tool and how we can make it work better, so please feel free to email me at magdalena.drafiova@ed.ac.uk, dmponline@dcc.ac.uk, on Twitter @mdDCC1, @DMPonline, Facebook or LinkedIn. To keep up with DMPonline news, you can subscribe to the RSS feed to receive our blogs and tweets, and watch GitHub for code updates. You can also discuss any of our new features on the user group mailing list.