Research Data Management with DataLad
Talk @ MRI Together 2023
Psychoinformatics lab,
Institute of Neuroscience and Medicine, Brain & Behavior (INM-7)
Research Center Jülich, Germany
What is DataLad?
- A free and open source tool
- for decentralized (research) data management
- with a command line interface, Python API, and a GUI (pre-alpha)
- allowing exhaustive tracking of the evolution of digital objects
- and computational provenance tracking
- to enhance modularity, portability and reproducibility.
Let's explore this...
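As a rough illustration of the version-control and provenance-tracking points above, the sketch below assembles three DataLad command-line calls (`datalad create`, `datalad save`, `datalad run`) and prints them instead of executing them. The dataset name, script path, and input/output file names are hypothetical placeholders, not taken from the talk.

```python
import shlex


def tracked_analysis_commands(dataset="my-analysis", script="code/analyze.py"):
    """Return DataLad CLI calls as argv lists.

    The first call is meant to run in the parent directory, the others
    from inside the new dataset. All names and paths are hypothetical.
    """
    return [
        # Create a new dataset (a Git/git-annex repository under the hood).
        ["datalad", "create", dataset],
        # Version control: record the current state of the script.
        ["datalad", "save", "-m", "Add analysis script", script],
        # Provenance: execute a command and record it together with its
        # declared inputs and outputs.
        ["datalad", "run", "-m", "Compute summary",
         "--input", "inputs/data.csv",
         "--output", "results/summary.csv",
         f"python {script}"],
    ]


# Dry run: print the commands instead of executing them.
for argv in tracked_analysis_commands():
    print(shlex.join(argv))
```

On a machine with DataLad installed, each list could be handed to `subprocess.run(argv, check=True, cwd=...)`; a recorded run can later be re-executed with `datalad rerun`.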
Everything should be FAIR...
- Findable
- Accessible
- Interoperable
- Reusable
https://www.go-fair.org/fair-principles
Wilkinson et al. The FAIR Guiding Principles for scientific data management
and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
But what does FAIR really mean, practically?
- Bench/bed/field-side researchers are an essential source of valid metadata, which is critical for FAIR data
- Their resources are limited, and they need something in exchange; otherwise, FAIR won't happen
Why not focus on enabling practical collaboration
(even if just with one's future self)?
Why not make the aspirational goal "FAIR data"
a by-product of enabling efficient research?
The DataLad approach
V.A.M.P. (practical) vs F.A.I.R. (aspirational)
Divebomb Records
Be FAIR and immediately benefit from it yourself...
...while still working towards the greater good of FAIR data
- Version-controlled
- Actionable metadata
- Modular
- Portable
Let's look at an example:
They install DataLad...
Data publishing
Data consumption
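A minimal sketch of what consumption and publishing could look like on the command line, assuming a dataset that is already available at some clone URL and a publication target that is already configured as a sibling. The URL, directory names, and sibling name are hypothetical; only the `datalad clone`, `datalad get`, and `datalad push` commands themselves come from DataLad.

```python
import posixpath
import shlex


def consume_commands(source_url, target="study", content="derivatives"):
    """Commands a data consumer might run (names are placeholders)."""
    return [
        # Obtain the lightweight dataset: file names and history, no bulk data.
        ["datalad", "clone", source_url, target],
        # Retrieve actual file content on demand, only for the part needed.
        ["datalad", "get", posixpath.join(target, content)],
    ]


def publish_commands(sibling, dataset="."):
    """Command a data owner might run to publish to a configured sibling."""
    return [["datalad", "push", "-d", dataset, "--to", sibling]]


# Dry run: show the commands rather than executing them.
example = consume_commands("https://example.com/study.git") + publish_commands("origin")
for argv in example:
    print(shlex.join(argv))
```

The split between `clone` and `get` is the point of interest: a consumer can inspect the full dataset structure first and fetch only the files an analysis needs.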
(Meta)data deposition (on Dataverses)
- Register any dataset at any Dataverse site (e.g. Jülich DATA), receive citable DOI
- No requirement to re-host data (avoids duplication of storage cost)
- Data owner remains in full control over data access
- DataLad extension: https://github.com/datalad/datalad-dataverse
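The deposition steps above might be sketched as follows, assuming the datalad-dataverse extension is installed and a Dataverse dataset with a persistent identifier already exists. The site URL and DOI are hypothetical placeholders, and the argument order of `add-sibling-dataverse` (site URL, then persistent identifier) is an assumption that should be checked against `datalad add-sibling-dataverse --help`. The `--data nothing` option of `datalad push` is used here to transfer the version history without file content, in the spirit of not re-hosting data.

```python
import shlex


def dataverse_commands(dataverse_url, dataset_pid, sibling="dataverse"):
    """Return argv lists for registering a dataset on a Dataverse site.

    Meant to run from inside a DataLad dataset. URL, PID, and sibling
    name are supplied by the caller; the defaults are illustrative only.
    """
    return [
        # Register the Dataverse dataset as a sibling (assumed argument
        # order: site URL first, persistent identifier second).
        ["datalad", "add-sibling-dataverse", "-s", sibling,
         dataverse_url, dataset_pid],
        # Push the dataset record without transferring annexed file content.
        ["datalad", "push", "--to", sibling, "--data", "nothing"],
    ]


# Dry run with placeholder values: print instead of executing.
for argv in dataverse_commands("https://dataverse.example.org",
                               "doi:10.5072/PLACEHOLDER"):
    print(shlex.join(argv))
```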
DataLad contact and more information
distribits 2024
4-6 April 2024
The first distribits meeting will happen in 2024, and we are inviting all interested parties to join! The aim of this meeting is to bring together enthusiasts of tools and workflows in the domain of distributed data. It is organized by the people behind the git-annex and DataLad projects. The event will comprise a two-day conference and an additional hackathon day.