Research Data Management with DataLad

Talk @ MRI Together 2023

Stephan Heunis
@jsheunis jsheunis

Psychoinformatics lab, Institute of Neuroscience and Medicine, Brain & Behavior (INM-7)
Research Center Jülich, Germany

Slides: jsheunis.github.io/mritogether23-datalad

What is DataLad?


  • A free and open source tool
  • for decentralized (research) data management
  • with a command line interface, Python API, and a GUI (pre-alpha)
  • allowing exhaustive tracking of the evolution of digital objects
  • and computational provenance tracking
  • to enhance modularity, portability and reproducibility.
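
As a rough illustration of the points above, the sketch below assembles the command-line calls for a minimal local workflow: create a dataset, save its state, and run a command with provenance capture. Python is used only to build and print the calls. The dataset path, messages, and script name are placeholders invented for this example, not taken from the talk. Nothing is executed unless explicitly requested and a `datalad` executable is found.

```python
import shlex
import shutil
import subprocess


def local_workflow(dataset="my-study", script="code/analysis.py"):
    """Build DataLad CLI calls for a create / save / run cycle.

    `dataset` and `script` are hypothetical placeholder names.
    """
    return [
        # Create a new, empty dataset at the given path.
        ["datalad", "create", dataset],
        # Record the current state of the dataset with a message.
        ["datalad", "save", "-d", dataset, "-m", "Add data and code"],
        # Run a command and record it, with its results, as provenance.
        ["datalad", "run", "-d", dataset, "-m", "Run analysis",
         f"python {script}"],
    ]


def show_or_run(commands, execute=False):
    """Print each command; execute only if asked and DataLad is installed."""
    can_run = execute and shutil.which("datalad") is not None
    for argv in commands:
        print(shlex.join(argv))
        if can_run:
            subprocess.run(argv, check=True)


# Dry run: only prints the commands.
show_or_run(local_workflow())
```

The same steps are also available through the Python API mentioned above; the command-line form is shown here because it maps one-to-one onto what one would type in a terminal.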



Let's explore this...

Everything should be FAIR...

  • Findable
  • Accessible
  • Interoperable
  • Reusable


https://www.go-fair.org/fair-principles
Wilkinson et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

But what does FAIR really mean, practically?

  • Bench/bed/field-side researchers are an essential source of
    valid metadata, critical for FAIR data
  • Their resources are limited, and they need something in exchange, otherwise FAIR won't happen


Why not focus on enabling practical collaboration
(even if just with one's future self)?


Why not make the aspirational goal "FAIR data"
a by-product of enabling efficient research?

The DataLad approach

V.A.M.P. (practical) vs F.A.I.R. (aspirational)

Divebomb Records

Be FAIR and immediately benefit from it yourself...

...while still working towards the greater good of FAIR data
  • Version-controlled
  • Actionable metadata
  • Modular
  • Portable

Let's look at an example:



they install datalad...



Data publishing
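
A minimal sketch of what publishing could look like on the command line, assuming the dataset already exists locally: register a remote location (a "sibling") and push the dataset to it. The dataset path, sibling name, and URL are hypothetical placeholders. Whether file content can be stored at the remote, in addition to the version history, depends on the hosting service.

```python
import shlex


def publish_commands(dataset, sibling_name, sibling_url):
    """Build DataLad CLI calls to register a sibling and push to it.

    All three arguments are hypothetical placeholders.
    """
    return [
        # Register a remote location under a short name.
        ["datalad", "siblings", "add", "-d", dataset,
         "--name", sibling_name, "--url", sibling_url],
        # Publish the dataset to that sibling.
        ["datalad", "push", "-d", dataset, "--to", sibling_name],
    ]


# Print the commands for an invented dataset and remote.
for argv in publish_commands("my-study", "origin-copy",
                             "https://example.org/my-study.git"):
    print(shlex.join(argv))
```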

Data consumption
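
A minimal sketch of the consumer side, assuming a published dataset is reachable at some URL: clone it, fetch only the file content that is needed, and drop that content again when done. The source URL, target directory, and file path are hypothetical placeholders.

```python
import posixpath
import shlex


def consume_commands(source_url, target, wanted):
    """Build DataLad CLI calls to clone a dataset and fetch one path.

    `source_url`, `target`, and `wanted` are hypothetical placeholders.
    """
    path = posixpath.join(target, wanted)
    return [
        # Clone the dataset; file content is not necessarily downloaded yet.
        ["datalad", "clone", source_url, target],
        # Retrieve the content of the selected file or directory on demand.
        ["datalad", "get", path],
        # Free local disk space again; the content stays retrievable.
        ["datalad", "drop", path],
    ]


# Print the commands for an invented dataset and file.
for argv in consume_commands("https://example.org/my-study.git",
                             "my-study", "sub-01/anat.nii.gz"):
    print(shlex.join(argv))
```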






Use cases

(Meta)data deposition (on Dataverses)

  • Register any dataset at any Dataverse site (e.g. Jülich DATA) and receive a citable DOI
  • No requirement to re-host data (avoids duplication of storage cost)
  • Data owner remains in full control over data access
  • DataLad extension: https://github.com/datalad/datalad-dataverse


DataLad contact and more information



distribits 2024

4-6 April 2024



The first distribits meeting will happen in 2024, and we are inviting all interested parties to join! The aim of this meeting is to bring together enthusiasts of tools and workflows in the domain of distributed data. It is organized by the people behind the git-annex and DataLad projects. The event will comprise a two-day conference and an additional hackathon day.


Website + Demos: http://datalad.org
Documentation: http://handbook.datalad.org
Talks and tutorials: https://youtube.com/datalad
Development: http://github.com/datalad
Support: https://matrix.to/#/#datalad:matrix.org
Office hour: Tuesdays @ 16h CE(S)T, https://matrix.to/#/!NaMjKIhMXhSicFdxAj:matrix.org
Open data: http://datasets.datalad.org
Mastodon: @datalad@fosstodon.org
Twitter: @datalad