DataLad overview

Distributed data management

Free and open source

  Get DataLad

Keep track

Building on top of Git and git-annex, DataLad allows you to version control arbitrarily large files in datasets, without the need for custom data structures, central infrastructure, or third-party services.

  •   View data change history
  •   Revert to previous versions
  •   Capture full provenance record
  •   Ensure complete reproducibility


Create structure

A DataLad dataset is simply a directory of files that is managed by DataLad. Datasets can contain other datasets, known as subdatasets, nested arbitrarily deep. DataLad can run commands recursively across a hierarchy of datasets while still capturing provenance at every level.
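The nesting described above can be sketched as a short sequence of command-line calls. The Python sketch below only plans those calls and prints them; nothing is executed. It assumes the `datalad` command-line tool with its `create -d` and `save -r` options. The dataset names (`project`, `inputs`, `raw`) and the helper `plan_nested_datasets` are hypothetical and used purely for illustration.

```python
import shlex


def plan_nested_datasets(top, nested, message="Register nested subdatasets"):
    """Plan the calls that build a chain of nested datasets.

    Returns a list of (working_directory, argv) pairs. Nothing is executed.
    `top` and the names in `nested` are illustrative placeholders.
    """
    # Create the top-level (super)dataset from the current directory.
    steps = [(".", ["datalad", "create", top])]
    parent = top
    for name in nested:
        # Run inside the parent so that "-d ." registers the new dataset
        # as a subdataset of that parent.
        steps.append((parent, ["datalad", "create", "-d", ".", name]))
        parent = f"{parent}/{name}"
    # A single recursive save from the top is intended to record the
    # state of every level of the hierarchy.
    steps.append((top, ["datalad", "save", "-r", "-m", message]))
    return steps


if __name__ == "__main__":
    for cwd, argv in plan_nested_datasets("project", ["inputs", "raw"]):
        print(f"(in {cwd}) {shlex.join(argv)}")
```

Each subdataset is created from inside its parent, which is one common way to get the parent to register it; other layouts are possible.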

Use DataLad

DataLad is packaged as a free and open source command-line tool with a Python API and is compatible with all major operating systems. Use DataLad to:

  •   create new datasets locally
  •   clone online datasets
  •   get content on-demand
  •   save changes to datasets
  •   drop content as needed
  •   push changes to a remote store
  •   ... and much more!

  Try DataLad in your browser

datalad create my_dataset

datalad clone dataset_url

datalad get specific_file

datalad drop specific_file

datalad save -m "hello world"

datalad push --to location
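As a rough illustration of how the commands above fit together, the following Python sketch strings them into one clone-to-push cycle. By default it only assembles and prints the command lines; it calls `datalad` only when `dry_run=False` and the tool is found on `PATH`. The URL, directory, file, and sibling names are placeholders, and `plan_workflow` and `run_workflow` are hypothetical helpers, not part of DataLad.

```python
import shlex
import shutil
import subprocess


def plan_workflow(source_url, clone_dir, file_path, sibling):
    """Return (working_directory, argv) pairs for one clone-to-push cycle."""
    return [
        # Obtain a lightweight copy of an existing dataset.
        (".", ["datalad", "clone", source_url, clone_dir]),
        # Fetch the content of one file on demand.
        (clone_dir, ["datalad", "get", file_path]),
        # Record local modifications (assumes files were edited in between).
        (clone_dir, ["datalad", "save", "-m", "hello world"]),
        # Publish the saved changes to a configured sibling.
        (clone_dir, ["datalad", "push", "--to", sibling]),
        # Free local disk space once the content is no longer needed.
        (clone_dir, ["datalad", "drop", file_path]),
    ]


def run_workflow(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    if not dry_run and shutil.which("datalad") is None:
        raise RuntimeError("datalad was not found on PATH")
    lines = []
    for cwd, argv in steps:
        line = shlex.join(argv)
        lines.append(line)
        print(f"(in {cwd}) {line}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
    return lines


if __name__ == "__main__":
    run_workflow(plan_workflow("https://example.com/ds", "ds", "data.csv", "origin"))
```

Keeping the plan separate from its execution makes it easy to inspect the exact command lines before anything touches a real dataset.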


Collaborate

DataLad lets you consume datasets provided by others, and collaborate with them. You can install existing datasets and update them from their sources, or create sibling datasets that you can publish updates to and pull updates from. The collaborative power of Git, for your data.

DataLad in the Wild

DataLad is integrated with a variety of hosting services and data management platforms, and is extended and used by a diverse community. Export datasets to third-party services such as GitHub or Figshare with built-in commands. Extend DataLad to be compatible with your preferred data supplier or workflow. Or use one of the many other DataLad-compatible services, such as Dropbox or Amazon S3. Search through all integrations, extensions, and use cases to find the right fit for your data!



Learn More

DataLad is not solely a data management system, but also an open source community of users, developers, and researchers all contributing to its growth. To support this community, DataLad maintains several important resources.

Install DataLad

Install DataLad and its dependencies on Linux, macOS, or Windows

DataLad Handbook

Become an expert DataLad user with this rich educational resource

Community Chat

Join the community on Matrix, say hi, and ask questions

Technical Forum

Get help from DataLad experts and users to solve your challenges

DataLad on GitHub

Contribute via GitHub by opening issues or submitting pull requests

Developer Docs

Dive into the DataLad API with the developer documentation

DataLad Tutorials

Hands-on tutorials and videos to help you on your DataLad journey

Supporting DataLad

DataLad development is funded as a US-German project on collaborative research, with primary funding from the US National Science Foundation (NSF 1912266, NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1905, BMBF 01GQ1411). Additional support has been provided by the US National Institute of Biomedical Imaging and Bioengineering (NIH 1P41EB019936-01A1) via ReproNim, the European Union’s Horizon 2020 research and innovation programme under grant agreements 945539 and 826421, and the German federal state of Saxony-Anhalt and the European Regional Development Fund.
