What is DataLad?
Keep track
Building on top of Git and git-annex, DataLad allows you to version control arbitrarily large files in datasets, without the need for custom data structures, central infrastructure, or third-party services.
- View data change history
- Revert to previous versions
- Capture full provenance record
- Ensure complete reproducibility
Create structure
A DataLad dataset is simply a directory of files managed by DataLad. Datasets can contain other datasets, known as subdatasets, nested to arbitrary depth. DataLad can run commands recursively across a hierarchy of datasets while preserving its provenance-capture capabilities.
Use DataLad
DataLad is packaged as a free and open source command line tool with a Python API and is compatible with all major operating systems. Use DataLad to:
- create new datasets locally
- clone online datasets
- get content on-demand
- save changes to datasets
- drop content as needed
- push changes to a remote store
- ... and much more!
datalad create my_dataset
datalad clone --dataset . <source-url>
datalad get specific_file
datalad drop specific_file
datalad save -m "hello world"
datalad push --to location
Collaborate
DataLad lets you consume datasets provided by others, and collaborate with them. You can install existing datasets and update them from their sources, or create sibling datasets that you can publish updates to and pull updates from. The collaborative power of Git, for your data.
DataLad in the Wild
DataLad is integrated with a variety of hosting services and data management platforms, and is extended and used by a diverse community. Export datasets to third-party services such as GitHub or Figshare with built-in commands. Extend DataLad to work with your preferred data supplier or workflow, or use one of the many other DataLad-compatible services such as Dropbox or Amazon S3. Search through all integrations, extensions, and use cases to find the right fit for your data!
Learn More
DataLad is not solely a data management system, but also an open source community of users, developers, and researchers all contributing to its growth. To support this community, DataLad maintains several important resources.
Install DataLad
Install DataLad and its dependencies on Linux, macOS, or Windows
DataLad Handbook
Become an expert DataLad user with this rich educational resource
Community Chat
Join the community on Matrix, say hi, and ask questions
Technical Forum
Get help from DataLad experts and users to solve your challenges
DataLad on GitHub
Contribute via GitHub by creating issues or sending a pull request
Developer Docs
Dive into the DataLad API with the developer documentation
DataLad Tutorials
Hands-on tutorials and videos to help you on your DataLad journey
Supporting DataLad
DataLad development is funded as a US-German collaborative research project, with primary funding from the US National Science Foundation (NSF 1912266, NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1905, BMBF 01GQ1411). Additional support has been provided by the US National Institute of Biomedical Imaging and Bioengineering (NIH 1P41EB019936-01A1) via ReproNim, the European Union’s Horizon 2020 research and innovation programme (grants 945539 and 826421), and the German federal state of Saxony-Anhalt and the European Regional Development Fund.