Preface

This collection of workshops provides an introduction to the concepts, workflows, and tools fundamental to reproducible computational research. The workshops are:

  • Reproducibility Principles and Practices (one 1-hour session)

    A research project is reproducible if a different researcher can carry out the same analysis with the same data and produce the same overall result. To do so, they need transparent, detailed documentation about all of the steps in the research process and access to the tools—especially code—with which the steps were carried out. Reproducibility enables independent verification, a touchstone for all research.

    There are myriad practices, often accompanied by software tools, that can help ensure research projects are reproducible. This overview workshop will help you decide which to adopt and when to adopt them. The workshop also highlights additional benefits many of these practices confer, such as making it easier to collaborate with others. As an overview, this workshop is relatively non-technical, but it provides technical references, including other DataLab workshops, for all of the practices covered.

    This workshop is intended for learners at all experience levels, although what you take away from it may depend on your experience.

    Important

    We use this slide deck when we present this workshop.

    The entire workshop is summarized by this cheat sheet.

  • Introduction to the Command Line (one 2-hour session)

    Learn and practice how to talk directly to your computer via the command line. The command line is a powerful tool for using scientific software, working with large data sets, and controlling remote servers. It is primarily used to manage files and run programs, and it allows for automation of repetitive tasks. This workshop is a prerequisite for many of DataLab’s other workshops, including all of the workshops that follow it in this list.

    See also

    There’s a recording of an earlier version of this workshop on Aggie Video.

  • Installing Software with Pixi (one 2-hour session)

    Learn how to install and manage open-source software packages for research computing. Installing software is often tricky due to project-specific requirements for package versions (which can conflict with other projects) and inconsistent or incomplete install documentation. We’ll focus on using Pixi, a package manager for the conda ecosystem, to create independent, reproducible software environments and install software with ease. We’ll also briefly discuss the unique advantages of using Pixi for Python projects and how Pixi compares to other package managers.

  • Introduction to Version Control (one 2-hour session)

    This workshop covers the fundamentals of using version control systems for reproducible research. Topics covered include what version control is, key concepts and terminology, how to install the Git version control system, how to create a repository, how to save versions of files, how to restore old versions of files, and how to use hosting services for Git repositories to share and collaborate on projects.

  • Git for Teams (one 2-hour session)

    This workshop goes beyond the basics of Git: it explains how to customize Git to suit your preferred workflow and how to take full advantage of Git and GitHub’s collaborative features. Topics include why and how to use branches, how to merge branches even when there are conflicts, how to use GitHub’s project management features, and ways to configure Git to be more convenient. This workshop also prepares learners to use Git to contribute to open-source projects.

Open-source tools are an integral part of many research projects. Contributing to these tools helps keep them sustainable, and releasing research-related code under open-source licenses helps make computational research reproducible. These workshops are part of the University of California Open Source Program Office Network, and their development was funded in part by a grant from the Alfred P. Sloan Foundation.