1. Installing Software#
Learning Goals
After this lesson, you should be able to:
Explain what computing environments are
Explain what virtual environments are and why they’re useful
List popular tools for installing software on POSIX computers
Create and organize project directories for projects
Initialize projects with Pixi
Install software with Pixi
This chapter will show you how to set up a computer so that it has the software you need. You’ll learn how to install software whether or not you have administrator access.
1.1. What’s an Environment?#
A computing environment is a collection of hardware, software, and associated settings. Whenever you run software (including code) on your computer, it runs in a computing environment. Being able to set up, inspect, maintain, and document computing environments is important because:
A computer might not have the software you need pre-installed. Often the best and sometimes the only solution is to install the software yourself. Doing it yourself gives you more control over what’s installed and is typically faster than asking an administrator to do it for you. For your personal computer, you’re probably the sole administrator. You can use the same tools to set up and maintain environments on your computer as you do on remote computers.
Different projects may require different computing environments, and you may need to switch between them frequently. Switching software environments and settings can be quick and painless with the right tools. With modern compute clusters and cloud computing services, even switching hardware environments can be relatively easy.
Specifying a required software environment, with versions, can make it easier to collaborate on, distribute, revisit, and reproduce research projects. Differences in environment can cause major errors or subtle bugs.
Inspecting the computing environment is the first step to diagnosing most computing problems. If you ask someone for help, they’ll likely want to know which hardware, software, and settings you’re using.
In a high-level programming language like R or Python, details of the hardware are mostly hidden away. That is, hardware has limited influence on how you write code (with exceptions for a few specific use cases, such as GPU computing). Hardware affects how quickly your code runs, but usually not the final result. For many research computing projects, hardware is less of a concern than software, so this chapter focuses on software environments.
1.2. Environment Managers#
A package manager is a tool that can download, install, update, and remove software packages. If you’ve used R or Python, you might already be familiar with the package managers they provide. Many modern operating systems also provide a package manager, because package managers have several benefits. They can:
Automatically select packages compatible with the computing environment
Automatically install dependencies for packages
Update installed packages, often automatically or with a single command
In some cases, provide guarantees that packages are not malicious
Note
Most Linux distributions provide a package manager as the recommended way to install software. Nevertheless, it’s possible to install software on Linux without a package manager. One way is to download the source code for the software and compile it yourself; another is to download a pre-built binary. FlatPak and AppImage are two popular formats for distributing pre-built binaries.
Install software via a package manager when possible, but be aware that there are alternatives when it’s not.
Some, but not all, package managers can create virtual environments: self-contained environments that can coexist alongside others, even if they contain conflicting packages. You can think of a virtual environment as being like a terrarium for a collection of packages.
Virtual environments make it easier to work on projects with different software requirements simultaneously. For example, suppose one of your projects requires Python 3.13 or newer, but another uses a package that hasn’t been updated since Python 3.9. You can work on either project as needed if you create two virtual environments and switch between them: one with Python 3.13 and one with Python 3.9.
In the strictest sense, an environment manager is a tool that can create, modify, and delete virtual environments. There are environment managers that are not package managers (and vice-versa), but from here on we’ll use the terms somewhat interchangeably.
Pixi is the environment manager we recommend and use. Pixi is related to the popular environment manager Conda: both install conda packages from conda-forge, a community-led repository of packages for research computing. There are packages on conda-forge for R and Python, as well as other programming languages and tools. Pixi can also install packages from other sources and repositories (most notably, from the Python Package Index).
We recommend Pixi over Conda because Pixi creates environments that are fully reproducible, takes a project-centric approach to environments, is noticeably faster, and lacks many of Conda’s quirks and pitfalls. That said, Pixi is relatively new, so you might occasionally encounter missing features or bugs.
Important
Pixi is available for Windows, macOS, and Linux, and generally doesn’t require administrator privileges to install.
Install Pixi by following the official instructions.
Note
Examples of other popular package and environment managers are:
Homebrew for macOS and Linux
Chocolatey for Windows
Advanced Packaging Tool (APT) for Debian-based Linux distributions
Nix for Linux and macOS
Spack for Linux and macOS, focused on high-performance computing
EasyBuild for Linux, focused on research computing
Note
Virtualization tools, such as Podman, Docker, and VirtualBox are a different way to create isolated computing environments. They provide complete control over the operating system and software in an environment, so they provide stronger guarantees of reproducibility. The cost is that these tools are often slower than environment managers and using them requires more technical knowledge.
For most research projects, using an environment manager provides adequate flexibility and reproducibility.
1.3. Pixi#
Pixi manages virtual environments on a per-project basis. Each project must have its own directory—called a project directory—where all of the files related to the project are stored. We recommend organizing projects this way even when you aren’t using Pixi. By using a project directory to centralize all of the files in a project, it’s easier to:
Find files, because you know where to look or to run search software
Move or copy the project to other computers
Share the project with collaborators, colleagues, or the public
Create backup copies of the project to protect your work
Access and run files with R, Python, and other tools
Use version control software to manage different versions of project files
Tip
Create a new project directory for every project, no matter how small. As you produce or acquire new files for the project, make sure that they’re stored in the directory. Some examples of files you should store in a project directory are:
Documentation, such as a file manifest and instructions for use
Code, such as notebooks and scripts
Inputs, such as data sets and configuration files
Outputs, such as reports, figures, and intermediate data sets
License information (if the project will be shared with anyone)
Give the files descriptive names and create subdirectories to keep the files well-organized. The gold standard is for a project directory to be completely portable, meaning you can copy the directory to another computer, follow included instructions to setup necessary software (such as R or Python), and then run the code without any modifications to get the expected result.
1.3.1. Initializing a Project#
You can initialize a project to use Pixi with the pixi init
command. The
command takes one argument: a path to the project directory. If the directory
doesn’t exist yet, Pixi will create it.
Open a terminal and make a directory where you can try out some commands:
mkdir pixi_workshop
cd pixi_workshop
Initialize a Pixi project called my_project
:
pixi init my_project
✔ Created /home/nick/pixi_workshop/my_project/pixi.toml
Navigate to the new my_project/
directory and take a look at the files
inside:
cd my_project
ls -a
. .. .gitattributes .gitignore pixi.toml
The .gitattributes
and .gitignore
files are files for Git, a version
control system. If you’re not familiar with Git, it’s safe to ignore these
files, and you can skip the rest of this paragraph. The .gitattributes
file
tells Git how to handle merges for pixi.lock
, a file Pixi uses to keep track
of installed packages. The .gitignore
file tells Git to ignore the .pixi/
subdirectory, which is where Pixi installs packages.
See also
We recommend using Git for your research computing projects. To learn more, check out DataLab’s Introduction to Version Control reader.
The pixi.toml
file identifies the project as a Pixi project. It’s also a
place where you can store metadata about the project (such as its name,
authors, and version) and where Pixi will store details about the project’s
virtual environments.
Open pixi.toml
in a text editor (such as vim
). It will look something like
this:
[project]
authors = ["YOUR_NAME <YOUR_EMAIL>"]
channels = ["conda-forge"]
name = "my_project"
platforms = ["linux-64"]
version = "0.1.0"
[tasks]
[dependencies]
The file is in TOML format, a format designed to be easy for both people
and computers to read and write. TOML files consist primarily of key-value
pairs of the form key = value
. The key-value pairs can be organized into
named tables with headers of the form [name]
.
A pixi.toml
file always has at least three tables:
project
is metadata about the project.tasks
is a list of project-specific commands.dependencies
is a list of required packages for the project’s default virtual environment, which starts out empty (no packages).
When you initialize a project, Pixi fills in as much of pixi.toml
as it can.
For instance, it gets your name and email from Git, if you have Git installed
and configured. You can edit pixi.toml
to add or correct details.
Section 1.3.3 explains more about this file, but first, let’s
install some packages.
1.3.2. Adding & Removing Packages#
You can list a project’s packages with pixi list
. Go ahead and try this for
the my_project
project:
pixi list
✘ No packages found in 'default' environment for 'linux-64' platform.
We haven’t installed any packages yet, so there are none to list.
Suppose we want to install Python. Package names are always lowercase and
usually not surprising, although you can use the pixi search
command or
search online if you’re not sure about a package’s name. Python is in the
python
package.
You can install a package with the pixi add
command. So the command to
install Python is:
pixi add python
✔ Added python >=3.13.2,<3.14
By default, Pixi installs the most recent compatible version of a package. So depending on your computer’s operating system and when you run the command, Pixi might install a different version of Python.
Tip
You can use =
, <
, <=
, >
, and >=
with pixi add
to set constraints on
package versions. For instance, if you want the most recent version of Python
less than 3.10:
pixi add 'python<3.10'
When you set constraints like this, make sure to surround them with single
quotes (' '
) or the shell will misunderstand the command.
Caution
The pixi add
command installs packages. Don’t confuse it with the pixi install
command, which installs entire virtual environments. You usually won’t
need to run pixi install
, because Pixi runs it automatically as needed.
The python
package is now listed in the environment, along with all of its
dependencies:
pixi list
Package Version Build Size Kind Source
_libgcc_mutex 0.1 conda_forge 2.5 KiB conda _libgcc_mutex
_openmp_mutex 4.5 2_gnu 23.1 KiB conda _openmp_mutex
bzip2 1.0.8 h4bc722e_7 246.9 KiB conda bzip2
ca-certificates 2025.1.31 hbcca054_0 154.4 KiB conda ca-certificates
ld_impl_linux-64 2.43 h712a8e2_4 655.5 KiB conda ld_impl_linux-64
libexpat 2.6.4 h5888daf_0 71.6 KiB conda libexpat
libffi 3.4.6 h2dba641_0 52.2 KiB conda libffi
libgcc 14.2.0 h767d61c_2 828 KiB conda libgcc
libgcc-ng 14.2.0 h69a702a_2 52.5 KiB conda libgcc-ng
libgomp 14.2.0 h767d61c_2 449.1 KiB conda libgomp
liblzma 5.6.4 hb9d3cd8_0 108.7 KiB conda liblzma
libmpdec 4.0.0 h4bc722e_0 87.9 KiB conda libmpdec
libsqlite 3.49.1 hee588c1_2 897.1 KiB conda libsqlite
libuuid 2.38.1 h0b41bf4_0 32.8 KiB conda libuuid
libzlib 1.3.1 hb9d3cd8_2 59.5 KiB conda libzlib
ncurses 6.5 h2d0b736_3 870.7 KiB conda ncurses
openssl 3.4.1 h7b32b05_0 2.8 MiB conda openssl
python 3.13.2 hf636f53_101_cp313 31.7 MiB conda python
python_abi 3.13 5_cp313 6.1 KiB conda python_abi
readline 8.2 h8c095d6_2 275.9 KiB conda readline
tk 8.6.13 noxft_h4845f30_101 3.2 MiB conda tk
tzdata 2025a h78e105d_0 120 KiB conda tzdata
Packages you installed explicitly are printed in bold (although the bold
doesn’t show up in this reader). All of the other packages are dependencies.
You can use pixi add
multiple times to install whatever packages you need.
Once you’ve installed some packages in a virtual environment, you’ll probably
want to use them. You can run a command in the virtual environment with the
pixi run
command. So to run python
:
pixi run python
This will open a Python prompt in the virtual environment. You can use Python as you would normally at this point.
Note
On most operating systems, you can use which
to check which program a command
will run. For instance, try:
pixi run which python
Compare the output to:
which python
The output should be different. The which
command is a Unix shell tool, not
part of Pixi.
If you no longer need a package for your project, you can uninstall it with
pixi remove
. Let’s remove Python from our project:
pixi remove python
✔ Removed python
The environment is once again empty:
pixi list
✘ No packages found in 'default' environment for 'linux-64' platform.
With Python uninstalled, let’s install another major language of data science:
R. The package for R is called r
. We’ll also install R’s popular ggplot2
package for data visualization. In conda-forge, R packages have the prefix
r-
, so ggplot2 is r-ggplot2
. We can install both packages in a single pixi add
command:
pixi add r r-ggplot2
When you need to run multiple commands in a virtual environment, typing pixi run
in front of each one is inconvenient and tedious. Instead, you can use
pixi shell
to launch a subshell in the virtual environment. Any commands you
enter in the subshell run in the virtual environment.
Caution
If you’re using Windows and Git Bash, the pixi shell
command is not yet
supported.
Open a subshell:
pixi shell
Now try running R
. You can use R normally, exit, and reopen R again without
leaving the virtual environment. When you’re finished with the subshell, enter
exit
to return to your original shell.
1.3.3. Editing pixi.toml
#
When you install a package with pixi add
, Pixi adds the package’s name and a
version constraint to the pixi.toml
file. Likewise, when you uninstall a
package with pixi remove
, Pixi removes the package’s name from the file. The
file is a description of the project and the virtual environment(s) it
requires.
Open pixi.toml
in a text editor again:
[project]
authors = ["YOUR_NAME <YOUR_EMAIL>"]
channels = ["conda-forge"]
name = "my_project"
platforms = ["linux-64"]
version = "0.1.0"
[tasks]
[dependencies]
r = ">=4.4,<4.5"
r-ggplot2 = ">=3.5.1,<4"
The packages we installed in the default virtual environment with pixi add
are listed in the dependencies
table, along with their version constraints.
In the constraints, the lower bound comes from the version of the package that
was actually installed. The upper bound is one minor version higher. We say
these packages are pinned because there are specific constraints on their
versions. Pixi makes sure constraints on pinned packages are satisfied unless
you explicitly override them.
In addition to or instead of using pixi add
and pixi remove
to manage
packages, you can do so by editing pixi.toml
directly. For instance, if you
want to remove a package, just remove its line from pixi.toml
. Pixi will
automatically uninstall the package the next time you run a pixi
command.
Similarly, you can add packages by adding a line to pixi.toml
, and they’ll be
installed the next time you run a command.
Take a look at the files in the my_project/
directory again now that we’ve
added and removed some packages:
ls -a
. .. .gitattributes .gitignore .pixi pixi.lock pixi.toml
There’s a new directory called .pixi/
. This is where Pixi installs all of a
project’s virtual environments and packages.
Tip
If you know you won’t work on a project again for a while (or ever), you can
safely delete .pixi/
to get back some storage space. All of the information
needed to reconstruct the virtual environment(s) is in pixi.toml
and
pixi.lock
.
There’s also a new file pixi.lock
, called a lockfile. The lockfile lists
the exact version and source of every package, including dependencies,
installed in the project’s virtual environments. It’s a complete specification,
in contrast to the flexible specification in pixi.toml
. Both are useful: the
lockfile is for reproducing environments exactly, while the pixi.toml
file is
for producing compatible environments—as defined by the version
constraints—that might differ slightly from the originals.
Important
If you use Git, make sure to commit both pixi.toml
and pixi.lock
.
By default, Pixi will only compute the lockfile for your computer’s operating
system. If you plan to use other operating systems—or share your project with
people who do—it’s a good idea to have Pixi compute the lockfile for all of
them. You can do this by adding the operating systems to the platforms
key in
pixi.toml
. Some valid operating systems are linux-64
, macos-64
,
osx-arm64
(for M1, M2, …), and win-64
. Computing the lockfile for all of
the operating systems where your project might be used can help you catch
problems with virtual environments early, such as packages that are not
available for some operating systems. For example, if you want to include all
three operating systems, change the platforms
key to:
platforms = ["linux-64", "osx-64", "osx-arm64", "win-64"]
Pixi searches for and installs packages from the conda-forge repository by
default. In the jargon of Pixi (and Conda), package repositories are called
channels. You can specify other channels Pixi should search for packages in
pixi.toml
’s channels
key. The channels have highest to lowest priority from
left to right.
1.3.3.1. Tasks#
In addition to being a package and environment manager, Pixi is also a simple task runner: a tool that can run other commands in a convenient way. Tasks make the functionality or workflow of your project easier for other people to discover—especially when paired with detailed documentation—and lower the cognitive burden on you and your collaborators to remember the syntax for specific commands.
You can create tasks with the pixi task
command or by editing the [tasks]
table in pixi.toml
. In pixi.toml
, tasks take this form:
task_name = { cmd = "commands_to_run", description = "Help for the task." }
The description
field is optional and can be omitted, but it’s a good habit
to document your tasks.
As an example, we use these tasks in the project for this reader:
[tasks]
build = { cmd = "jupyter-book build .", description = "Build the reader." }
publish = { cmd = "ghp-import --no-jekyll --no-history --push _build/html", description = "Publish the reader to the `gh-pages` branch on GitHub." }
clean = { cmd = "rm -rf _build/", description = "Remove the build directory." }
rebuild = { depends-on = ["clean", "build"], description = "Remove the build directory and build the reader." }
The build
task generates the reader from source files, the publish
task
uploads the reader to the web, and the clean
task deletes the built reader
(in case we want to rebuild it). The rebuild
task is special: instead of a
command to run, it specifies that it depends-on
the clean
and build
tasks. So Pixi will run both of those tasks when you run rebuild
.
You can run tasks with the pixi run
command and the name of the task. So in
the project for this reader, we can run the build
task with this command:
pixi run build
Caution
Be careful not to give tasks names that conflict with other tools you use. For
instance, python
is probably not a good name for a task.
See also
There are many other useful things you can do by editing pixi.toml
, such as
setting up multiple virtual environments or specifying minimum hardware
requirements for a project.
See the Pixi documentation for more details.
1.3.4. Global Installs#
Occasionally, you might come across a tool that’s broadly useful but not
required by any particular project. For example, ripgrep (rg
) is a tool to
search for files that contain a given text string (or regular expression). It’s
a faster, modernized version of the classic grep
tool, and worth knowing
about if you use the command line frequently, because it can help you find
files quickly.
You can use the pixi global install
command to install a package globally, so
that it’s available in all shells without any need to use pixi run
or pixi shell
. Try installing ripgrep
this way:
pixi global install ripgrep
Global environments as specified in '/home/nick/.pixi/manifests/pixi-global.toml'
└── ripgrep: 14.1.1 (installed)
└─ exposes: rg
Pixi creates a new virtual environment for each package you install this way,
in order to prevent conflicts. You can use pixi global list
to see which
packages you’ve installed globally:
pixi global list
Global environments as specified in '/home/nick/.pixi/manifests/pixi-global.toml'
└── ripgrep: 14.1.1
└─ exposes: rg
If you want to uninstall a globally-installed package, use pixi global uninstall
. For example, to uninstall ripgrep
:
pixi global uninstall ripgrep
✔ Removed environment ripgrep.