Rethinking requirements.txt

What is requirements.txt?

This should be familiar to most Python programmers, but here’s a brief summary anyway: a requirements file contains a list of dependencies for an application (not a library), often with specific version information. Many requirements files are generated with commands like pip freeze > requirements.txt [1]. Here’s an example:

$ cat requirements.txt
blessings==1.6
bpython==0.14.2
curtsies==0.1.19
flake8==2.4.1
greenlet==0.4.7
jedi==0.9.0
mccabe==0.3.1
msgpack-python==0.4.6
neovim==0.0.38
pep257==0.6.0
pep8==1.5.7
pyflakes==0.8.1
Pygments==2.0.2
requests==2.7.0
six==1.9.0
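Because every line in a file like this is just a name and an exact version, it is easy to read programmatically. The following is a minimal sketch (the parse_pins and normalize helpers are hypothetical, not part of pip) that handles only name==version lines and # comments. Real requirements files also allow version ranges, extras, environment markers, and options such as -r, which this sketch rejects instead of guessing at. Names are normalized following PEP 503, so Pygments and pygments count as the same package.

```python
import re

# A pinned line: a package name, "==", and an exact version.
# Ranges, extras, environment markers and options like "-r" do not match.
PIN = re.compile(r"([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([A-Za-z0-9.!+_-]+)")


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase; runs of -, _ and . become one -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict[str, str]:
    """Map normalized package names to their pinned versions."""
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = PIN.fullmatch(line)
        if match is None:
            raise ValueError(f"not a simple name==version pin: {raw!r}")
        name, version = match.groups()
        pins[normalize(name)] = version
    return pins


example = """\
blessings==1.6
msgpack-python==0.4.6
Pygments==2.0.2
six==1.9.0
"""
print(parse_pins(example))
# {'blessings': '1.6', 'msgpack-python': '0.4.6', 'pygments': '2.0.2', 'six': '1.9.0'}
```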

Requirements files serve several important purposes:

  1. They ensure that our environments are consistent across different machines. Reproducible environments are crucial to preventing bugs, incompatibilities, and other breakage introduced by changes between versions of libraries.

  2. They communicate to our fellow developers what the code we’ve written relies on. Given the requirements file of a project, we can generally guess the kinds of things it’s going to do before we read one line of code. For example, requests suggests that an application is going to communicate with HTTP servers, and msgpack-python tells us that we’ll probably be using msgpack as an interchange format.

  3. They allow for neat things like automated version-update auditing from requires.io and (maybe one day) security alerts from Is it vulnerable?

The Problem

Refer again to the file above. Where did all of those things come from? Here’s the command that created that environment:

pip install bpython flake8 jedi neovim pep257

Notice any differences? We asked for five packages and got fifteen back. This is exactly what we want for purposes #1 and #3: for both of those use cases, we want to know exactly which library versions our application is deployed against. Update notifications and security alerts are only useful when the auditing services check the versions actually running on our servers. However, this full list does very little to address purpose #2: relaying information from one person to another.

Here’s how that pip install resulted in that list of requirements:

$ pipdeptree
bpython==0.14.2
  - Pygments [installed: 2.0.2]
  - requests [installed: 2.7.0]
  - curtsies [required: >=0.1.18, installed: 0.1.19]
    - blessings [required: >=1.5, installed: 1.6]
  - greenlet [installed: 0.4.7]
  - six [required: >=1.5, installed: 1.9.0]
flake8==2.4.1
  - pyflakes [required: >=0.8.1, installed: 0.8.1]
  - mccabe [required: >=0.2.1, installed: 0.3.1]
  - pep8 [required: >=1.5.7, installed: 1.5.7]
jedi==0.9.0
neovim==0.0.38
  - msgpack-python [required: >=0.4.0, installed: 0.4.6]
  - greenlet [installed: 0.4.7]
pep257==0.6.0
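The tree also shows how the five requested packages could, in this case, be recovered from the fifteen pinned ones: a package that nothing else depends on must have been installed on purpose. Here is a minimal sketch of that idea, with the graph hand-copied from the output above (names lowercased) rather than read from a live environment; DEPENDS_ON and top_level are illustrative names. The heuristic has a blind spot: a package that was requested explicitly but is also someone else’s dependency (say, if we had asked for requests directly) would be misclassified, which is exactly the kind of intent a flat pinned list cannot record.

```python
# Direct dependencies, hand-copied from the pipdeptree output above
# (names lowercased). Packages shown with no children depend on nothing here.
DEPENDS_ON: dict[str, set[str]] = {
    "bpython": {"pygments", "requests", "curtsies", "greenlet", "six"},
    "curtsies": {"blessings"},
    "flake8": {"pyflakes", "mccabe", "pep8"},
    "jedi": set(),
    "neovim": {"msgpack-python", "greenlet"},
    "pep257": set(),
    "blessings": set(),
    "greenlet": set(),
    "mccabe": set(),
    "msgpack-python": set(),
    "pep8": set(),
    "pyflakes": set(),
    "pygments": set(),
    "requests": set(),
    "six": set(),
}


def top_level(graph: dict[str, set[str]]) -> set[str]:
    """Installed packages that no other installed package depends on."""
    required_by_something = set().union(*graph.values())
    return set(graph) - required_by_something


print(sorted(top_level(DEPENDS_ON)))
# ['bpython', 'flake8', 'jedi', 'neovim', 'pep257']
```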

It turns out that bpython, flake8, and neovim required a bunch of libraries. For a brand new virtualenv like this one, this is all pretty readable. Once we start looking at projects that have a lot of high-level dependencies, making sense of it gets a lot harder. Additionally, when a library like bpython is updated (for example, when it drops support for old, outdated versions of Python), dependencies like six can be left over: unused by any library, but still sticking around with each new deployment.
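Finding those leftovers is a graph reachability question: start from the packages we actually asked for, walk their dependencies, and anything that is pinned but never reached is dead weight. A flat requirements.txt cannot answer it on its own, because it does not record which packages were the starting points. The sketch below assumes we have that list of requested packages plus the current dependency graph; the bpython release that drops six is hypothetical, and reachable and orphans are illustrative helper names.

```python
from collections import deque


def reachable(roots: set[str], graph: dict[str, set[str]]) -> set[str]:
    """Breadth-first walk: every package needed, directly or transitively."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        for dep in graph.get(queue.popleft(), set()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def orphans(pinned: set[str], roots: set[str], graph: dict[str, set[str]]) -> set[str]:
    """Pinned packages that none of the requested packages need any more."""
    return pinned - reachable(roots, graph)


# Hypothetical: the graph after upgrading to a bpython release that no longer
# requires six (all other edges as in the pipdeptree output above).
graph_after_upgrade = {
    "bpython": {"pygments", "requests", "curtsies", "greenlet"},
    "curtsies": {"blessings"},
    "flake8": {"pyflakes", "mccabe", "pep8"},
    "neovim": {"msgpack-python", "greenlet"},
}
requested = {"bpython", "flake8", "jedi", "neovim", "pep257"}
# Everything still pinned in the old requirements.txt.
old_pins = {
    "blessings", "bpython", "curtsies", "flake8", "greenlet", "jedi",
    "mccabe", "msgpack-python", "neovim", "pep257", "pep8", "pyflakes",
    "pygments", "requests", "six",
}
print(sorted(orphans(old_pins, requested, graph_after_upgrade)))  # ['six']
```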

The Solution

Here’s where I say something controversial within the Python community: Ruby got it right [2].

Bundler is a tool for managing Ruby environments; it can be seen as Ruby’s answer to virtualenv. In addition to isolating environments and installing dependencies, it provides separate files (Gemfile and Gemfile.lock) for human-readable and machine-readable [3] dependency specifications. My proposal: the Python community needs to take a similar approach.

While we need better tools [4] to address this problem, it’s largely a social one. Before we can solve this, we need to agree that the situation needs improvement. For now, I’m using the pip-compile command from pip-tools (this lets me keep separate requirements.in and requirements.txt files for my applications), and I think you should, too.
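To make the requirements.in / requirements.txt split concrete, here is a toy sketch of what a compile step produces: the input lists only the names we asked for, and the output pins everything they need, annotated with where each transitive dependency came from. The compile_lock function and its "# via" comment format are illustrative assumptions loosely modeled on pip-compile’s annotations, not pip-tools’ actual implementation. In particular, pip-compile resolves versions against the package index, while this sketch simply takes a versions mapping as given.

```python
from collections import deque


def reachable(roots: set[str], graph: dict[str, set[str]]) -> set[str]:
    """Every package needed, directly or transitively, by the given roots."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        for dep in graph.get(queue.popleft(), set()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def compile_lock(requirements_in: str, graph: dict[str, set[str]],
                 versions: dict[str, str]) -> str:
    """Render a pinned lock file for the top-level names in requirements_in.

    Resolving which versions to pin is the hard part a real tool handles;
    this sketch assumes `versions` already holds a consistent answer.
    """
    roots = set()
    for raw in requirements_in.splitlines():
        name = raw.split("#", 1)[0].strip()  # skip comments and blank lines
        if name:
            roots.add(name)
    needed = reachable(roots, graph)
    parents: dict[str, set[str]] = {}  # package -> needed packages requiring it
    for parent in needed:
        for dep in graph.get(parent, set()):
            parents.setdefault(dep, set()).add(parent)
    lines = []
    for name in sorted(needed):
        line = f"{name}=={versions[name]}"
        if name in parents:
            line += "  # via " + ", ".join(sorted(parents[name]))
        lines.append(line)
    return "\n".join(lines) + "\n"


requirements_in = "flake8\njedi\n"
graph = {"flake8": {"pyflakes", "mccabe", "pep8"}}
versions = {"flake8": "2.4.1", "jedi": "0.9.0", "mccabe": "0.3.1",
            "pep8": "1.5.7", "pyflakes": "0.8.1"}
print(compile_lock(requirements_in, graph, versions), end="")
```

The human-readable input stays two lines long no matter how many transitive dependencies the output grows, which is the information purpose #2 needs; the pinned output serves purposes #1 and #3.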

Footnotes:

Big thanks to @dirn, @PaulLogston, and @mattupstate for reviewing early versions of this post.

  1. Please curate your requirements files with more care than this. Simply dumping the output from pip freeze will likely lead to packages that are meant solely for development becoming permanent members of your deployment environments. 

  2. While I’m at it, so did node.js. NPM includes a command called shrinkwrap that produces a full, version-locked list of dependencies based on package.json. Because of how Python’s import system works, this would be incredibly difficult (if not impossible) to pull off. 

  3. Both of these files are actually in machine-readable formats, but only the Gemfile addresses purpose #2 above. 

  4. There are several tools aside from pip-compile available. I considered each of these before finding and deciding on pip-tools. This list is here mainly as a reference for why each tool was not right for me.

    • pbundler: clone of Bundler, last updated in 2012.

    • pundle: clone of Bundler, reimplements standard Python tooling instead of working with it.

    • pundler: looks interesting; my second choice, but not as mature as pip-tools, and it was broken with the latest versions of pip when I last tried it (it tends to be difficult to write, maintain, and use software that treats pip as a library when pip was intended to be an application).