The Package Dependency Blues

Today I'm going to tell you a story. This is a true story about a web developer that I will call Peter.

Peter is the author of a web application that I will call App. He wrote App in Python, using a relatively popular web framework and a handful of packages that extend the framework with additional features. The actual names of these packages aren't important to the story, so I will call the framework Foo, and one of its extensions Bar.

It turns out that Bar, the add-on to Foo, has a dependency of its own on another package that I will call Baz.

When Peter began working on App he did what most of us would have done. He created a virtualenv, activated it, and then used pip (the Python package manager) to install the dependencies he needed. Here is the actual command he used:

$ pip install Foo Bar

Note that he did not need to install Baz; in fact, he did not even know he needed it. The Python package system allows a package to specify what its dependencies are, and the pip installer is pretty smart about these things. When pip installed Bar it found out that Bar declares Baz as a dependency, and as a result it installed Baz as well.

Peter spent a few weeks working on App with everything going smoothly, until one day he reached a milestone. That day he wrote several unit tests that ensured that all the functions in App were working properly, and also wrote a readme page where he explained how to install and use App. He even included detailed instructions on how to set up a virtual environment with the packages App depends on. He then pushed App out to his GitHub repository, patted himself on the back for a job well done and, feeling a sense of accomplishment, moved on to work on other projects.

Peter went on with his life for a while, until one day he received an email from GitHub, and he could not believe his eyes when he read it: someone had filed a bug against App.

A bug? How can it be? Peter took every precaution in the book to ensure that App worked as expected. And here is this stranger saying that App throws an exception right after it starts.

Peter still had App on his computer, so he quickly tested it and confirmed that it still worked fine. He also found that all the unit tests passed. He assumed this person was inexperienced and made some sort of installation mistake.

A couple of days later someone else commented on the bug. This new person was also having problems running App, but was more knowledgeable than the first and theorized that the problem could be caused by a new version of package Baz that was released a few days before.

Peter checked and sure enough, Baz had a recent release that changed a few things. In particular a class that existed in the previous release was now removed and replaced with a different class. A quick inspection of the code in package Bar showed that the class that was removed was used. That was the source of the exception.

The problem was now very clear in Peter's mind. This wasn't his fault. The developer of Bar just needed to issue a new release that worked with the new version of Baz, and then everything would be fine again. So he went to project Bar's GitHub page and logged a bug against Bar.

But Peter was bothered by the open bug he had on App. Since he felt he had done nothing wrong, he decided to close the bug, explaining that the problem wasn't his. It made him feel a bit better seeing that App went back to having no open bugs.

A week later the bug that Peter filed against project Bar remained unattended. Worried about the lack of urgency he went to check the commit log for the project and found that there had not been any commits to Bar in almost a year. Project Bar looked like a dead or abandoned project. To make matters worse, another App user, unaware of the latest developments, wrote a new bug against App for the same issue.

The story ends with Peter feeling trapped and powerless, thinking that Python's package management is broken.

The Problem With Dependencies

Do you identify with Peter? I certainly do. I suffered the "package dependency blues" many times myself. Many consider this an unavoidable risk that developers just have to accept.

Peter didn't know this at the time, but he could have handled things in a better way. Of course he could not have prevented the changes in the project he depended on, but he could have done a better job defining the dependencies of his own project. The fact that project Baz released a new version does not mean that App needs to adopt it.

Let's begin with a review of where dependencies are specified in Python. There are two different places where package dependencies can be written: the setup.py file and the requirements.txt file. Each serves a different purpose.

The setup.py file

The setup.py script contains the description of a package. All Python packages that are registered with PyPI (the Python Package Index) need to have a setup script in their root folder, because installation tools like pip read it to know how the package needs to be installed.

The section inside the setup script that describes package dependencies is called install_requires. Dependencies are specified as a list of strings, with each string containing the name of a package plus optionally one or more version specifiers to restrict the range of supported versions.

As an example, here is the package dependency specification for Flask 0.10:

install_requires=[
    'Werkzeug>=0.7',
    'Jinja2>=2.4',
    'itsdangerous>=0.21'
]

As you can see, this is a "loose" mechanism to define dependencies. Exact versions aren't named; instead, ranges of accepted versions are specified.
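To make the format of these strings concrete, here is a minimal sketch that splits a requirement string into a package name and its version clauses. The `parse_requirement` helper is hypothetical (it is not part of pip or setuptools), and it only handles the simple `name operator version` form used in this article, not the full syntax that pip accepts.

```python
import re

# Patterns for this simplified sketch only; real tools accept a richer syntax.
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
_CLAUSE = re.compile(r"\s*(==|>=|<=|!=|>|<)\s*([0-9][0-9A-Za-z.]*)\s*$")


def parse_requirement(requirement):
    """Split 'Name>=1.0, <2.0' into ('Name', [('>=', '1.0'), ('<', '2.0')])."""
    match = _NAME.match(requirement)
    if match is None:
        raise ValueError("missing package name: %r" % requirement)
    name = match.group(1)
    rest = requirement[match.end():].strip()
    clauses = []
    if rest:
        # Each comma-separated part is one operator plus one version.
        for part in rest.split(","):
            clause = _CLAUSE.match(part)
            if clause is None:
                raise ValueError("unsupported clause: %r" % part)
            clauses.append((clause.group(1), clause.group(2)))
    return name, clauses


print(parse_requirement("Werkzeug>=0.7"))
```

A requirement with no version specifier at all simply produces an empty list of clauses, which is another way of saying that any version is accepted.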

The requirements.txt file

The other dependency definition mechanism is the requirements.txt file. This is a regular text file with one package per line, usually accompanied by an exact version number.

This is an example requirements.txt file:

Flask==0.9
Flask-Login==0.1.3
Flask-Mail==0.8.2
Flask-OpenID==1.1.1
Flask-SQLAlchemy==0.16
Flask-WTF==0.8.3

The packages in a requirements file are not automatically installed like those in a setup script. The requirements file is installed manually by the user using pip. Here is a command to install a requirements file:

$ pip install -r requirements.txt

It is also possible to generate a requirements file automatically from the contents of the virtual environment:

$ pip freeze > requirements.txt
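If you are curious about what a freeze roughly amounts to, the sketch below lists the distributions visible to the current interpreter using the standard library's `importlib.metadata` module (Python 3.8 or newer) and formats them as pinned lines. The helper names `format_pins` and `installed_packages` are made up for this example, and the output is only an approximation: the real `pip freeze` handles cases such as editable installs that this sketch ignores.

```python
from importlib import metadata


def format_pins(packages):
    """Turn (name, version) pairs into sorted 'name==version' lines."""
    return sorted(
        ("%s==%s" % (name, version) for name, version in packages),
        key=str.lower,
    )


def installed_packages():
    """Yield (name, version) for each distribution this interpreter can see."""
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no usable name
            yield name, dist.version


if __name__ == "__main__":
    print("\n".join(format_pins(installed_packages())))
```

Run inside an activated virtual environment, this prints one pinned line per installed package, including the indirect ones.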

Specifying Dependencies for an Application

If you are building an application, like Peter, then the best way to advertise your dependencies is through a requirements.txt file. The installation instructions for your application should just ask that the requirements file is installed with pip. Since the requirements file includes exact version numbers for all dependencies, everybody gets the same versions of all the packages.

Going back to Peter's example, what he should have done before pushing App to GitHub is the following:

$ pip freeze > requirements.txt

The contents of Peter's requirements file might have looked like this:

$ cat requirements.txt
Foo==0.7
Bar==1.0
Baz==2.6

Peter didn't directly use project Baz in App, but note that this project is mentioned in the requirements file anyway. This is very important, because Baz is an indirect dependency for App. Since it is a dependency, it also needs to be locked down to a version that is known to work.

The day project Baz released a major update as, say, version 3.0, nothing would have changed for Peter and his App project. His requirements file would have still requested version 2.6, so that's the version that pip would have installed. This alone could have solved all of Peter's problems with dependencies!
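A quick way to check that a requirements file really locks everything down is to look for lines without an exact pin. The following is a simplified sketch: `unpinned_requirements` is a hypothetical helper, it treats any line containing `==` as pinned, and it does not understand every form a requirements line can take (URLs, for example).

```python
def unpinned_requirements(text):
    """Return the requirement lines in `text` that lack an exact '==' pin."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            loose.append(line)
    return loose


peters_file = "Foo==0.7\nBar==1.0\nBaz==2.6\n"
print(unpinned_requirements(peters_file))
```

For the file from Peter's example the result is an empty list, meaning that every dependency, including the indirect Baz, is locked to a specific version.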

Specifying Dependencies for a Reusable Component

The best way to specify dependencies for a reusable component is through the setup.py file. Reusable components are, by definition, going to be imported as dependencies by other projects, and you want pip to be able to sort out the dependencies for parent projects automatically, without giving the developer the extra work of having to install indirect dependencies manually.

Let's look at what project Bar's install_requires section of the setup script in Peter's example might have looked like:

install_requires=[
    "Baz"
]

And this is pretty bad. Bar is saying that any release of Baz will do as a dependency. But as a developer of a component you do not want to open up to such risk.

If the Bar developer only verified that the project works with Baz version 2.6, a sure way to avoid dependency problems is to request that version explicitly:

install_requires=[
    "Baz==2.6"
]

But while requesting explicit versions for applications is a very good idea, for reusable components it is less so. The problem is that if every project requests specific versions of its dependencies, the risk of having a dependency conflict increases. In the example above Bar wants version 2.6 of Baz. What would happen if project Foo also depended on Baz but requested version 2.5 in its setup script? With a conflict like that, pip would not be able to resolve the dependencies and would just fail.
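The conflict can be illustrated with a few lines of Python. This is only a sketch of the idea, not how pip works internally: `find_pin_conflicts` is a made-up helper that takes the exact pins declared by each package and reports any dependency that is pinned to more than one version.

```python
from collections import defaultdict


def find_pin_conflicts(declared):
    """Report dependencies that different packages pin to different versions.

    `declared` maps a package name to the exact pins it declares,
    for example {"Bar": {"Baz": "2.6"}}.
    """
    wanted = defaultdict(dict)
    for package, pins in declared.items():
        for dependency, version in pins.items():
            wanted[dependency][package] = version
    # A dependency is in conflict when its requesters disagree on the version.
    return {
        dependency: requesters
        for dependency, requesters in wanted.items()
        if len(set(requesters.values())) > 1
    }


conflicts = find_pin_conflicts({"Foo": {"Baz": "2.5"}, "Bar": {"Baz": "2.6"}})
print(conflicts)
```

With the hypothetical pins above, Baz is reported as a conflict because no single version of it can be both 2.5 and 2.6.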

To avoid dependency conflicts it is expected that reusable components offer some amount of flexibility in their dependency declaration, so that package managers like pip can have some room to figure out a set of versions that work for all the packages.

An improvement would be to define Bar's dependencies with a lower bound:

install_requires=[
    "Baz>=2.0"
]

So now pip will never accept Baz 1.x as valid. If the developer of Bar only tested version 2.6, then it makes no sense to allow an older release that may or may not work.

But is it okay to leave the upper side unbound?

In most cases it is not. Leaving the upper bound open might be okay only for extremely reputable projects that have a track record of not making changes that break existing applications. You as the component developer would have to evaluate your risks if you decide to do that.

In almost all cases, however, it is a much better idea to have an upper bound for all your dependencies.

Projects typically change version numbers in a more dramatic way when they introduce incompatibilities with existing applications. For some projects this means a change in the major version component, for others it may be a change in the major or minor version numbers. You'll have to figure out what the version style of your dependencies is to look for a safe upper bound.

For Bar's project it would have been useful to have dependencies specified like this:

install_requires=[
    "Baz>=2.0, <3.0"
]

With this dependency declaration Bar can get minor updates to Baz, but not major updates, so it is open to receive bug fixes and small improvements, but not major changes that may require code changes.
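To see the effect of the two bounds, here is a small sketch that checks a handful of made-up Baz releases against the range. The comparison is deliberately simplified: versions are compared as tuples of integers, which is enough for plain dotted versions with the same number of components but is not a full implementation of Python's version rules. The `version_key` and `satisfies` helpers are hypothetical.

```python
import operator

_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def version_key(version):
    """Turn '2.6' into (2, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, clauses):
    """True if `version` passes every (operator, bound) clause."""
    candidate = version_key(version)
    return all(_OPS[op](candidate, version_key(bound)) for op, bound in clauses)


bar_needs = [(">=", "2.0"), ("<", "3.0")]  # the "Baz>=2.0, <3.0" declaration
releases = ["1.9", "2.0", "2.6", "2.7", "3.0"]  # made-up Baz releases
accepted = [v for v in releases if satisfies(v, bar_needs)]
print(accepted)  # ['2.0', '2.6', '2.7']
```

The 1.9 and 3.0 releases fall outside the range, while a later 2.7 release would still be picked up without any change to Bar.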

Eventually there will be users out there that may want to have a version of Bar that works with Baz 3.x. These users will submit feature requests to Bar's developer instead of bug reports. And most importantly, they would not think that project Bar is broken.

Dependencies in the Real World

In an ideal world all application and component developers declare their dependencies in a reliable way, and as a result there are no problems with package dependencies. Unfortunately that is not the world we live in.

You can be extremely careful in the way you design your own project dependencies, but there is always the chance that some of those dependencies will not declare their own dependencies in a foolproof way.

If your project is an application then you are safe, because your requirements.txt file should list the versions of every dependency you have, including indirect ones.

You will more likely be affected if your project is a reusable component, because as discussed above, for this type of project you only list the direct dependencies with version ranges. You just need one of your dependencies to be sloppy with its own dependencies and that could cause your project to fail at some point in the future.

If this happens to you then the first thing you should do is report the problem to the appropriate project administrator.

I think it is also important for a project's reputation to be perceived as stable, so having your project temporarily broken because of a third party dependency is not acceptable. You can claim that the problem is not yours, but people will be forced to look elsewhere if your project does not work.

An emergency measure you can take to restore your project is to force the indirect dependency to load a version that is known to work in your setup.py script.

For example, if there was a component that depended on the Bar project from Peter's example, its setup script could have declared the dependency as follows:

install_requires=[
    "Bar>=1.0, <2.0"
]

This dependency declaration is well specified, but the day project Baz goes from 2.6 to 3.0 your component will break if project Bar's own dependencies aren't specific enough about the versions of Baz required.

To address the problem on your side you could just change your setup script to add Baz as a dependency:

install_requires=[
    "Baz>=2.0, <3.0",
    "Bar>=1.0, <2.0"
]

This is not ideal because your project does not really have a dependency on Baz, but if that's what you have to do to keep your project running, so be it. You can always remove the dependency once things settle and project Bar is fixed.

Conclusion

I hope you found this article useful in understanding how to work with dependencies in Python. Before I end the article I'll leave you a summary of the take-away points:

  • If you develop an application
    • Include a requirements.txt file in the root of your project, naming all your dependencies (direct and indirect) with the explicit versions that you have tested.
    • Document how to install dependencies using this requirements file.
  • If you develop a reusable component
    • Include an install_requires clause in your setup.py file, listing only your direct dependencies.
    • Always define a lower bound version for each dependency.
    • Unless you have a good reason not to, also define an upper bound version for each dependency. Use common sense to decide what the upper bound for each dependency needs to be.
    • To help parent projects decide their dependencies, document how your versions will change when you introduce an incompatible change. For example, state that the major version number will be increased whenever incompatible changes are introduced.
    • If you ship your component with an example application, include a requirements.txt file for the example, so that at least there is a record of a set of specific versions that are known to work with your component.

If there are any aspects of version dependencies that you think I haven't covered please let me know below in the comments!

Miguel

7 comments

  • #1 LDPG said :

    My biggest grief with package management in Python is that you can't specify a local repository or set of directories for packages. Sometimes your app is on a system that isn't connected to the outside internet or for security reasons shouldn't / isn't allowed to download software from external third parties.

  • #2 Marius Gedminas said :

    > My biggest grief with package management in Python is that you can't specify a local repository or set of directories for packages.

    What? Yes, you can. pip -f /path/to/local/directory/full/of/packages install whatever-you-want.

  • #3 Charles Doutriaux said :

    This is all fine in a pure Python world. We're developing Python packages that rely on "non python" library. The dependencies blues at that point turns quickly into dependencies nightmare... Especially when your app is deployed on newer/older system with "many" version of needed libraries and/or in non-standard locations... (/opt/usr/usr/local,...) CMake really helps but it is still not perfect. Just my two bits.

  • #4 Charles Doutriaux said :

    Glad you mentioned pip rather than easy_install though! It does a much better job!

  • #5 A. Jesse Jiryu Davis said :

    Great post, thanks for the advice.

  • #6 Thomas Güttler said :

    Thank you for this post. It explains setup.py vs requirements.txt very well. Unfortunately the python world has not settled on the words. In the Django world a "project" is the stuff where you only have configs (and a requirements.txt file). I personally prefer the word "library" to "reusable app".

  • #7 Travis Oliphant said :

    For those looking for a better approach to managing dependencies for Python packages that also includes support for non-Python binary dependencies, please see conda.pydata.org. It is free and OSS but does have commercial support at http://www.continuum.io
