The Package Dependency Blues

Today I'm going to tell you a story. This is a true story about a web developer that I will call Peter.

Peter is the author of a web application that I will call App. He wrote App in Python, using a relatively popular web framework and a handful of packages that extend the framework with additional features. The actual names of these packages aren't important to the story, so I will call the framework Foo, and one of its extensions Bar.

It turns out that Bar, the add-on to Foo, has a dependency of its own on another package that I will call Baz.

When Peter began working on App he did what most of us would have done. He created a virtualenv, activated it, and then used pip (the Python package manager) to install the dependencies he needed. Here is the actual command he used:

$ pip install Foo Bar

Note that he did not need to install Baz; in fact, he did not even know he needed it. The Python package system allows a package to specify what its dependencies are, and the pip installer is pretty smart about these things. When pip installed Bar it found that Bar declares Baz as a dependency, and as a result it installed Baz as well.

Peter spent a few weeks working on App with everything going smoothly, until one day he reached a milestone. That day he wrote several unit tests that ensured that all the functions in App were working properly, and he also wrote a readme page explaining how to install and use App. He even included detailed instructions on how to set up a virtual environment with the packages App depends on. He then pushed App out to his github repository, patted himself on the back for a job well done, and feeling a sense of accomplishment moved on to work on other projects.

Peter went on with his life for a while, until one day he received an email from github, and he could not believe his eyes when he read it: someone had filed a bug against App.

A bug? How can it be? Peter took every precaution in the book to ensure that App worked as expected. And here is this stranger saying that App throws an exception right after it starts.

Peter still had App on his computer, so he quickly tested it and confirmed that it still worked fine. He also found that all the unit tests passed. He assumed this person was inexperienced and made some sort of installation mistake.

A couple of days later someone else commented on the bug. This new person was also having problems running App, but was more knowledgeable than the first and theorized that the problem could be caused by a new version of package Baz that was released a few days before.

Peter checked and sure enough, Baz had a recent release that changed a few things. In particular, a class that existed in the previous release had been removed and replaced with a different one. A quick inspection of the code in package Bar showed that it used the removed class. That was the source of the exception.

The problem was now very clear in Peter's mind. This wasn't his fault; the developer of Bar just needed to issue a new release that worked with the new version of Baz, and then everything would be fine again. So he went to project Bar's github page and logged a bug against Bar.

But Peter was bothered by the open bug he had on App. Since he felt he had done nothing wrong, he decided to close the bug, explaining that the problem wasn't his. It made him feel a bit better to see App back to having no open bugs.

A week later the bug that Peter filed against project Bar remained unattended. Worried about the lack of urgency, he checked the commit log for the project and found that there had not been any commits to Bar in almost a year. Project Bar looked like a dead or abandoned project. To make matters worse, another App user, unaware of the latest developments, filed a new bug against App for the same issue.

The story ends with Peter feeling trapped and powerless, thinking that Python's package management is broken.

The Problem With Dependencies

Do you identify with Peter? I certainly do. I suffered the "package dependency blues" many times myself. Many consider this an unavoidable risk that developers just have to accept.

Peter didn't know this at the time, but he could have handled things in a better way. Of course he could not have prevented the changes in the projects he depended on, but he could have done a better job defining the dependencies of his own project. The fact that project Baz released a new version does not mean that App needs to adopt it.

Let's begin with a review of where dependencies are specified in Python. There are two different places where package dependencies can be written: the setup.py file and the requirements.txt file. Each serves a different purpose.

The setup.py file

The setup.py script contains the description of a package. All Python packages that are registered with PyPI (the Python Package Index) need to have a setup script in their root folder, because installation tools like pip read it to know how the package needs to be installed.

The section inside the setup script that describes package dependencies is called install_requires. Dependencies are specified as a list of strings, with each string containing the name of a package plus optionally one or more version specifiers to restrict the range of supported versions.

As an example, here is the package dependency specification for Flask 0.10:

install_requires=[
    'Werkzeug>=0.7',
    'Jinja2>=2.4',
    'itsdangerous>=0.21'
]

As you can see, this is a "loose" mechanism for defining dependencies. Exact versions aren't pinned; instead, ranges of accepted versions are specified.
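
To put that list in context, here is a minimal setup.py sketch (my own illustration, not the actual Flask setup script) showing where the install_requires list fits inside the setup() call:

# A minimal setup.py sketch; the metadata values are placeholders.
from setuptools import setup

setup(
    name='Flask',
    version='0.10',
    packages=['flask'],
    install_requires=[
        'Werkzeug>=0.7',
        'Jinja2>=2.4',
        'itsdangerous>=0.21'
    ]
)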

The requirements.txt file

The other dependency definition mechanism is the requirements.txt file. This is a regular text file with one package per line, usually accompanied by an exact version number.

This is an example requirements.txt file:

Flask==0.9
Flask-Login==0.1.3
Flask-Mail==0.8.2
Flask-OpenID==1.1.1
Flask-SQLAlchemy==0.16
Flask-WTF==0.8.3

The packages in a requirements file are not installed automatically the way those in a setup script are. The user installs the requirements file explicitly with pip. Here is the command that installs the packages listed in a requirements file:

$ pip install -r requirements.txt

It is also possible to generate a requirements file automatically from the contents of the virtual environment:

$ pip freeze > requirements.txt

Specifying Dependencies for an Application

If you are building an application, like Peter, then the best way to advertise your dependencies is through a requirements.txt file. The installation instructions for your application should simply tell users to install the requirements file with pip. Since the requirements file includes exact version numbers for all dependencies, everybody gets the same versions of all the packages.
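
As an illustration (these exact commands aren't from Peter's readme, and the environment name venv is just an example), the installation instructions can be as short as creating a virtual environment and installing the requirements file into it:

$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt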

Going back to Peter's example, what he should have done before pushing App to github is the following:

$ pip freeze > requirements.txt

The contents of Peter's requirements file might have looked like this:

$ cat requirements.txt
Foo==0.7
Bar==1.0
Baz==2.6

Peter didn't directly use project Baz in App, but note that this project is mentioned in the requirements file anyway. This is very important, because Baz is an indirect dependency for App. Since it is a dependency, it also needs to be locked down to a version that is known to work.

The day project Baz released a major update, say version 3.0, nothing would have changed for Peter and his App project. His requirements file would still have requested version 2.6, so that's the version pip would have installed. This alone could have prevented all of Peter's dependency problems!

Specifying dependencies for a reusable component

The best way to specify dependencies for a reusable component is through the setup.py file. Reusable components are, by definition, going to be pulled in as dependencies by other projects, and you want pip to be able to sort out the dependencies of those parent projects automatically, without forcing developers to install indirect dependencies by hand.

Let's look at what project Bar's install_requires section of the setup script in Peter's example might have looked like:

install_requires=[
    "Baz"
]

And this is pretty bad. Bar is saying that any release of Baz will do as a dependency, but as the developer of a component you do not want to expose yourself to that kind of risk.

If the Bar developer only verified that the project works with Baz version 2.6, a sure way to avoid dependency problems is to request that version explicitly:

install_requires=[
    "Baz==2.6"
]

But while requesting explicit versions is a very good idea for applications, for reusable components it is less so. The problem is that if every project requests specific versions of its dependencies, the risk of a dependency conflict increases. In the example above, Bar wants version 2.6 of Baz. What would happen if project Foo also depended on Baz but requested version 2.5 in its setup script? With a conflict like that, pip would not be able to resolve the dependencies and would just fail.
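
To see the conflict concretely, here is a small sketch of my own using the packaging library (Foo, Bar and Baz remain the made-up names from the story), showing that no version of Baz can satisfy both pins at once:

# Hypothetical illustration: Foo pins Baz==2.5 while Bar pins Baz==2.6.
from packaging.specifiers import SpecifierSet

foo_wants = SpecifierSet("==2.5")
bar_wants = SpecifierSet("==2.6")
combined = foo_wants & bar_wants  # the constraints that would have to hold together

# No candidate version of Baz satisfies the combined constraints.
print(any(combined.contains(v) for v in ["2.5", "2.6", "2.7"]))  # prints: False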

To avoid dependency conflicts it is expected that reusable components offer some amount of flexibility in their dependency declaration, so that package managers like pip can have some room to figure out a set of versions that work for all the packages.

An improvement would be to define Bar's dependencies with a lower bound:

install_requires=[
    "Baz>=2.0"
]

So now pip will never accept Baz 1.x as valid. If the developer of Bar only tested against the 2.x line, then it makes no sense to allow an older release that may or may not work.

But is it okay to leave the upper side unbound?

In most cases it is not. Only for extremely reputable projects with a track record of not making changes that break existing applications might it be okay to leave the upper bound open. As the component developer, you have to evaluate the risk if you decide to do that.

In almost all cases, however, it is a much better idea to have an upper bound for all your dependencies.

Projects typically change their version numbers in a more dramatic way when they introduce incompatibilities with existing applications. For some projects this means a change in the major version number; for others it may be a change in the major or minor version numbers. You'll have to figure out the versioning style of each of your dependencies to pick a safe upper bound.
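
As a rough sketch of that reasoning (the helper below is hypothetical, and it assumes the dependency only introduces breaking changes when its major version number increases), you could derive the range from the version you actually tested against:

# Hypothetical helper: build a ">=tested, <next-major" range string,
# assuming breaking changes only arrive with a new major version.
from packaging.version import Version

def safe_range(tested_version):
    v = Version(tested_version)
    return ">={0}, <{1}.0".format(tested_version, v.major + 1)

print(safe_range("2.6"))  # prints: >=2.6, <3.0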

For Bar's project it would have been useful to have dependencies specified like this:

install_requires=[
    "Baz>=2.0, <3.0"
]

With this dependency declaration Bar can get minor updates to Baz, but not major updates, so it is open to receiving bug fixes and small improvements, while staying protected from major changes that may require modifications to its code.
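
A quick way to convince yourself of how that range behaves is to check a few candidate versions against it, again using the packaging library (my own sketch, not part of the original example):

from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=2.0, <3.0")
for candidate in ["1.9", "2.0", "2.6", "2.9.1", "3.0"]:
    # Only releases in the 2.x line are accepted; 1.9 and 3.0 are rejected.
    print(candidate, spec.contains(candidate))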

Eventually there will be users out there who want a version of Bar that works with Baz 3.x. These users will submit feature requests to Bar's developer instead of bug reports. And most importantly, they will not think that project Bar is broken.

Dependencies in the Real World

In an ideal world all application and component developers declare their dependencies in a reliable way, and as a result there are no problems with package dependencies. Unfortunately that is not the world we live in.

You can be extremely careful in the way you design your own project dependencies, but there is always the chance that some of those dependencies will not declare their own dependencies in a foolproof way.

If your project is an application then you are safe, because your requirements.txt file should list the versions of every dependency you have, including indirect ones.

You are more likely to be affected if your project is a reusable component, because, as discussed above, for this type of project you only list your direct dependencies, with version ranges. It only takes one of your dependencies being sloppy with its own dependencies for your project to fail at some point in the future.

If this happens to you then the first thing you should do is report the problem to the appropriate project administrator.

I also think it is important for a project's reputation to be perceived as stable, so having your project temporarily broken because of a third-party dependency is not acceptable. You can claim that the problem is not yours, but people will be forced to look elsewhere if your project does not work.

An emergency measure you can take to restore your project is to pin the indirect dependency, in your own setup.py script, to a version that is known to work.

For example, if there was a component that depended on the Bar project from Peter's example, its setup script could have declared the dependency as follows:

install_requires=[
    "Bar>=1.0, <2.0"
]

This dependency declaration is well specified, but the day project Baz goes from 2.6 to 3.0 your component will break if project Bar's own dependency declaration isn't specific enough about which versions of Baz it supports.

To address the problem on your side you could just change your setup script to add Baz as a dependency:

install_requires=[
    "Baz>=2.x, <3.0",
    "Bar>=1.0, <2.0"
]

This is not ideal, because your project does not directly depend on Baz, but if that's what it takes to keep your project running, so be it. You can always remove the extra dependency once things settle down and project Bar is fixed.

Conclusion

I hope you found this article useful in understanding how to work with dependencies in Python. Before I end the article I'll leave you with a summary of the take-away points:

  • If you develop an application
    • Include a requirements.txt file in the root of your project, naming all your dependencies (direct and indirect) with the explicit versions that you have tested.
    • Document how to install dependencies using this requirements file.
  • If you develop a reusable component
    • Include an install_requires clause in your setup.py file, listing only your direct dependencies.
    • Always define a lower bound version for each dependency.
    • Unless you have a good reason not to, also define an upper bound version for each dependency. Use common sense to decide what the upper bound for each dependency needs to be.
    • To help parent projects decide their dependencies, document how your version numbers will change when you introduce an incompatible change. For example, state that the major version number will be increased whenever incompatible changes are introduced.
    • If you ship your component with an example application, include a requirements.txt file for the example, so that at least there is a record of a set of specific versions that are known to work with your component.

If there are any aspects of version dependencies that you think I haven't covered please let me know below in the comments!

Miguel

23 comments
  • #1 LDPG said

    My biggest grief with package management in Python is that you can't specify a local repository or set of directories for packages. Sometimes your app is on a system that isn't connected to the outside internet or for security reasons shouldn't / isn't allowed to download software from external third parties.

  • #2 Marius Gedminas said

    My biggest grief with package management in Python is that you can't specify a local repository or set of directories for packages.

    What? Yes, you can. pip install -f /path/to/local/directory/full/of/packages whatever-you-want.

  • #3 Charles Doutriaux said

    This is all fine in a pure Python world. We're developing Python packages that rely on "non-Python" libraries. The dependency blues at that point quickly turn into a dependency nightmare... Especially when your app is deployed on newer/older systems with "many" versions of the needed libraries and/or in non-standard locations... (/opt, /usr, /usr/local, ...)
    CMake really helps but it is still not perfect.
    Just my two bits.

  • #4 Charle Doutriaux said

    Glad you mentioned pip rather than easy_install though! It does a much better job!

  • #5 A. Jesse Jiryu Davis said

    Great post, thanks for the advice.

  • #6 Thomas Güttler said

    Thank you for this post. It explains setup.py vs requirements.txt very well. Unfortunately the Python world has not settled on the words. In the Django world a "project" is the stuff where you only have configs (and a requirements.txt file). I personally prefer the word "library" to "reusable app".

  • #7 Travis Oliphant said

    For those looking for a better approach to managing dependencies for Python packages that also includes support for non-Python binary dependencies, please see conda.pydata.org. It is free and OSS but does have commercial support at http://www.continuum.io

  • #8 Nathan Hardy said

    Thank you for your article, it really helps.

    And how about this situation? We are using an open-source pip package that we cloned locally and modified because we were not satisfied with some of its features or implementation. Then we install it with "pip install git+xxx". What should the version number of this new copy be? We just want to use it locally.

    Thank you

  • #9 Miguel Grinberg said

    @Nathan: you should have your version of this package in your own github repository, and then include the git+xxx location in your requirements.txt file. You may also want to rename the project, so that it is clear that you are using your own fork. In my opinion the version number is not the biggest problem; keep it close to the version that originated your fork, maybe adding a suffix with your own version, such as a .0 that you can then increment if you need to make more changes in your fork.

  • #10 useless said

    holy crap, that was the best article i've found talking about setup.py VS requirements.txt

    well done sir.

  • #11 Matt Hall said

    Great article, thank you.

    I see a lot of people read the requirements.txt file in setup.py, via something like

    with open("requirements.txt", "r") as f:
        requirements = f.read().splitlines()
    

    and pass it to install_requires. I've also seen people use pip.req.parse_requirements to the same end (creating another dependency, but without having to worry about comments etc).

    I wondered what you think of these approaches, and if there's a reason you didn't mention them in your post. I guess I'd like to avoid having to maintain two lists of requirements...

  • #12 Miguel Grinberg said

    @Matt: As I explain in the article, the use cases for setup.py and requirements.txt are different.

    I prefer to give a little bit of flexibility in dependencies given in the setup.py file. If you have a requirements.txt style dependency list, I definitely do want to allow minor version differences in the dependencies. The stricter you are when you specify dependencies in setup.py the more you risk creating a situation where the dependencies of your package are incompatible with the dependencies of some other package.

    However, if you are building an application, you want to be absolutely sure the user does not experience any problems, so giving explicitly tested versions of each dependency makes sense. Here you can expect the user will use a dedicated virtual environment, so the chances of collisions with other packages are much smaller.

  • #13 qeesung said

    awesome!!! thanks a lot

  • #14 Giacomo Debidda said

    Great article Michael, thanks for sharing.

    What do you think about the pip workflow proposed by Kenneth Reitz in this article?
    https://www.kennethreitz.org/essays/a-better-pip-workflow

    He proposes to use 2 requirement files:

    1. requirements-to-freeze.txt is used to specify your top-level dependencies, and any explicit versions you need to specify.
    2. requirements.txt contains the output of pip freeze after pip install requirements-to-freeze.txt has been run.
  • #15 Miguel Grinberg said

    @Giacomo: I think that is an improvement, but it still requires you to manage the requirements file. The Pipfile spec (https://github.com/pypa/pipfile) is better in that the detailed requirements will be managed automatically by pip.

  • #16 Michael Warkentin said

    Kenneth Reitz followed up that article with a new tool, pipenv: https://pipenv.readthedocs.io/

    It uses pipfile/pipfile.lock to simplify application requirement management while still being able to lock down explicit versions of sub-dependencies.. I've been meaning to try it out in more depth soon..

  • #17 Miguel Grinberg said

    @Michael: Yes, I will probably look at pipenv when it stabilizes and Kenneth stops putting out so many new releases a day. On one side I think it is great that he is putting in so much time, but a codebase that changes so much worries me a bit; I'm not sure I want to use it in production yet.

  • #18 Stefano Borini said

    Great article, however note that applications are also potentially installed via pip, so there's no way to access the requirements.txt file anyway.

  • #19 Miguel Grinberg said

    @Stefano: the concept is that for applications you need to have pinned packages to avoid problems. You can use the same version constraints that you put in requirements.txt in the setup.py file. Many applications in fact generate the setup.py requirements by parsing the requirements.txt file.

  • #20 radek wojcik said

    Just reference your direct deps in your requirements.txt file.. do not use pip freeze. You will get into a whole slew of problems, dependency madness! Trust your 3rd party modules and pip to do the dependency resolution. Have good tests before releasing your code to prod and if you want yes do a pip freeze reqs file and compare the two as a sanity as part of your release. Look to big players in the industry i.e. mozilla - https://github.com/mozilla/bedrock/tree/master/requirements to see what they are doing. They don't add post deps to their reqs file, but they do split it up nicely into base, dev, prod etc..

  • #21 Miguel Grinberg said

    @radek: I disagree 100% with you. If you don't list indirect dependencies, then each installation of your project will end up having different versions. Sorry, but if you find it difficult to work with a requirements.txt file you should use another installer, maybe pipenv or poetry.

  • #22 Paddy Mccrudden said

    Thanks - nice article. One thing I'd like to get a better handle on is how to test that we can run all versions within the various ranges of the setup.py file. Typically, I'd run tests using pytest with requirements.txt, and more recently running tests with tox. It is easy to manage different versions of python in tox, but how should I ensure that if I say pandas >= 1.1, < 1.2 (for example), then this is something that I actually test.

  • #23 Miguel Grinberg said

    @Paddy: tox allows you to create a test matrix that can include a list of Python versions and a list of pandas versions, and it will run all the combinations of them. See https://tox.readthedocs.io/en/latest/example/basic.html#compressing-dependency-matrix.
