Why python needs virtual environments

2023 Feb 04  |  10 min read  |  tags: package-management, python, blog

I was writing a python package manager to understand the issue. This led to 3 core insights, which I now think are the roots of the problems we have:

  1. Broken storage: an environment can store only one version of a package
  2. Lack of feedback: python has no mechanism to detect and report, at runtime, mismatches between required and available package versions
  3. Reliance on external files to manage which package versions get installed and used, which led to an explosion of third-party tools that manage isolated environments and, sometimes, package compatibility

The only solutions I could think of involve tightly coupling the language with the package manager.

I understand that separation of concerns is a thing, and that language and package management must be separate. Merging them into a single thing is ugly from a language point of view, but it is beautiful from the developer-experience point of view.

Does the dev experience really depend on what's under the hood? I don't think so. As long as the outcome works well, I don't think the devs really care. I don't think they care if the language and package management are tightly coupled, loosely coupled, or decoupled. Most of the people I've met only care about getting things done and moving on to other things. Deciding how to build the system, and how to keep it manageable, is the problem of the system builders.

It is fine if the problem isn't solved. But it is not fine if the problem is not clearly visible.

Inability to store multiple package versions into a single environment

Consider a package flask. Assume that flask 1 and flask 2 are incompatible, so scripts written for flask 1 cannot run on flask 2 without being upgraded.

Now, suppose you installed flask 1 and wrote a script old.py that runs on flask 1:

# old.py
import flask
...

Where does python get flask from? From the environment (in this case, the global environment).

Later on, you wrote a script new.py that uses flask 2

# new.py
import flask
...

But after installing flask 2, you will see that flask 1 is gone.

If you look into the global environment, you'll see that pip stores packages like this:

packages/
└── flask/
    ├── flask_file1
    ├── flask_file2
    └── flask_file3

ONLY ONE VERSION OF FLASK CAN BE STORED AT A TIME. There is no mechanism to store multiple versions of flask.

After installing flask 2, new.py runs fine because it gets flask 2. However, old.py has stopped working. Why? Because it needs flask 1, which is now missing.

If you re-install flask1, it will replace flask2 in your packages directory. So at any point in time, either old.py or new.py can run. Both can never run at the same time.
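
The overwrite can be reproduced with a toy installer. This is only a sketch of the layout shown above, not pip's real logic: the install() and installed_version() helpers are made up for illustration, and the only thing they share with the real thing is that the target directory depends on the package name and nothing else.

```python
import tempfile
from pathlib import Path


def install(packages_dir: Path, name: str, version: str) -> None:
    """Toy installer: the target directory depends only on the package name."""
    target = packages_dir / name
    target.mkdir(parents=True, exist_ok=True)
    # Writing to the same path again replaces whatever version was there before.
    (target / "__init__.py").write_text(f'__version__ = "{version}"\n')


def installed_version(packages_dir: Path, name: str) -> str:
    """Read back the single version stored under packages/<name>/."""
    text = (packages_dir / name / "__init__.py").read_text()
    return text.split('"')[1]


with tempfile.TemporaryDirectory() as tmp:
    packages = Path(tmp) / "packages"
    install(packages, "flask", "1.0")
    after_first = installed_version(packages, "flask")
    install(packages, "flask", "2.0")
    after_second = installed_version(packages, "flask")
    stored_dirs = sorted(p.name for p in packages.iterdir())

print(after_first, after_second, stored_dirs)  # 1.0 2.0 ['flask']
```

There is one flask directory before the second install and one after it. The version is not part of the path, so there is nowhere for a second copy to live.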

How do we work around this problem?

Instead of using a global shared environment, we create a dedicated, isolated environment for every project. A virtual environment.

  • old.py uses a virtual environment old-env that contains flask 1
  • new.py uses a virtual environment new-env that contains flask 2
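
As a rough sketch, the two environments can be created with python's built-in venv module. The make_env() helper is made up for illustration, and the environments are created in a temporary directory and without pip so that the example stays quick; a real project would keep them on disk and install flask into each one.

```python
import tempfile
import venv
from pathlib import Path


def make_env(root: Path, name: str) -> Path:
    """Create an isolated environment at root/name and return its path."""
    env_dir = root / name
    # with_pip=False keeps the sketch fast; a real project would want pip inside.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old_env = make_env(root, "old-env")
    new_env = make_env(root, "new-env")
    # Every venv carries a pyvenv.cfg marker file at its top level.
    markers = [(env / "pyvenv.cfg").exists() for env in (old_env, new_env)]

print(markers)
```

Each environment gets its own packages directory, so flask 1 and flask 2 no longer compete for the same path.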

We use package and environment managers to get this done. But the fact that there are 13 major python package managers suggests that we still don't have a definitive solution.

Python's inability to warn us

Again, currently:

  • old.py uses an isolated environment old-env that contains flask 1
  • new.py uses an isolated environment new-env that contains flask 2

So, while running old.py, you have to activate old-env and then run the script. While running new.py, you have to activate new-env and then run the script.

You either need the discipline to activate the correct environment for the correct script (which needs proper organization of environments), or you need to automate it.

When python runs old.py, it doesn't check whether the environment provides flask 1 or flask 2. It just runs the script. You only learn about the mistake when the script crashes or produces the wrong output, or a little earlier if you have startup checks in place, which people normally don't.

Why doesn't python warn us? Because python has no awareness of which package version the script needs.

We are importing packages like this:

# old.py
import flask
...

There is no mechanism to import packages like this:

# old.py
import flask==1
...
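
The closest thing available today is a check the script performs itself. Below is a minimal sketch, assuming a made-up helper called require(); it is not part of python or pip. It looks up the installed version through importlib.metadata and refuses to import on a major version mismatch. The lookup parameter is also an assumption, added so the demo can run without flask installed.

```python
from importlib import import_module, metadata
from types import ModuleType
from typing import Callable


def require(
    module_name: str,
    dist_name: str,
    major: int,
    lookup: Callable[[str], str] = metadata.version,
) -> ModuleType:
    """Import module_name only if the installed distribution has the wanted major version."""
    installed = lookup(dist_name)  # a version string such as "1.1.4"
    installed_major = int(installed.split(".")[0])
    if installed_major != major:
        raise ImportError(
            f"{dist_name}=={major}.* required, but {installed} is installed"
        )
    return import_module(module_name)


# Demo with a stand-in lookup so the sketch runs without flask installed.
fake_env = {"json": "2.0.9"}
mod = require("json", "json", 2, lookup=fake_env.__getitem__)
try:
    require("json", "json", 1, lookup=fake_env.__getitem__)
except ImportError as exc:
    message = str(exc)
print(message)  # json==1.* required, but 2.0.9 is installed
```

In old.py this would read flask = require("flask", "flask", 1). It works, but it is still discipline: every script has to remember to call it.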

How do we work around this problem?

Python isn't going to tell us anything. So we have to make sure that we avoid making mistakes.

Currently we do it by using external files to coordinate package versions in the environment. In this case, the file is the seed from which packages get installed into the virtual environment.

Eg: using requirements.txt or pyproject.toml to make sure the env has the correct package version

# requirements.txt
flask==1

And exclusively installing dependencies through requirements.txt

pip install -r requirements.txt

And automating environment loading when opening a project in an IDE, or when running it on a server: for example, automatic env loading in the PyCharm IDE, or bash scripts that set up and load the venv before starting the project on servers.
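
The automation can be as small as a launcher that always starts a script with its environment's own interpreter. Running that interpreter directly has the same effect as activating the environment first. This is a sketch, not a recommended tool: env_python() and run_in_env() are made-up helper names, and the bin/ versus Scripts/ split is the standard venv layout on POSIX and Windows.

```python
import os
import subprocess
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def run_in_env(env_dir: Path, script: Path) -> int:
    """Run script with the venv's own interpreter; no 'activate' step needed."""
    completed = subprocess.run([str(env_python(env_dir)), str(script)])
    return completed.returncode


print(env_python(Path("old-env")))
```

With this, run_in_env(Path("old-env"), Path("old.py")) always pairs old.py with old-env, regardless of which environment happens to be active in the shell.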

We got around the problem through discipline and automation. This isn't a good solution. Anything that breaks without discipline is just bad design.

Another glaring problem: package incompatibility checks

Since python cannot check for mismatches between a script's requirements and the package versions available in the environment, it also cannot tell when we load 2 packages that are incompatible with each other.

Eg: we have installed flask1, cli1, arrow2. flask1 needs cli1, but arrow2 needs cli2. Python won't catch that flask1 and arrow2 cannot run in the same environment.

Who catches this incompatibility? The package manager does.

How is this incompatibility detected? Every package declares the versions of the dependencies it needs.

Eg:

  • flask1 declares that it works only with cli 1
    • this range has to be given by the developers of flask after rigorous testing
  • arrow2 declares that it works only with cli 2 and above
    • this range has to be given by the developers of arrow after rigorous testing

The package maintainer is responsible for giving these ranges, and the package manager is responsible for checking compatibility of packages declared in requirements.txt.
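
The check itself is simple once the declarations exist. Here is a toy version using the flask1 / arrow2 example from above. The DECLARED table and find_conflicts() are made up for illustration, and real package managers do far more work than this (full version specifiers, transitive dependencies, picking versions to install).

```python
import operator

# Hypothetical metadata: package -> {dependency: (comparison, major version)}.
DECLARED = {
    "flask1": {"cli": ("==", 1)},
    "arrow2": {"cli": (">=", 2)},
}
OPS = {"==": operator.eq, ">=": operator.ge}


def find_conflicts(installed: dict[str, int]) -> list[str]:
    """List every declared requirement that the installed versions violate."""
    problems = []
    for package, requirements in DECLARED.items():
        for dep, (op, wanted) in requirements.items():
            have = installed.get(dep)
            if have is None or not OPS[op](have, wanted):
                problems.append(f"{package} needs {dep}{op}{wanted}, found {have}")
    return problems


conflicts = find_conflicts({"cli": 1})
print(conflicts)  # ['arrow2 needs cli>=2, found 1']
```

Installing cli 2 instead only moves the complaint to flask1. No single version of cli satisfies both declarations, which is exactly the conflict described above. The check is only as good as the declared ranges.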

However, look at the packages in your production projects. I bet there are too many packages that have given bullshit dependency version ranges, or have never updated dependency version ranges due to overburdened package maintainers who simply don't have the time to do this.

Look at some random package you installed from pypi. Maybe it doesn't even specify its dependency version ranges.

It is a proper shitshow. The ecosystem works on trust (blind trust?), hope and duct tape.

Solutions

It is fine if the problem isn't solved. But it is not fine if the problem is not clearly visible.

Possible solutions:

  • Make the problems visible. Move the package version awareness as far down the stack as possible. In this case, give python itself awareness of required package versions (additions to import syntax)
    • This creates good feedback loops
      • Once this is possible, tools like static analyzers will be able to pick up discrepancies between the script and the provided environment (in this case, the dev environment).
      • If an incompatibility still slips through, python will raise a package incompatibility warning at runtime, right when the package is imported.
    • This thought will irritate the language designers, because they prefer to isolate the language from the package managers. In python's case, the language and its package manager have been kept isolated. Python's package manager is a shitshow. But it probably ended up like this because of conscious decisions that I'm obviously not aware of.
  • Allow pip to check the versions of currently installed packages (pip has this functionality right now)
  • Finally, because the real world isn't perfect, add a flag to ignore incompatibilities and run the program anyway.

Eg: app.py demands version 1 of a package, but a dependency of app.py demands version 2. Handle it the same way we handle it right now: don't handle it, and just run the code anyway.
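
To make the static-analysis idea from the list above a bit more concrete, here is a sketch of what such a tool could do if scripts carried their own version requirements. The REQUIRES dict is a made-up convention standing in for the import syntax python doesn't have, and the environment is passed in as a plain dict of major versions. The script is parsed with the ast module, never executed.

```python
import ast

SCRIPT = '''
REQUIRES = {"flask": 1}
import flask
'''


def declared_requirements(source: str) -> dict[str, int]:
    """Read a module-level REQUIRES dict without executing the script."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "REQUIRES" for t in node.targets
        ):
            return ast.literal_eval(node.value)
    return {}


def mismatches(source: str, environment: dict[str, int]) -> list[str]:
    """Compare declared major versions with what the environment provides."""
    return [
        f"{name}: script wants {wanted}, environment has {environment.get(name)}"
        for name, wanted in declared_requirements(source).items()
        if environment.get(name) != wanted
    ]


report = mismatches(SCRIPT, {"flask": 2})
print(report)  # ['flask: script wants 1, environment has 2']
```

The point is where the information lives. Once the requirement sits inside the script, any tool can read it, including the interpreter itself.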

Ending notes

Ask around and you'll find plenty of people who will passionately curse python's package management. I am one of them. These same people will curse package management in almost every other language too. Package management is an unsolved problem.

This IS a problem. A proper big problem. Acknowledgement goes a long way.

I love python enough to use it, despite hating the mechanics of the ecosystem around it. It is just that good for the class of problems I focus on. Python inherently seems to think about user comfort and speed to solution. If you need some convincing, use something like c++ or java for project euler problems, then try using python.

I think I lack the competence in language design & development to attempt a solution. I for sure lack the passion for language design. This isn't just a hard technical problem, it is also a soft operations problem.

But it doesn't take much cleverness to see the problem and its causes.