Niraj Zade

Why python needs virtual environments

2023 Feb 04  |  10 min read  |  tags: package-management, python, blog

I was writing a python package manager to understand the issue. This led to 3 core insights, which I now think are the roots of the problems we have:

  1. Broken storage: An environment can store only one version of a package
  2. Lack of feedback: Python has no mechanism to detect and report, at runtime, mismatches between required package versions and available package versions
  3. Use of external files to manage which package versions get installed and used. This led to an explosion of third-party tools that manage isolated environments and, sometimes, package compatibility

The only solutions I could think of involve tightly coupling the language with the package manager.

I understand that separation of concerns is a thing, and that language and package management must be separate. Merging them into a single thing is ugly from a language point of view, but it is beautiful from the developer-experience point of view.

Does the dev experience really depend on what's under the hood? I don't think so. As long as the outcome works well, I don't think devs really care. I don't think they care whether the language and package management are tightly coupled, loosely coupled, or decoupled. Most of the people I've met only care about getting things done and moving on to other things. Deciding how to build the system, and how to keep it manageable, is the problem of the system builders.

It is fine if the problem isn't solved. But it is not fine if the problem is not clearly visible.

Inability to store multiple package versions into a single environment

Consider a package flask. Assume that flask 1 and flask 2 are incompatible, so scripts written for flask 1 cannot run on flask 2 without being upgraded.

Now, suppose you installed flask 1 and wrote a script old.py that runs on flask 1.

# old.py
import flask
...

Where does python get flask from? From the environment (in this case, the global environment).
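You can ask the import system which file an import would actually load. This is a minimal sketch: it uses the standard-library module json as a stand-in, because flask may not be installed on your machine, and where_is is a hypothetical helper name, not something python ships.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file that `import module_name` would load, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# The directories python searches, in order.
# The environment's package directory is one of them.
for entry in sys.path:
    print(entry)

# The file behind `import json` in the current environment.
print(where_is("json"))
```

Run the same snippet from two different environments and the printed paths change. The script itself has no say in which copy it gets.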

Later on, you wrote a script new.py that uses flask 2

# new.py
import flask
...

But after installing flask 2, you will see that flask 1 is gone.

If you look into the global environment, you'll see that pip stores packages like this:

packages/
└── flask/
    ├── flask_file1
    ├── flask_file2
    └── flask_file3

ONLY ONE VERSION OF FLASK CAN BE STORED AT A TIME. There is no mechanism to store multiple versions of flask.

After installing flask2, new.py runs fine because it gets flask2. However, old.py has stopped working. Why? Because it needs flask1, which is now missing.

If you re-install flask1, it will replace flask2 in your packages directory. So at any point in time, either old.py or new.py can run, never both.
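The overwrite can be reproduced with a toy model of that folder layout. This is a sketch under loud assumptions: toy_install and load_version are hypothetical helpers, toyflask is a made-up package, and real pip does far more than this. The only thing it copies is the key point: the folder is named after the package, not after the version.

```python
import importlib
import pathlib
import shutil
import sys
import tempfile


def toy_install(packages_dir, name, version):
    """Mimic the layout above: one folder per package name, no version in the path."""
    target = pathlib.Path(packages_dir) / name
    if target.exists():
        shutil.rmtree(target)  # the previously installed version is thrown away
    target.mkdir(parents=True)
    (target / "__init__.py").write_text(f'__version__ = "{version}"\n')


def load_version(packages_dir, name):
    """Import `name` from packages_dir and report which version python got."""
    sys.path.insert(0, str(packages_dir))
    try:
        sys.modules.pop(name, None)    # forget any earlier import of this name
        importlib.invalidate_caches()  # make python re-scan the directory
        return importlib.import_module(name).__version__
    finally:
        sys.path.remove(str(packages_dir))


packages = tempfile.mkdtemp()
toy_install(packages, "toyflask", "1")
print(load_version(packages, "toyflask"))
toy_install(packages, "toyflask", "2")
print(load_version(packages, "toyflask"))
print(sorted(p.name for p in pathlib.Path(packages).iterdir()))
```

It prints 1, then 2, then a listing with a single toyflask folder. Version 1 has nowhere to live once version 2 arrives.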

How do we work around this problem?

Instead of using a global shared environment, we create a dedicated, isolated environment for every project. A virtual environment.
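Python can create such an environment itself through the standard-library venv module. A minimal sketch, assuming the two environments are called old-env and new-env (make_env is a hypothetical helper, and the temporary directory stands in for wherever you keep your projects):

```python
import pathlib
import tempfile
import venv


def make_env(base_dir, env_name):
    """Create an isolated environment at base_dir/env_name and return its path."""
    env_path = pathlib.Path(base_dir) / env_name
    # with_pip=False keeps the sketch fast; pass with_pip=True to get pip inside the env
    venv.create(env_path, with_pip=False)
    return env_path


base = tempfile.mkdtemp()
old_env = make_env(base, "old-env")
new_env = make_env(base, "new-env")

# Each environment gets its own pyvenv.cfg and its own package directory.
print((old_env / "pyvenv.cfg").read_text())
print((new_env / "pyvenv.cfg").read_text())
```

This is the same thing `python -m venv old-env` does from the command line. Each environment has its own package directory, so flask1 can sit in one and flask2 in the other.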

We use some package & env managers to get this done. But the fact that we have 13 major python package managers means that we still don't have a definitive solution.

Python's inability to warn us

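Again, currently: old.py needs flask1 and new.py needs flask2, so each script gets its own environment (old-env and new-env).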

So, while running old.py, you have to activate old-env and then run the script. While running new.py, you have to activate new-env and then run the script.

You either need the discipline to activate the correct environment for each script (which requires keeping your environments organized), or you need to automate it.

This matters because when python runs old.py, it doesn't check whether the environment provides flask1 or flask2. It just runs the script. You'll only learn about the mistake when the script crashes or produces the wrong output (or earlier, if you have startup checks in place, which people normally don't).

Why doesn't python warn us? Because python has no awareness of which package version the script needs.

We are importing packages like this:

# old.py
import flask
...

There is no mechanism to import packages like this:

# old.py
import flask==1
...
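The closest thing available today is a hand-written startup check. A minimal sketch, assuming the script only cares about the major version: require_major is a hypothetical helper, and the naive split on "." is my simplification, not how real version comparison works. importlib.metadata.version is the real standard-library call that reads the installed version.

```python
from importlib import metadata


def require_major(dist_name, major, get_version=metadata.version):
    """Raise RuntimeError unless the installed dist_name has the given major version."""
    try:
        installed = get_version(dist_name)
    except metadata.PackageNotFoundError:
        raise RuntimeError(f"{dist_name} is not installed in this environment") from None
    if installed.split(".")[0] != str(major):
        raise RuntimeError(
            f"{dist_name} {installed} is installed, but this script needs {dist_name} {major}.x"
        )
    return installed


# Demo with a stand-in lookup, so the sketch runs even where flask is not installed.
# In old.py you would call require_major("flask", 1) before `import flask`.
try:
    require_major("flask", 1, get_version=lambda name: "2.3.1")
except RuntimeError as err:
    print(err)
```

The check works, but it is boilerplate that every script has to remember to carry. The language does none of it for you.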

How do we work around this problem?

Python isn't going to tell us anything. So we have to make sure that we avoid making mistakes.

Currently we do it by using external files to coordinate the package versions in the environment. In this case, the file is the seed that installs packages into the virtual environment.

Eg: using requirements.txt or pyproject.toml to make sure the env has the correct package version

# requirements.txt
flask==1

And exclusively installing dependencies through requirements.txt

pip install -r requirements.txt

And automating environment loading when opening a project in an IDE, or when running a project on the server. For example, using automatic env loading in the PyCharm IDE, or using bash scripts (that set up and load the venv) to start the project on servers.

We got around the problem through discipline and automation. This isn't a good solution. Anything that breaks without discipline is just bad design.

Another glaring problem: package incompatibility checks

Since python cannot check for mismatches between a script's requirements and the package versions available in the environment, it also cannot tell when we load 2 packages that are not compatible with each other.

Eg: we have installed flask1, cli1, arrow2. flask1 needs cli1, but arrow2 needs cli2. Python won't catch that flask1 and arrow2 cannot run at the same time.

Who catches this incompatibility? The package manager does.

How is this incompatibility detected? Every package declares the version of dependencies it needs.

Eg:

The package maintainer is responsible for giving these ranges, and the package manager is responsible for checking compatibility of packages declared in requirements.txt.
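The check itself is not complicated once the declarations exist. Here is a toy sketch using the flask1, arrow2 and cli names from the example above. It assumes every package pins one exact version of each dependency, which is a simplification: real package managers work with version ranges. find_conflicts is a hypothetical helper, not any tool's API.

```python
def find_conflicts(requirements):
    """requirements maps each installed package to the exact versions it needs.

    Returns {dependency: {needed_version: [packages that need it]}} for every
    dependency that is requested in more than one version.
    """
    wanted = {}
    for package, deps in requirements.items():
        for dep, version in deps.items():
            wanted.setdefault(dep, {}).setdefault(version, []).append(package)
    return {dep: versions for dep, versions in wanted.items() if len(versions) > 1}


declared = {
    "flask1": {"cli": "1"},
    "arrow2": {"cli": "2"},
}
print(find_conflicts(declared))
# prints {'cli': {'1': ['flask1'], '2': ['arrow2']}}
```

The result is only as good as the declarations fed into it. If a package declares nothing, or declares the wrong thing, the check passes and the conflict shows up at runtime instead.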

However, look at the packages in your production projects. I bet many of them declare bullshit dependency version ranges, or have never updated their ranges because their overburdened maintainers simply don't have the time to do this.

Look at that random package you installed from PyPI. Maybe it doesn't even specify its dependency version ranges.

It is a proper shitshow. The ecosystem works on trust (blind trust?), hope and duct tape.

Solutions

It is fine if the problem isn't solved. But it is not fine if the problem is not clearly visible.

Possible solutions:

Finally, because the real world isn't perfect, add a flag to ignore incompatibility and run the program anyways. Set the flag and handle issues the same way ostriches do - bury our head in the sand and pretend that the problem doesn't exist.

Eg - app.py demands version 1, but a dependency of app.py demands version 2. Handle it the same way we handle it right now - don't handle it, and just run the code anyways.

Ending notes

Ask around and you'll find plenty of people who will passionately curse python's package management. I am one of them. The same people will curse package management in almost every language, though. Package management is an unsolved problem.

This IS a problem. A proper big problem. Acknowledgement goes a long way.

I love python enough to use it, despite hating the mechanics of the ecosystem around it. It is just that good for the class of problems I focus on. Python inherently seems to think about user comfort and speed to solution. If you need some convincing, use something like C++ or Java for Project Euler problems, then try using python.

I think I lack the competence in language design & development to attempt a solution. I for sure lack the passion for language design. This isn't just a hard technical problem, it is also a soft operations problem.

But it doesn't take much smartness to see the problem and its causes.
