It’s easy to train a Mask RCNN with detectron2, isn’t it? I haven’t seen a quicker way to train a segmentation model, and there’s barely any hassle. However, detectron2 has been a pain in production. First, because it seems to have strict installation requirements, which I’m going to talk about here. Second, memory leaks if not served! Third, lack of flexibility to customise (quantize, prune, etc.), unlike vanilla torch models. The off-the-shelf solutions for scripting or tracing push you into a hell hole of bugs.

Today, I wanted to profile one of our detectron2 models that is in production, to test it against a newly trained model. Unsurprisingly, I ran into installation issues. To be honest, in hindsight, I made silly errors. Maybe I was ignorant. But I can’t help but blame detectron2, because this is the case every damn time.

Let’s say I have the following pyproject.toml file

We download a detectron2 build that requires cuda 11.3, torch 1.10 and python 3.8 (look at the URL). I’m using cuda 11.7.

I use poetry to set up my environment, but don’t let that scare you. You can just write a requirements.txt and install with pip, but you may have to be a lil more specific with your versions.

poetry install --with test

(Actually poetry install is enough as well)

All well and good.

I have the following starter code:

A fairly simple script. Here are the steps:

  1. Parse the arguments
  2. Load the model
  3. Load the image
  4. Predict the mask/objects/(anything) about the image

python --config_file config.yml --image image.jpg
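
The four steps above could look roughly like the sketch below. It is not the original script: the function names are made up for illustration, and it assumes the config file already points at the trained weights through MODEL.WEIGHTS. get_cfg and DefaultPredictor are detectron2’s stock helpers.

```python
import argparse


def parse_args(argv=None):
    # Step 1: parse the arguments (flag names match the command above).
    parser = argparse.ArgumentParser(description="Run a detectron2 model on one image")
    parser.add_argument("--config_file", required=True, help="path to the model config (YAML)")
    parser.add_argument("--image", required=True, help="path to the input image")
    return parser.parse_args(argv)


def load_predictor(config_file):
    # Step 2: load the model. Imported lazily so argument errors surface
    # before the heavy (and fragile) detectron2 import.
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    cfg.merge_from_file(config_file)  # assumed to set MODEL.WEIGHTS
    return DefaultPredictor(cfg)


def main(argv=None):
    args = parse_args(argv)
    predictor = load_predictor(args.config_file)

    # Step 3: load the image. DefaultPredictor expects BGR by default,
    # which is what cv2.imread returns.
    import cv2

    image = cv2.imread(args.image)
    if image is None:
        raise FileNotFoundError(f"could not read image: {args.image}")

    # Step 4: predict. For instance segmentation the result holds an
    # "instances" field with boxes, classes, scores and masks.
    outputs = predictor(image)
    print(outputs["instances"])
```

Calling main() at the bottom of the file, under the usual if __name__ == "__main__": guard, turns this into a command-line script.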

Running the script gives the following error:

module 'PIL.Image' has no attribute 'LINEAR'

Well, googling it I find this GitHub issue:

They have changed the code from LINEAR to BILINEAR.

One option: I could just go into the detectron2 code and change LINEAR to BILINEAR. That’s under /path/to/venv/site-packages/detectron2/data/transforms/, in the init of class ExtentTransform.

Or I could downgrade my Pillow version to 9.0.0 and reinstall with poetry. If you are using poetry, just run the poetry update command.
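
A sketch of a third route, not taken from the GitHub issue: put the missing name back before detectron2 gets to use it. restore_linear is a made-up helper, and it takes the module as an argument so the behaviour can be seen without Pillow installed. The assumption is that it has to run before the first detectron2 import.

```python
from types import SimpleNamespace


def restore_linear(image_module):
    """Alias LINEAR to BILINEAR when the module no longer defines LINEAR.

    Returns True if the alias was added, False if LINEAR already existed.
    """
    if hasattr(image_module, "LINEAR"):
        return False
    image_module.LINEAR = image_module.BILINEAR
    return True


# Stand-in for a newer PIL.Image that only has BILINEAR (the value is illustrative).
fake_image = SimpleNamespace(BILINEAR=2)
print(restore_linear(fake_image))  # True on the first call
print(fake_image.LINEAR == fake_image.BILINEAR)
```

In a real script you would pass the actual module instead of the stand-in: from PIL import Image, then restore_linear(Image).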

Okay, error one solved. Let’s rerun

And we get an error: cannot open shared object file: No such file or directory
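
The message means the dynamic loader could not find a shared library that a compiled extension was linked against. One quick way to test a specific library name outside of detectron2 is to ask ctypes to load it. A minimal sketch: can_load is a hypothetical helper and the library name below is a placeholder, so substitute the name from your own error message.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name, else False."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Same failure class as "cannot open shared object file".
        return False
    return True


# Placeholder name, chosen so that it should not exist on any machine.
print(can_load("libthis_library_does_not_exist.so.0"))
```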

Now we are finally in the detectron2 installation hell hole

One user says downgrading cuda from 11.0 to 10.1 solved the issue. This is not an ideal solution, since I have other libraries depending on my current cuda.

This required changing torch versions. But I have installed the required torch version already.

Then I stumbled upon two links: one is detectron2’s own installation docs and the other is the almighty Stack Overflow.

Installation — detectron2 0.6 documentation

This mentions running the following code and checking the cuda versions

python -m detectron2.utils.collect_env

The version of NVCC you use to build detectron2 or torchvision does not match the version of CUDA you are running with. This often happens when using anaconda’s CUDA runtime.

Use python -m detectron2.utils.collect_env to find out inconsistent CUDA versions. In the output of this command, you should expect “Detectron2 CUDA Compiler”, “CUDA_HOME”, “PyTorch built with - CUDA” to contain cuda libraries of the same version.

When they are inconsistent, you need to either install a different build of PyTorch (or build by yourself) to match your local CUDA installation, or install a different version of CUDA to match PyTorch.
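
The torch side of that comparison can also be checked on its own. The sketch below assumes the usual wheel naming, where the local version tag encodes the CUDA build (1.10.0+cu111 means cuda 11.1). cuda_from_torch_version and report are hypothetical helpers, and builds without such a tag return None. torch.version.cuda is the value torch itself reports.

```python
import re


def cuda_from_torch_version(version):
    """Extract the CUDA version from a string such as '1.10.0+cu111'.

    Returns None for CPU-only or untagged builds.
    """
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version: cu102 -> 10.2, cu113 -> 11.3.
    return f"{digits[:-1]}.{digits[-1]}"


def report():
    # Imported lazily so the parser above works without torch installed.
    import torch

    print("torch version:        ", torch.__version__)
    print("CUDA from version tag:", cuda_from_torch_version(torch.__version__))
    print("torch built with CUDA:", torch.version.cuda)
    print("CUDA available:       ", torch.cuda.is_available())
```

Run report() inside the same virtual environment as detectron2. If the CUDA version it prints differs from the one in the detectron2 wheel URL, that is the mismatch the docs describe.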

To my horror I notice this


This mentions installing the same torch but compiled with cuda 11.3. SMH

OSError: cannot open shared object file: No such file or directory

Let’s try making that change to our pyproject.toml

numpy = "^1.24.1"
torch = { url = '' }
torchvision = { url = '' }
Pillow = "9.0.0"
opencv-python = "^4.7.0"
detectron2 = { url = "" }
memory_profiler = "*"
visdom = "*"

or you could use

poetry add torch@
poetry add torchvision@

Now I have torch 1.10+cu111 and it works.

The main issue was that the installed torch 1.10 was compiled with cuda 10.2 instead of cuda 11.3. I went with torch compiled with cuda 11.1 because, in the official installation docs, you don’t see 1.10.1 compiled with 11.3. But this works!

I spent 4 hours on what turned out to be a simple solution. I probably should have read the docs better, but this was a frustrating experience regardless. I’m just glad that, after numerous trials with docker and dev containers (as suggested by ChatGPT) and different version installations of detectron2, I was able to understand and solve the error. I hope you don’t have to waste as much time installing it.

