Home Assistant - Enabling CUDA GPU support for Wyoming Whisper Docker container

Jul 13, 2023

Update: ab-tools has created a Docker image with CUDA support baked in. You should be able to skip the instructions below and use it instead. Thanks ab-tools! https://hub.docker.com/r/abtools/wyoming-whisper-cuda

When trying to use the built-in CUDA support in the Wyoming Whisper Docker container, I received the error: "RuntimeError: CUDA failed with error CUDA driver version is insufficient for CUDA runtime version." Here is how I resolved it.

Prerequisites:

A working Docker/NVIDIA environment. See: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Docker Compose
Wyoming Whisper/Home Assistant configured. See https://www.home-assistant.io/integrations/whisper/ and https://community.home-assistant.io/t/run-whisper-on-external-server/567449
I used Ubuntu 22.04. Other distros should work, but this is the one I used. If you are also using Ubuntu, run the following to install the required packages: apt install libcublas11 libcudnn8
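
Before going further, it can help to confirm that containers can see the GPU at all. The sketch below wraps the sample workload command from the NVIDIA Container Toolkit guide in a small function. The name gpu_smoke_test is a hypothetical helper, and the ubuntu image is only a convenient choice; it also assumes the nvidia runtime is already registered with Docker.

```shell
#!/bin/sh
# Sketch: run nvidia-smi inside a throwaway container to confirm that
# Docker can reach the GPU. gpu_smoke_test is a hypothetical helper name.
gpu_smoke_test() {
    # Bail out early with a clear message if the docker CLI is not installed.
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found on PATH" >&2
        return 127
    fi
    # Assumes the nvidia runtime has been configured for Docker.
    docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
}
```

Calling gpu_smoke_test should print the usual nvidia-smi table if the environment is healthy. If it fails here, the problem is likely in the host setup rather than in the Whisper container.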

Here is an example of the full error received after attempting to use the CUDA support in Wyoming Whisper:

docker-compose logs -f
whisper    | Traceback (most recent call last):
whisper    |   File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
whisper    |     return _run_code(code, main_globals, None,
whisper    |   File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
whisper    |     exec(code, run_globals)
whisper    |   File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/__main__.py", line 135, in <module>
whisper    |     asyncio.run(main())
whisper    |   File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
whisper    |     return loop.run_until_complete(main)
whisper    |   File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
whisper    |     return future.result()
whisper    |   File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/__main__.py", line 112, in main
whisper    |     whisper_model = WhisperModel(
whisper    |   File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/transcribe.py", line 58, in __init__
whisper    |     self.model = ctranslate2.models.Whisper(
whisper    | RuntimeError: CUDA failed with error CUDA driver version is insufficient for CUDA runtime version

I was using the same configuration I normally use for my other containers that leverage my GPU:

    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

The following docker-compose.yml is what eventually got things in working order.

services:
  whisper:
    container_name: whisper
    image: rhasspy/wyoming-whisper:latest
    command: --model small-int8 --language en --beam-size 5 --device cuda
    volumes:
      - /home/whisper/whisper-data:/data
      # cuDNN libraries bind-mounted read-only from the host
      - /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:ro
      - /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:ro
      # Host cuBLAS .so.11 files mounted under .so.12 names inside the container
      - /usr/lib/x86_64-linux-gnu/libcublasLt.so.11:/usr/lib/x86_64-linux-gnu/libcublasLt.so.12:ro
      - /usr/lib/x86_64-linux-gnu/libcublas.so.11:/usr/lib/x86_64-linux-gnu/libcublas.so.12:ro
    restart: always
    ports:
      - 10300:10300
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Notice that a few libraries are bind-mounted from the host into the container, and that the host's libcublas .so.11 files are mounted under .so.12 names. Without these mounts I got errors such as the following:

docker-compose logs -f
whisper    | INFO:__main__:Ready
whisper    | Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
whisper    | Please make sure libcudnn_ops_infer.so.8 is in your library path!
whisper    | Aborted                 (core dumped) python3 -m wyoming_faster_whisper --uri 'tcp://0.0.0.0:10300' --data-dir /data --download-dir /data "$@"
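
Since the fix depends on those files existing on the host, a quick check before starting the container can save a restart loop. The sketch below looks for the four file names used in the volumes section of the docker-compose.yml. check_host_libs is a hypothetical helper name, and the default directory is the Ubuntu path from the compose file, so other distros may need a different argument.

```shell
#!/bin/sh
# Sketch: report which of the libraries referenced in the compose file are
# present in a directory. check_host_libs is a hypothetical helper name.
check_host_libs() {
    # Default to the host path used in the compose file's volume mounts.
    lib_dir="${1:-/usr/lib/x86_64-linux-gnu}"
    missing=0
    for lib in libcudnn_ops_infer.so.8 libcudnn_cnn_infer.so.8 \
               libcublasLt.so.11 libcublas.so.11; do
        if [ -e "$lib_dir/$lib" ]; then
            echo "found:   $lib_dir/$lib"
        else
            echo "missing: $lib_dir/$lib"
            missing=1
        fi
    done
    # Return non-zero if at least one library was not found.
    return "$missing"
}
```

Running check_host_libs with no argument checks the default path; a non-zero exit status means at least one file is absent, which usually points to libcublas11 or libcudnn8 not being installed. This matters because Docker's short volume syntax tends to create a missing host path as an empty directory rather than failing outright.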

In the end, things are working as expected. Response times went from ~10 seconds on CPU down to ~1 second on GPU (GTX 1060).

docker-compose logs -f
whisper    | INFO:__main__:Ready
whisper    | INFO:wyoming_faster_whisper.handler: Turn on living room lights.