photoprism/photoprism

TensorFlow: Running models with GPU support

Open

#5618 opened on May 26, 2026

Labels: batch import, help wanted

Description

What is not working as documented?

TensorFlow GPU acceleration does not appear to work in PhotoPrism Plus 260523, even though the container has working NVIDIA runtime access and CUDA libraries available.

According to the TensorFlow GPU setup and recent TensorFlow 2 integration work, GPU inference should initialize CUDA devices when TensorFlow models are executed. However, TensorFlow inference in PhotoPrism always runs on CPU and never creates or registers a GPU device.

The following works correctly:

  • NVIDIA Container Toolkit
  • CUDA device access inside the container
  • NVENC hardware transcoding with FFmpeg
  • TensorFlow model loading and inference itself

However, the following expected GPU behavior never occurs:

  • no cuInit
  • no Created device /device:GPU:0
  • no CUDA loader logs
  • no GPU utilization during TensorFlow inference

This happens even with maximum TensorFlow CUDA debug logging enabled.

Relevant implementation work:

  • TensorFlow GPU initialization via PHOTOPRISM_INIT=tensorflow-gpu

How can we reproduce it?

  1. Install Docker, NVIDIA drivers, and NVIDIA Container Toolkit on a Linux host with an NVIDIA GPU.

  2. Verify that the NVIDIA runtime works outside PhotoPrism:

docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi

  3. Start PhotoPrism with TensorFlow GPU support enabled:

environment:
  PHOTOPRISM_INIT: "tensorflow-gpu"
  PHOTOPRISM_FFMPEG_ENCODER: "nvidia"
  NVIDIA_VISIBLE_DEVICES: "all"
  NVIDIA_DRIVER_CAPABILITIES: "all"

  4. Verify that PhotoPrism sees TensorFlow vision models:

docker exec -it photoprism sh -c 'photoprism vision ls'

Expected output includes:

nasnet   │ labels │ tensorflow
nsfw     │ nsfw   │ tensorflow
facenet  │ face   │ tensorflow

  5. Run TensorFlow inference with CUDA debug logging enabled:

docker exec -it photoprism sh -c '
TF_CPP_MIN_LOG_LEVEL=0 \
TF_CPP_VMODULE=dso_loader=5,dlopen_checker=5,cuda_gpu_executor=5,gpu_device=5 \
photoprism vision run -m labels --force --count 10 public:true
'

  6. Observe that TensorFlow loads and runs the model, but no GPU initialization happens.

Actual result:

tensorflow: loading nasnet
Reading SavedModel from: /opt/photoprism/assets/models/nasnet
SavedModel load for tags { photoprism }; Status: success

But there are no logs for:

cuInit
Created device /device:GPU:0
Successfully opened dynamic library libcuda.so.1

Also, nvidia-smi shows no GPU activity during inference.
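
To make the "no logs for" check repeatable, the comparison can be scripted. The following is a minimal sketch, assuming the debug output of the inference run has been captured to a file; the function name `check_gpu_markers` and the log file name used below are hypothetical and not part of PhotoPrism or TensorFlow. It only searches for the three marker strings listed above.

```shell
# Report which of the GPU initialization markers named in this issue
# appear in a captured log file (path passed as the first argument).
check_gpu_markers() {
  log_file="$1"
  missing=0
  for marker in \
    "cuInit" \
    "Created device /device:GPU:0" \
    "Successfully opened dynamic library libcuda.so.1"
  do
    # -F treats the marker as a fixed string, -q suppresses grep output.
    if grep -Fq -- "$marker" "$log_file"; then
      echo "found:   $marker"
    else
      echo "missing: $marker"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero when at least one marker is absent.
  [ "$missing" -eq 0 ]
}
```

For example, the output of step 5 could be redirected with `> tf-debug.log 2>&1` (running `docker exec` without `-it` so that no TTY is allocated) and then checked with `check_gpu_markers tf-debug.log`. On the affected system, all three markers would be expected to show up as missing.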

Have you verified that no similar reports exist?

  • This is a new bug that has not yet been reported or documented

What behavior do you expect?

I expect TensorFlow inference to initialize and use the NVIDIA GPU when PhotoPrism is started with:

PHOTOPRISM_INIT: "tensorflow-gpu"
NVIDIA_VISIBLE_DEVICES: "all"
NVIDIA_DRIVER_CAPABILITIES: "all"

Specifically, during photoprism vision run -m labels, TensorFlow should:

  • load CUDA libraries such as libcuda.so.1
  • call cuInit
  • register a GPU device, for example /device:GPU:0
  • show CUDA/GPU initialization messages in the TensorFlow logs
  • use the GPU during TensorFlow inference

Expected log examples:

Successfully opened dynamic library libcuda.so.1
Created device /device:GPU:0

GPU utilization should also be visible in nvidia-smi while TensorFlow vision models are running.
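
One way to turn "visible in nvidia-smi" into a number is to sample the GPU while the vision command runs and keep the peak values. The sketch below assumes samples are collected with `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits -l 1`; the helper name `peak_gpu_util` is hypothetical. With several GPUs, nvidia-smi prints one line per GPU per sample, and this helper simply takes the maximum across all lines.

```shell
# Summarize nvidia-smi CSV samples read from stdin.
# Expected input: one "utilization.gpu, memory.used" pair per line
# (percent and MiB, without units).
peak_gpu_util() {
  awk -F', *' '
    ($1 + 0) > max_util { max_util = $1 + 0 }   # track highest utilization
    ($2 + 0) > max_mem  { max_mem  = $2 + 0 }   # track highest memory use
    END {
      printf "peak_util=%d%% peak_mem=%dMiB samples=%d\n", max_util, max_mem, NR
    }
  '
}
```

The samples can be written to a file in a second terminal during `photoprism vision run -m labels` and then piped into `peak_gpu_util`. A peak utilization of 0% over the whole run would match the behavior reported here.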

What could be the cause?

Based on the investigation, this does not appear to be caused by Docker GPU passthrough, NVIDIA device permissions, missing CUDA libraries, or unsupported GPU architecture.

The NVIDIA runtime works, /dev/nvidia* devices are available in the PhotoPrism container, CUDA/cuDNN/cuBLAS libraries are visible, and the shipped TensorFlow library contains GPU/CUDA symbols, including sm_61 support for the Tesla P4.

The likely cause seems to be that PhotoPrism's TensorFlow integration loads and runs the TensorFlow models, but does not trigger TensorFlow GPU device discovery or registration. Even with:

TF_CPP_MIN_LOG_LEVEL=0
TF_CPP_VMODULE=dso_loader=5,dlopen_checker=5,cuda_gpu_executor=5,gpu_device=5

there are no CUDA loader logs, no cuInit, and no Created device /device:GPU:0.

This suggests one of the following:

  • the TensorFlow C API / Go wrapper session is initialized in a way that only uses CPU devices;
  • the PhotoPrism TensorFlow 2.18 runtime package contains GPU support, but CUDA platform registration is not active at runtime;
  • or PHOTOPRISM_INIT=tensorflow-gpu installs GPU-capable TensorFlow libraries, but the current PhotoPrism vision pipeline does not actually initialize TensorFlow GPU devices.

In short: the issue appears to be in the PhotoPrism TensorFlow runtime integration or initialization path, rather than in the host NVIDIA setup.

Additional Findings

libtensorflow_framework.so contains CUDA loader strings

strings /usr/lib/libtensorflow_framework.so.2.18.0 | grep -Ei "cuInit|libcuda|Created device"

Includes:

Failed call to cuInit
Cannot dlopen some GPU libraries
Skipping registering GPU devices
Created device
libcuda.so.1

So GPU support appears compiled into the binary.


TensorFlow RUNPATH

readelf -d /usr/lib/libtensorflow.so.2.18.0

Shows CUDA-related RUNPATH entries:

.../nvidia/cudnn/lib
.../nvidia/cublas/lib
.../nvidia/cuda_runtime/lib

Additional Checks

PhotoPrism vision models

docker exec -it photoprism sh -c 'photoprism vision ls'

Output:

nasnet   │ labels │ tensorflow
nsfw     │ nsfw   │ tensorflow
facenet  │ face   │ tensorflow

Environment variables inside container

docker exec -it photoprism sh -c 'env | grep -Ei "CUDA|NVIDIA|TF|LD_LIBRARY"'

Output:

LD_LIBRARY_PATH=/usr/local/cuda-compat-libs:/usr/lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu
PHOTOPRISM_FFMPEG_ENCODER=nvidia
TF_CPP_MIN_LOG_LEVEL=4
TF_ENABLE_ONEDNN_OPTS=1
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=all
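
The container default shown above is TF_CPP_MIN_LOG_LEVEL=4, while the reproduction step overrides it to 0 for a single command. As far as I know, TensorFlow filters INFO-level messages (which is where lines such as "Created device" are normally logged) whenever this value is greater than 0, so it may be worth confirming which value a given process actually sees before concluding that the logs are absent. The following is a small sketch for that sanity check; the helper names `tf_log_level` and `log_level_hides_info` are hypothetical.

```shell
# Print the value of TF_CPP_MIN_LOG_LEVEL from env-style input on stdin
# (NAME=value per line), or "unset" if the variable is not present.
tf_log_level() {
  level=$(sed -n 's/^TF_CPP_MIN_LOG_LEVEL=//p' | tail -n 1)
  echo "${level:-unset}"
}

# Succeed when the given level is a number greater than 0, i.e. when
# INFO-level TensorFlow messages would be filtered (assumption: standard
# TF_CPP_MIN_LOG_LEVEL semantics, where 0 shows everything).
log_level_hides_info() {
  case "$1" in
    ''|unset|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}
```

For instance, `docker exec photoprism env | tf_log_level` would print 4 for the environment listed above, and `log_level_hides_info 4` succeeds. Running the same check inside the `sh -c '...'` wrapper from step 5 should print 0 if the override is applied as intended.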

### Which software versions do you use?


- PhotoPrism Plus: `260523-0544f71c1-Linux-AMD64-Plus`
- Host OS: Debian 13 “Trixie”
- NVIDIA driver: `550.163.01`
- CUDA reported by `nvidia-smi`: `12.4`
- NVIDIA Container Toolkit: installed and working
- Docker: NVIDIA runtime available
- TensorFlow library in PhotoPrism container: `libtensorflow.so.2.18.0`
- GPU TensorFlow initialization: `PHOTOPRISM_INIT=tensorflow-gpu`
- FFmpeg hardware encoder: `h264_nvenc`



### On what device is PhotoPrism installed?


PhotoPrism is installed on a self-hosted Debian server/NAS running Docker.

Hardware details:

- CPU: AMD FX-6300 Six-Core Processor
- Cores / Threads: 6 / 6
- Architecture: x86_64
- RAM: 32 GB
- GPU: NVIDIA Tesla P4
- NVIDIA Container Toolkit enabled
- Docker GPU passthrough working correctly
- `/dev/nvidia0`, `/dev/nvidiactl`, and `/dev/nvidia-uvm` are available inside the container

### Do you use a reverse proxy, firewall, VPN, or CDN?

No, not relevant to this issue

### Logs, Sample Files, or Screenshots

_No response_
