mozilla-ai/llamafile

GPU numbering on Windows possibly in wrong order

Open

#26 opened on Dec 1, 2023

Labels: enhancement, help wanted

Description

I have multiple NVIDIA GPUs and originally thought llamafile was reporting usage of the wrong one. Now I'm not sure it's using either of them. Is there a way to check for sure, or to pass in a preferred device?

Windows 11 session here, in an x64 Native Tools Command Prompt.

https://gist.github.com/danbri/d8a387321642b14336701dedf166527f (excerpts below)

It correctly finds 2 NVIDIA CUDA GPU devices:

ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
  Device 1: NVIDIA GeForce RTX 3080 Ti Laptop GPU, compute capability 8.6
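To see at a glance which index the CUDA runtime assigned to which card, the device list can be pulled out of a saved log. This is a minimal sketch that assumes the "Device N: <name>, compute capability X.Y" line format shown above; `parse_cuda_devices` is a hypothetical helper, not part of llamafile.

```python
import re

# Matches lines like "Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6".
# Assumes the ggml_init_cublas output format shown in the log excerpt.
DEVICE_RE = re.compile(r"Device (\d+): (.+?), compute capability (\d+\.\d+)")


def parse_cuda_devices(log_text: str) -> dict:
    """Map each CUDA device index to a (name, compute capability) tuple."""
    return {
        int(m.group(1)): (m.group(2), m.group(3))
        for m in DEVICE_RE.finditer(log_text)
    }
```

Applied to the excerpt above, this maps index 0 to the RTX 3090 and index 1 to the RTX 3080 Ti Laptop GPU, which is the CUDA runtime's numbering and not necessarily the numbering Task Manager shows.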

[...]

Later it reports:

ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3090) as main device
llm_load_tensors: mem required = 8801.76 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/43 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
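One way to "check for sure" without watching Task Manager is to read the offload count from the log: if zero layers are offloaded, every layer runs on the CPU no matter which device is labelled "main". The sketch below assumes the "offloaded N/M layers to GPU" wording shown above; `offloaded_layers` and `uses_gpu` are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

# Matches "offloaded 0/43 layers to GPU" (but not "offloading 0 repeating layers").
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def offloaded_layers(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (layers offloaded, total layers), or None if the line is absent."""
    m = OFFLOAD_RE.search(log_text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def uses_gpu(log_text: str) -> bool:
    """True only if at least one model layer was offloaded to a GPU."""
    result = offloaded_layers(log_text)
    return result is not None and result[0] > 0
```

For the log above this yields `(0, 43)` and `uses_gpu` returns `False`, consistent with the reported "VRAM used: 0.00 MB".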

When I start a task in the web UI on port 8082, Task Manager shows the supposed "main device" GPU (the 3090, in an external USB enclosure; not the most efficient use of it, but still) at 0% utilization. The built-in NVIDIA GPU shows low usage (4% at most), but that appears to be background Desktop Window Manager activity. CPU usage rises to 45 to 50% while response tokens are being generated. Given the "offloading 0 repeating layers to GPU" and "offloaded 0/43 layers to GPU" log messages, I suspect it isn't actually using either NVIDIA GPU, even though it detects both.

If we disconnect the external 3090 and re-run llamafile, it recognises the remaining internal NVIDIA GPU and behaves much the same, except that the log now says only:

llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 8801.76 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/43 layers to GPU

The only processes Task Manager reports for what it calls GPU 1 (the NVIDIA GPU) are Desktop Window Manager and Client Server Runtime Process.

I started out thinking it was using the wrong GPU; now I'm not convinced that either GPU is being used.
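On the question of passing in a preferred device: independently of any llamafile option, the CUDA runtime honours two standard environment variables. By default `CUDA_DEVICE_ORDER` is `FASTEST_FIRST`, which may be why the faster 3090 is reported as device 0; setting it to `PCI_BUS_ID` makes the numbering match `nvidia-smi` (Task Manager may still number GPUs differently). `CUDA_VISIBLE_DEVICES` then hides every GPU except the chosen one, which could be an alternative to physically disconnecting the 3090. This is a sketch under those assumptions; `cuda_env_for_device` is a hypothetical helper. Note that this only selects a device; it does not by itself make llamafile offload any layers.

```python
import os
from typing import Mapping, Optional


def cuda_env_for_device(index: int, base: Optional[Mapping[str, str]] = None) -> dict:
    """Build an environment that exposes only one CUDA device (hypothetical helper)."""
    # Start from the current environment unless a base mapping is supplied.
    env = dict(os.environ if base is None else base)
    # Number devices by PCI bus ID (the order nvidia-smi uses) instead of
    # CUDA's default FASTEST_FIRST ordering.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide all other GPUs; inside the child process the chosen one becomes device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(index)
    return env
```

A launch would then look something like `subprocess.run(["llamafile.exe", ...], env=cuda_env_for_device(1))`, where the executable name and arguments are placeholders.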
