ggml-org/llama.cpp

Eval bug: llama server response hangs for /slots/0?action=erase

Open

#17387 opened on Nov 19, 2025

Labels: bug, help wanted, medium severity, server/api

Description

Name and Version

b52edd25586fabb70f0c21b274473b307cf14499

Operating systems

Linux

GGML backends

CPU

Hardware

Mac M4

Models

llama3.2

Problem description & steps to reproduce

When running llama-server through ramalama (which runs llama.cpp inside a container) with the argument --slot-save-path /tmp, which is needed to enable the slots feature, the command curl -X POST "http://localhost:8080/slots/0?action=erase" hangs until I press Ctrl-C. Only then does the response appear in the server log, but the curl command never receives it. I also ran the command inside the container to rule out networking issues, and it still hangs.

My goal is to clear the prompt cache for a summarization feature: when the context size is reached, clear the cache, summarize the history, and feed the summary back in. The workaround is to set a short timeout on the request, but the hang itself looks like a bug.
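
For anyone hitting the same hang, the timeout workaround can be sketched roughly as follows. This is only an illustration, not a fix: the function names erase_url and erase_slot are hypothetical, the 2-second default is arbitrary, and it assumes llama-server is reachable at the base URL passed in. The log below suggests the erase is still processed once the client disconnects, but that is inferred from this one run.

```shell
# Sketch of the timeout workaround; both function names are hypothetical.

# Build the erase URL from a server base URL ($1) and a slot id ($2).
erase_url() {
  printf '%s/slots/%s?action=erase\n' "$1" "$2"
}

# POST the erase request, giving up after $3 seconds (default 2).
# curl exits with status 28 when --max-time expires.
erase_slot() {
  curl --silent --show-error --max-time "${3:-2}" \
    -X POST "$(erase_url "$1" "$2")"
}

# Usage (assumes llama-server is listening on localhost:8080):
#   erase_slot http://localhost:8080 0 2 || echo "erase did not return (curl exit $?)"
```

With this, a hung request ends after the timeout instead of blocking the caller, so the summarization step can continue.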

ramalama latest llama.cpp commit = b52edd25586fabb70f0c21b274473b307cf14499

First Bad Commit

No response

Relevant log output

bmahabir@bmahabir-mac ramalama % curl -X POST "http://localhost:8080/slots/0?action=erase"
^C
bmahabir@bmahabir-mac ramalama % 


srv  remove_waiti: remove task 9 from waiting list. current waiting = 1 (before remove)
srv  log_server_r: request: POST /slots/0 192.168.127.1 200
srv  log_server_r: request:  
srv  log_server_r: response: {"id_slot":0,"n_erased":43}

The server log lines above only appear after the Ctrl-C. Something appears to be hanging in llama-server.