ggml-org/llama.cpp

Eval bug: llama server response hangs for /slots/0?action=erase

Open

#17,387 created on November 19, 2025

5 comments · 0 reactions · 0 assignees · C++ · 110,169 stars · 18,202 forks
Labels: bug, help wanted, medium severity, server/api

Description

Name and Version

b52edd25586fabb70f0c21b274473b307cf14499

Operating systems

Linux

GGML backends

CPU

Hardware

Mac M4

Models

llama3.2

Problem description & steps to reproduce

When running llama-server through ramalama (which runs llama.cpp inside a container) with the argument --slot-save-path /tmp, which is needed to enable the slots feature, the command curl -X POST "http://localhost:8080/slots/0?action=erase" hangs until I press Ctrl+C. Only then does the response show up on the server side, and the curl command never receives it. I also ran the command inside the container to rule out networking issues, and it still hangs.

My goal is to clear the prompt cache for a summarization feature: when the context size is reached, clear the cache, summarize the history, and feed the summary back in. The workaround is to set a short timeout on the request, but the hang itself looks like a bug.
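For anyone who needs the timeout workaround from application code, here is a minimal sketch using only the Python standard library. The helper names `erase_url` and `erase_slot` and the 2-second default are hypothetical, chosen for illustration; only the endpoint path and query string come from the curl command in this report.

```python
import urllib.request


def erase_url(base_url: str, slot_id: int) -> str:
    """Build the slot-erase URL used by the curl command in this report."""
    return f"{base_url.rstrip('/')}/slots/{slot_id}?action=erase"


def erase_slot(base_url: str, slot_id: int = 0, timeout_s: float = 2.0):
    """POST the erase request with a client-side timeout.

    Returns the response body as text, or None if the request timed out
    or failed. Hypothetical helper: the timeout value is an assumption.
    """
    req = urllib.request.Request(erase_url(base_url, slot_id), data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8")
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None


if __name__ == "__main__":
    print(erase_slot("http://localhost:8080", slot_id=0))
```

Note that a `None` result here does not necessarily mean the erase failed: in the log output of this report, the server prints the `n_erased` response once the client disconnects, so the cache may well have been cleared even though the client never saw the reply.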

ramalama latest llama.cpp commit = b52edd25586fabb70f0c21b274473b307cf14499

First Bad Commit

No response

Relevant log output

bmahabir@bmahabir-mac ramalama % curl -X POST "http://localhost:8080/slots/0?action=erase"
^C
bmahabir@bmahabir-mac ramalama % 


srv  remove_waiti: remove task 9 from waiting list. current waiting = 1 (before remove)
srv  log_server_r: request: POST /slots/0 192.168.127.1 200
srv  log_server_r: request:  
srv  log_server_r: response: {"id_slot":0,"n_erased":43}

The server log lines above appear only after I press Ctrl+C, so something seems to be hanging in llama-server.
