Eval bug: llama server response hangs for /slots/0?action=erase · ggml-org/llama.cpp#17387

Repository metrics

Stars: 110,169
PR merge metrics: average time to merge 6 days 8 hours; 389 PRs merged in the last 30 days

Description

Name and Version

b52edd25586fabb70f0c21b274473b307cf14499

Operating systems

Linux

GGML backends

CPU

Hardware

Mac M4

Models

llama3.2

Problem description & steps to reproduce

I am running llama-server through ramalama (which runs llama.cpp inside a container) with the argument --slot-save-path /tmp, which is needed to enable the slots feature. When I run curl -X POST "http://localhost:8080/slots/0?action=erase", the command hangs until I press Ctrl-C. Only then does the response show up in the server log; curl itself never receives it. I also ran the command inside the container to rule out networking issues, and it still hangs.

My goal is to clear the prompt cache for a summarization feature, i.e. when the context size is reached: clear the cache, summarize the history, and feed the summary back. The workaround is to specify a small timeout, but this looks like a bug.
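For anyone who needs the same workaround in the meantime, here is a minimal Python sketch of the erase call with a short client-side timeout, plus the clear/summarize/feed-back flow described above. Only the endpoint path comes from this report. The helper names (`build_erase_request`, `erase_slot`, `compact_history`), the 2-second timeout, and the injected `count_tokens` / `summarize` / `erase` callables are hypothetical and stand in for whatever the application already uses. A timeout here only means no response arrived; it does not confirm that the slot was erased.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable, List


def build_erase_request(base_url: str, slot_id: int) -> urllib.request.Request:
    """Build the POST request used in the report: /slots/<id>?action=erase."""
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action=erase"
    # No body is attached, mirroring the bare `curl -X POST` in the report.
    return urllib.request.Request(url, method="POST")


def erase_slot(base_url: str, slot_id: int = 0, timeout_s: float = 2.0) -> bool:
    """Return True if a response arrived, False if the request timed out.

    Assumption: the 2 s default is arbitrary; tune it for your setup.
    """
    request = build_erase_request(base_url, slot_id)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            response.read()
            return True
    except (socket.timeout, TimeoutError):
        # Timed out while waiting for the response.
        return False
    except urllib.error.URLError as exc:
        # Timeouts during connect are wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return False
        raise


def compact_history(
    history: List[str],
    n_ctx: int,
    count_tokens: Callable[[str], int],
    summarize: Callable[[List[str]], str],
    erase: Callable[[], bool],
) -> List[str]:
    """When the history reaches n_ctx tokens, erase the cache and
    replace the history with a single summary message.

    All three callables are hypothetical hooks supplied by the caller.
    """
    used = sum(count_tokens(message) for message in history)
    if used < n_ctx:
        return history
    erase()  # order follows the report: clear, then summarize, then feed back
    return [summarize(history)]
```

A caller would pass something like `lambda: erase_slot("http://localhost:8080")` as `erase`, so a hung request costs at most the timeout instead of blocking the summarization step.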

ramalama latest llama.cpp commit = b52edd25586fabb70f0c21b274473b307cf14499

First Bad Commit

No response

Relevant log output

bmahabir@bmahabir-mac ramalama % curl -X POST "http://localhost:8080/slots/0?action=erase"
^C
bmahabir@bmahabir-mac ramalama % 


srv  remove_waiti: remove task 9 from waiting list. current waiting = 1 (before remove)
srv  log_server_r: request: POST /slots/0 192.168.127.1 200
srv  log_server_r: request:  
srv  log_server_r: response: {"id_slot":0,"n_erased":43}

The server log lines appear only after I press Ctrl-C. Something is hanging in llama-server.

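Contributor guide

Research direction: Inspect the server code for the /slots/ endpoint, especially the erase action, to find the cause of the hang. Look for a missing response flush, blocking operations without a timeout, or incorrect handling of POST requests.
Tech stack: cpp
Domain: backend
Issue type: bug
Difficulty: 3
Estimated time: half a day
Activity status: active
Clarity: clear
Prerequisites: C++ server development
Beginner friendliness: 40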
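As a starting point for the research direction above, a small client-side probe can help narrow down where the hang occurs before reading the server code. The sketch below sends the same POST over a raw socket and measures the time until the first response byte arrives, or reports that nothing arrived within the timeout. The function names and the `with_content_length` toggle are hypothetical. Whether the presence of a `Content-Length` header makes any difference is a hypothesis to test, not a diagnosis.

```python
import socket
import time
from typing import Optional


def build_raw_post(host: str, port: int, path: str, with_content_length: bool) -> bytes:
    """Build a minimal HTTP/1.1 POST request with no body."""
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Connection: close",
    ]
    if with_content_length:
        # Variable under test: an explicit zero-length body.
        lines.append("Content-Length: 0")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def time_to_first_byte(
    host: str,
    port: int,
    path: str,
    with_content_length: bool,
    timeout_s: float = 5.0,
) -> Optional[float]:
    """Return seconds until the first response byte, or None if no byte
    arrived before the timeout (or the server closed without replying)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(build_raw_post(host, port, path, with_content_length))
        try:
            first = sock.recv(1)
        except socket.timeout:
            return None
    return time.monotonic() - start if first else None
```

Running `time_to_first_byte("localhost", 8080, "/slots/0?action=erase", flag)` once with `flag=True` and once with `flag=False`, and comparing against another endpoint that is known to respond, would show whether the stall is specific to the erase action or to how the request is framed.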