Eval bug: llama server response hangs for /slots/0?action=erase · ggml-org/llama.cpp#17387

Repository metrics

Stars: 110,169
PR merge metrics: average merge time 5d 11h; 457 PRs merged in the last 30 days

Description

Name and Version

b52edd25586fabb70f0c21b274473b307cf14499

Operating systems

Linux

GGML backends

CPU

Hardware

Mac M4

Models

llama3.2

Problem description & steps to reproduce

When running llama-server through ramalama (which runs llama.cpp inside a container) with the argument --slot-save-path /tmp, which is needed to enable the slots feature, the command curl -X POST "http://localhost:8080/slots/0?action=erase" hangs until I press Ctrl+C. Only then does the response show up on the server side, but the curl command never receives it. I also tried running the command inside the container to rule out networking issues, and it still hangs.

My goal is to clear the prompt cache for a summarization feature, i.e. when the context size is reached, clear the cache, summarize the history, and feed it back. The workaround is to specify a small timeout on the request, but this seems like a bug.
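
The small-timeout workaround can be sketched on the client side. This is a minimal Python sketch, not the reporter's actual code: the helper name `erase_slot` and its parameters `base_url`, `slot_id`, and `timeout_s` are hypothetical, and only the URL shape `/slots/0?action=erase` comes from this report. It relies on the standard library's `urllib.request.urlopen` with its `timeout` argument and returns `None` when no response arrives in time.

```python
import json
import socket
import urllib.error
import urllib.request


def erase_slot(base_url, slot_id=0, timeout_s=2.0):
    """POST /slots/<slot_id>?action=erase and return the parsed JSON body.

    Returns None if no response arrives within timeout_s seconds.
    All names here are illustrative assumptions, not llama.cpp APIs.
    """
    url = f"{base_url}/slots/{slot_id}?action=erase"
    # An empty body with method="POST" mirrors `curl -X POST <url>`.
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.loads(response.read().decode("utf-8"))
    except (socket.timeout, TimeoutError):
        # Timeout while waiting for or reading the response.
        return None
    except urllib.error.URLError as exc:
        # Timeouts raised while connecting or sending are wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return None
        raise
```

Calling `erase_slot("http://localhost:8080", 0, timeout_s=2.0)` against the server described here would give up after two seconds instead of blocking. A `None` result means only that no response was received; it does not confirm whether the erase was applied on the server.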

ramalama latest llama.cpp commit = b52edd25586fabb70f0c21b274473b307cf14499

First Bad Commit

No response

Relevant log output

bmahabir@bmahabir-mac ramalama % curl -X POST "http://localhost:8080/slots/0?action=erase"
^C
bmahabir@bmahabir-mac ramalama % 


srv  remove_waiti: remove task 9 from waiting list. current waiting = 1 (before remove)
srv  log_server_r: request: POST /slots/0 192.168.127.1 200
srv  log_server_r: request:  
srv  log_server_r: response: {"id_slot":0,"n_erased":43}

The server log lines above only appear after Ctrl+C. Something is hanging in llama-server.

Contributor guide

Research direction: Examine the server code for the /slots/ endpoint, in particular the erase action, to identify the cause of the hang. Look for a missing response flush, blocking operations without a timeout, or improper handling of the POST request.
Tech stack: cpp
Domain: backend
Issue type: Bug
Difficulty: 3
Estimated time: Half a day
Activity status: Active
Clarity: Clear
Prerequisites: C++ server development
Beginner accessibility: 40