Eval bug: Gemma4 fails with NotImplemented: map: filter-mapping not implemented · ggml-org/llama.cpp#21547

Métriques du dépôt

Stars: (110 169 stars)
Métriques de merge PR: (Merge moyen 5j 11h) (457 PRs mergées en 30 j)

Description

Name and Version

llama-cli --version ggml_cuda_init: found 1 ROCm devices (Total VRAM: 81920 MiB): Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 81920 MiB version: 8683 (d0a6dfeb2) built with GNU 15.2.1 for Linux x86_64

Operating systems

Linux

GGML backends

HIP

Hardware

AMD 395+ AI (Strix Halo)

Models

https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF with Q4_K_M

Problem description & steps to reproduce

I'm testing gemma4 (https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF with Q4_K_M) with opencode. The model loads correctly. When I issue a prompt via opencode, I see the following message in the logs of llama-server:

[36203] srv operator(): got exception: {"error":{"code":500,"message":"NotImplemented: map: filter-mapping not implemented","type":"server_error"}}

The request fails with this.

I run strix-halo toolboxes in nightly with the image of today.

Serverstart:

#!/bin/sh
llama-server \
  --models-preset ./models.ini \
  --host 0.0.0.0 \
  --port 8080 \
  --models-max 3 \
  --cont-batching \
  --jinja \
  --metrics \
  --kv-unified

Modelfile:

[*]
slots = 4

slot-save-path = /cache/prompt-cache/

threads = 12
flash-attn = on
mlock = off
mmap = off
fit = off
warmup = off
batch-size = 4096
ubatch-size = 512
cache-type-k = q8_0
cache-type-v = q8_0
jinja = true
direct-io = on
cache-prompt = true
#cache-reuse = 256
#cache-ram = 8192
...

[reasoner2]
model = /home/XXX/llm/models/gemma4/gemma-4-26B-A4B-it-UD-Q4_K_M.gguf
chat-template-file = /home/XXX/llm/models/gemma4/template2.jinja
alias = reasoner2
ctx-size = 131072
n-predict = -1
mlock = true
jinja = enabled
...

I did not found any other information on this so far, any ideas?

First Bad Commit

No response

Relevant log output

[36203] srv    operator(): got exception: {"error":{"code":500,"message":"NotImplemented: map: filter-mapping not implemented","type":"server_error"}}

Guide contributeur

Direction de recherche: Recherchez dans le code 'filter mapping not implemented'. Il se trouve probablement dans la logique d'echantillonnage ou d'inference. Tracez le chemin d'appel depuis la requete du serveur jusqu'a cette erreur. Cherchez la condition qui declenche l'exception, probablement liee a l'architecture du modele Gemma4 (par ex. MoE ou attention).
Stack technique: cpp
Domaine: backend
Type d'issue: Bug
Difficulté: 3
Temps estimé: 1-3 heures
Statut d'activité: Active
Clarté: Claire
Prérequis: C++llama.cpp server
Accessibilité débutant: 50