Eval bug: Gemma4 fails with NotImplemented: map: filter-mapping not implemented · ggml-org/llama.cpp#21547

Repository metrics

Stars: (110,169 stars)
PR merge metrics: (Avg merge 6d 8h) (389 merged PRs in 30d)

Description

Name and Version

llama-cli --version ggml_cuda_init: found 1 ROCm devices (Total VRAM: 81920 MiB): Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 81920 MiB version: 8683 (d0a6dfeb2) built with GNU 15.2.1 for Linux x86_64

Operating systems

Linux

GGML backends

HIP

Hardware

AMD 395+ AI (Strix Halo)

Models

https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF with Q4_K_M

Problem description & steps to reproduce

I'm testing gemma4 (https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF with Q4_K_M) with opencode. The model loads correctly. When I issue a prompt via opencode, I see the following message in the logs of llama-server:

[36203] srv operator(): got exception: {"error":{"code":500,"message":"NotImplemented: map: filter-mapping not implemented","type":"server_error"}}

The request fails with this.

I run strix-halo toolboxes in nightly with the image of today.

Serverstart:

#!/bin/sh
llama-server \
  --models-preset ./models.ini \
  --host 0.0.0.0 \
  --port 8080 \
  --models-max 3 \
  --cont-batching \
  --jinja \
  --metrics \
  --kv-unified

Modelfile:

[*]
slots = 4

slot-save-path = /cache/prompt-cache/

threads = 12
flash-attn = on
mlock = off
mmap = off
fit = off
warmup = off
batch-size = 4096
ubatch-size = 512
cache-type-k = q8_0
cache-type-v = q8_0
jinja = true
direct-io = on
cache-prompt = true
#cache-reuse = 256
#cache-ram = 8192
...

[reasoner2]
model = /home/XXX/llm/models/gemma4/gemma-4-26B-A4B-it-UD-Q4_K_M.gguf
chat-template-file = /home/XXX/llm/models/gemma4/template2.jinja
alias = reasoner2
ctx-size = 131072
n-predict = -1
mlock = true
jinja = enabled
...

I did not found any other information on this so far, any ideas?

First Bad Commit

No response

Relevant log output

[36203] srv    operator(): got exception: {"error":{"code":500,"message":"NotImplemented: map: filter-mapping not implemented","type":"server_error"}}

Contributor guide

Research direction: Search the codebase for 'filter mapping not implemented'. It likely appears in the sampling or inference logic. Trace the call path from the server request to that error. Look for any condition that triggers that exception, possibly related to Gemma4's model architecture (e.g., MoE or attention).
Tech stack: cpp
Domain: backend
Issue type: Bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: C++llama.cpp server
Newbie friendliness: 50