Eval bug: Gemma4 fails with NotImplemented: map: filter-mapping not implemented
#21547 opened on Apr 7, 2026
Description
Name and Version
llama-cli --version ggml_cuda_init: found 1 ROCm devices (Total VRAM: 81920 MiB): Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 81920 MiB version: 8683 (d0a6dfeb2) built with GNU 15.2.1 for Linux x86_64
Operating systems
Linux
GGML backends
HIP
Hardware
AMD 395+ AI (Strix Halo)
Models
https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF with Q4_K_M
Problem description & steps to reproduce
I'm testing gemma4 (https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF with Q4_K_M) with opencode. The model loads correctly. When I issue a prompt via opencode, I see the following message in the logs of llama-server:
[36203] srv operator(): got exception: {"error":{"code":500,"message":"NotImplemented: map: filter-mapping not implemented","type":"server_error"}}
The request fails with this.
I run strix-halo toolboxes in nightly with the image of today.
Serverstart:
#!/bin/sh
llama-server \
--models-preset ./models.ini \
--host 0.0.0.0 \
--port 8080 \
--models-max 3 \
--cont-batching \
--jinja \
--metrics \
--kv-unified
Modelfile:
[*]
slots = 4
slot-save-path = /cache/prompt-cache/
threads = 12
flash-attn = on
mlock = off
mmap = off
fit = off
warmup = off
batch-size = 4096
ubatch-size = 512
cache-type-k = q8_0
cache-type-v = q8_0
jinja = true
direct-io = on
cache-prompt = true
#cache-reuse = 256
#cache-ram = 8192
...
[reasoner2]
model = /home/XXX/llm/models/gemma4/gemma-4-26B-A4B-it-UD-Q4_K_M.gguf
chat-template-file = /home/XXX/llm/models/gemma4/template2.jinja
alias = reasoner2
ctx-size = 131072
n-predict = -1
mlock = true
jinja = enabled
...
I did not found any other information on this so far, any ideas?
First Bad Commit
No response
Relevant log output
[36203] srv operator(): got exception: {"error":{"code":500,"message":"NotImplemented: map: filter-mapping not implemented","type":"server_error"}}