Docs: current Codex local llama.cpp guide does not work as written with Responses API
#5141 opened on Apr 23, 2026
Description
Hi, I think the "How to Run Local LLMs with OpenAI Codex" doc is outdated or incomplete for current Codex CLI versions:
https://unsloth.ai/docs/basics/codex
Summary
I tested the local Codex + Unsloth + llama.cpp flow on macOS using an under-10GB Unsloth model and was able to get it working, but not with the doc steps as written.
The main problem is not the model itself. The issue is that current Codex CLI sends a /v1/responses tool payload that llama.cpp rejects unless a compatibility layer is added.
The page already hints at this with:
{"error":{"code":400,"message":"'type' of tool must be 'function'","type":"invalid_request_error"}}
But the page still reads as if the local Codex flow works directly, and the fallback it mentions (wire_api = "chat") is no longer a practical solution for current Codex versions.
What I tested
I used:
- Codex CLI v0.123.0
- llama.cpp built locally
- Unsloth model: unsloth/Qwen3.6-27B-GGUF
- quant: Qwen3.6-27B-UD-IQ2_XXS.gguf (under 10GB)
- macOS on Apple Silicon
I verified:
- llama-server started correctly
- /v1/models worked
- /health worked
- /v1/chat/completions worked
- /v1/responses worked for direct non-agent requests
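For anyone repeating the GET checks above, here is a minimal sketch of a probe script. The base URL uses port 8001 from the repro summary further down; the `probe` helper and its `opener` parameter are hypothetical names introduced only for this illustration, and the POST endpoints (/v1/chat/completions, /v1/responses) are not covered.

```python
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8001"  # llama-server port from the repro summary
GET_PATHS = ["/health", "/v1/models"]


def probe(base_url, paths=GET_PATHS, opener=urllib.request.urlopen):
    """Return {path: HTTP status or error string} for simple GET endpoints."""
    results = {}
    for path in paths:
        try:
            with opener(base_url + path, timeout=5) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            # The server answered, but with a 4xx/5xx status.
            results[path] = err.code
        except OSError as err:
            # URLError subclasses OSError: connection refused, timeout, etc.
            results[path] = f"unreachable: {err}"
    return results
```

Calling `probe(BASE_URL)` while llama-server is running should report a status per path; a value other than 200 points at the server setup rather than at Codex.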
What failed
A real codex exec against the local llama.cpp endpoint failed with:
{"error":{"code":400,"message":"'type' of tool must be 'function'","type":"invalid_request_error"}}
I captured the outbound Codex request and found that Codex sent mixed tool types in tools[], including:
- function
- web_search
- image_generation
- namespace
llama.cpp rejected the non-function entries.
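To make the mixed tool types easy to spot in a captured request, a small diagnostic along these lines can help. This is only a sketch: the `captured` payload below is a made-up stand-in with hypothetical tool names, not the real Codex request, and `tool_type_counts` is a helper invented for this example.

```python
import json
from collections import Counter


def tool_type_counts(request_body: str) -> Counter:
    """Count the "type" value of each entry in a request's tools[] array."""
    payload = json.loads(request_body)
    return Counter(
        t.get("type", "<missing>") if isinstance(t, dict) else "<non-object>"
        for t in payload.get("tools", [])
    )


# Hypothetical stand-in for a captured /v1/responses request body.
captured = json.dumps({
    "model": "local",
    "tools": [
        {"type": "function", "name": "example_tool"},
        {"type": "web_search"},
        {"type": "image_generation"},
        {"type": "namespace", "name": "example_ns"},
    ],
})

counts = tool_type_counts(captured)
# Every type other than "function" is one that llama.cpp rejected in this report.
unsupported = sorted(k for k in counts if k != "function")
print(unsupported)
```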
After filtering those non-function tools through a tiny local proxy, Codex got past that error.
Then I hit a second issue:
{"error":{"code":400,"message":"request (18383 tokens) exceeds the available context size (8192 tokens), try increasing it","type":"exceed_context_size_error","n_prompt_tokens":18383,"n_ctx":8192}}
That was fixed by starting llama-server with a larger context size (--ctx-size 32768).
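As a rough illustration of how the error fields relate to the chosen value, the sketch below reads n_prompt_tokens from the error body quoted above and rounds up to a power of two. The `suggest_ctx_size` helper, the 4096-token headroom, and the power-of-two rounding are assumptions made for this example; llama-server does not require any of them.

```python
import json
from typing import Optional

# Error body copied from the failure above.
error_body = (
    '{"error":{"code":400,"message":"request (18383 tokens) exceeds the '
    'available context size (8192 tokens), try increasing it",'
    '"type":"exceed_context_size_error","n_prompt_tokens":18383,"n_ctx":8192}}'
)


def suggest_ctx_size(body: str, headroom: int = 4096) -> Optional[int]:
    """Suggest a --ctx-size value, or None if this is not a context-size error."""
    err = json.loads(body).get("error", {})
    if err.get("type") != "exceed_context_size_error":
        return None
    # Leave room for the model's reply on top of the prompt (headroom is a guess).
    needed = err["n_prompt_tokens"] + headroom
    size = 1
    while size < needed:
        size *= 2  # round up to the next power of two
    return size


print(suggest_ctx_size(error_body))
```

With the numbers from this report (18383 prompt tokens plus the assumed headroom), the suggestion matches the 32768 used above. Longer Codex sessions may need more.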
After those two fixes, Codex worked end to end and successfully wrote files through the local Unsloth model.
Why I think this is a docs issue
The current page suggests a direct local Codex setup, but with current Codex CLI that setup does not work as written.
The real situation seems to be:
- Direct local llama.cpp + Unsloth model works.
- Direct current Codex -> llama.cpp Responses API does not work cleanly.
- A compatibility shim or proxy is needed.
- Codex also needs a larger context window than a small default server config.
Suggested doc changes
It would help to update the page with one of these options:
Option A: document the current limitation clearly
State that current Codex CLI is not directly compatible with llama.cpp Responses tool payloads without a compatibility layer.
Option B: document a working workaround
Document a tiny compatibility proxy that drops unsupported non-function tool types before forwarding to llama.cpp.
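A minimal sketch of what such a proxy could look like, using only the Python standard library. This is not the exact proxy used in this report: the listen port 8002 and the names `filter_tools`, `Proxy`, and `main` are hypothetical, the upstream address assumes llama-server on port 8001, and anything else in the request that refers to a dropped tool is left untouched.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM = "http://127.0.0.1:8001"  # llama-server address used in this report
LISTEN_PORT = 8002                  # hypothetical proxy port


def filter_tools(body: bytes) -> bytes:
    """Drop tools whose type is not "function"; return other bodies unchanged."""
    try:
        payload = json.loads(body)
    except ValueError:
        return body  # not JSON, forward as-is
    if isinstance(payload, dict) and isinstance(payload.get("tools"), list):
        payload["tools"] = [
            t for t in payload["tools"]
            if isinstance(t, dict) and t.get("type") == "function"
        ]
        return json.dumps(payload).encode("utf-8")
    return body


class Proxy(BaseHTTPRequestHandler):
    def _start(self, status, upstream_headers):
        self.send_response(status)
        self.send_header(
            "Content-Type", upstream_headers.get("Content-Type", "application/json")
        )
        self.end_headers()

    def _forward(self, body):
        headers = {"Content-Type": self.headers.get("Content-Type", "application/json")}
        if self.headers.get("Authorization"):
            headers["Authorization"] = self.headers["Authorization"]
        req = urllib.request.Request(
            UPSTREAM + self.path, data=body, headers=headers, method=self.command
        )
        try:
            with urllib.request.urlopen(req) as resp:
                self._start(resp.status, resp.headers)
                # Relay chunks as they arrive so streamed responses are not buffered.
                while chunk := resp.read1(8192):
                    self.wfile.write(chunk)
        except urllib.error.HTTPError as err:
            # Pass upstream 4xx/5xx responses back to the client unchanged.
            self._start(err.code, err.headers)
            self.wfile.write(err.read())

    def do_GET(self):
        self._forward(None)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if self.path.endswith("/responses"):
            body = filter_tools(body)
        self._forward(body)


def main():
    """Start the proxy; blocks until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", LISTEN_PORT), Proxy).serve_forever()
```

Calling `main()` starts the proxy, and Codex would then be pointed at the proxy address instead of llama-server directly. The handler speaks HTTP/1.0 and ends each response by closing the connection, which keeps the sketch short but is not tuned for throughput.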
Option C: point users to a tested local provider path
If there is a better currently-supported route than raw llama.cpp Responses mode, document that instead.
Also worth updating:
- remove or revise the wire_api = "chat" fallback guidance if it is no longer realistic for current Codex
- mention that Codex may require a larger --ctx-size than a minimal local server setup
- possibly note that the issue is not specific to one Unsloth model; it is in the Codex/local-provider integration path
Repro summary
Working direct model server:
- local llama.cpp server on port 8001
- direct /v1/responses requests succeed
Failing direct Codex path:
- Codex CLI -> local llama.cpp /v1/responses
- fails on non-function tool types
Working final setup:
- Codex CLI -> tiny compatibility proxy -> local llama.cpp
- llama-server started with larger context window
- verified successful file creation through Codex
If useful, I can provide the exact captured tool payload shape and the minimal proxy logic used for the workaround.