Docs: current Codex local llama.cpp guide does not work as written with Responses API · unslothai/unsloth#5141

Description

Hi, I think the "How to Run Local LLMs with OpenAI Codex" doc is outdated or incomplete for current Codex CLI versions.

Summary

I tested the local Codex + Unsloth + llama.cpp flow on macOS using an under-10GB Unsloth model and was able to get it working, but not with the doc steps as written.

The main problem is not the model itself. The issue is that current Codex CLI sends a /v1/responses tool payload that llama.cpp rejects unless a compatibility layer is added.

The page already hints at this with:

{"error":{"code":400,"message":"'type' of tool must be 'function'","type":"invalid_request_error"}}

But the page still reads as if the local Codex flow works directly, and the fallback it mentions (wire_api = "chat") is no longer a practical solution with current Codex.

What I tested

I used:

Codex CLI v0.123.0
llama.cpp built locally
Unsloth model: unsloth/Qwen3.6-27B-GGUF
quant: Qwen3.6-27B-UD-IQ2_XXS.gguf (under 10GB)
macOS on Apple Silicon

I verified:

llama-server started correctly
/v1/models worked
/health worked
/v1/chat/completions worked
/v1/responses worked for direct non-agent requests
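A minimal sketch of how the GET checks in this list could be scripted, assuming llama-server listens on 127.0.0.1:8001 (the port from the repro summary). The helper names `probe` and `probe_all` are hypothetical, and the POST endpoints (/v1/chat/completions, /v1/responses) are left out because they need a request body.

```python
import urllib.error
import urllib.request

# Assumption: llama-server address; port 8001 is taken from the repro summary.
BASE_URL = "http://127.0.0.1:8001"
GET_PATHS = ["/health", "/v1/models"]


def probe(base_url, path, timeout=5.0):
    """Return the HTTP status of a GET request, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def probe_all(base_url=BASE_URL, paths=GET_PATHS):
    """Probe every path and return a {path: status} mapping."""
    return {path: probe(base_url, path) for path in paths}
```

Calling `probe_all()` with the server running should report a status for each path; `None` means nothing answered at that address.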

What failed

A real codex exec against the local llama.cpp endpoint failed with:

{"error":{"code":400,"message":"'type' of tool must be 'function'","type":"invalid_request_error"}}

I captured the outbound Codex request and found that Codex sent mixed tool types in tools[], including:

function
web_search
image_generation
namespace

llama.cpp rejected the non-function entries.
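To see which tool types a captured request contains, a small helper like the one below can tally them. This is only a sketch: `tool_type_counts` is a hypothetical name, and the sample payload is illustrative, not the exact request Codex sent.

```python
from collections import Counter


def tool_type_counts(payload):
    """Count the "type" field of every entry in a request's tools[] array."""
    tools = payload.get("tools") or []
    return Counter(
        tool.get("type", "<missing>") for tool in tools if isinstance(tool, dict)
    )


# Illustrative payload only; the real captured request has more fields.
sample = {
    "tools": [
        {"type": "function", "name": "example_tool"},
        {"type": "web_search"},
        {"type": "image_generation"},
        {"type": "namespace"},
    ]
}
counts = tool_type_counts(sample)
non_function = sorted(t for t in counts if t != "function")
```

Any type other than `function` in the tally is a candidate for the 400 error shown above.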

After filtering those non-function tools through a tiny local proxy, Codex got past that error.
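A filtering proxy of this kind can be sketched with the Python standard library alone. This is not the exact proxy used in the test above, only an illustration under assumptions: the upstream address, the proxy port 8002, and the names `keep_function_tools`, `forward`, `FilterProxy`, and `main` are all hypothetical. It handles POST only, buffers the upstream response instead of streaming it, and does not forward an Authorization header.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://127.0.0.1:8001"  # assumed llama-server address
LISTEN_PORT = 8002                  # hypothetical port for Codex to target


def keep_function_tools(payload):
    """Return a copy of the request body with non-function tools removed."""
    if not isinstance(payload, dict) or not isinstance(payload.get("tools"), list):
        return payload
    cleaned = dict(payload)
    cleaned["tools"] = [
        tool for tool in payload["tools"]
        if isinstance(tool, dict) and tool.get("type") == "function"
    ]
    return cleaned


def forward(path, body):
    """POST the body to llama-server; return (status, content_type, data)."""
    req = urllib.request.Request(
        UPSTREAM + path, data=body,
        headers={"Content-Type": "application/json"}, method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.headers.get("Content-Type", "application/json"), resp.read()
    except urllib.error.HTTPError as err:
        # Pass upstream error responses (such as the 400s above) back unchanged.
        return err.code, err.headers.get("Content-Type", "application/json"), err.read()


class FilterProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        raw = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            body = json.dumps(keep_function_tools(json.loads(raw))).encode()
        except ValueError:
            body = raw  # not JSON: forward untouched
        status, content_type, data = forward(self.path, body)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def main():
    """Run the proxy until interrupted."""
    HTTPServer(("127.0.0.1", LISTEN_PORT), FilterProxy).serve_forever()
```

To try it, call `main()` and point the Codex provider's base URL at the proxy port instead of llama-server. Dropping tools silently means the model never sees them, so features such as web search are simply unavailable in this setup.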

Then I hit a second issue:

{"error":{"code":400,"message":"request (18383 tokens) exceeds the available context size (8192 tokens), try increasing it","type":"exceed_context_size_error","n_prompt_tokens":18383,"n_ctx":8192}}

That was fixed by starting llama-server with a larger context size (--ctx-size 32768).
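The arithmetic behind choosing 32768 can be sketched from the error body itself. The helper below is hypothetical: it assumes a 4096-token headroom for generated output and doubles the current context size until the prompt fits, which is a convenience rule, not a llama.cpp requirement.

```python
import json

# Error body exactly as returned by llama-server in the report above.
ERROR_BODY = (
    '{"error":{"code":400,"message":"request (18383 tokens) exceeds the '
    'available context size (8192 tokens), try increasing it",'
    '"type":"exceed_context_size_error","n_prompt_tokens":18383,"n_ctx":8192}}'
)


def suggest_ctx_size(error_body, headroom=4096):
    """Suggest a --ctx-size that fits the rejected prompt plus some headroom.

    Returns None if the error is not a context-size error.
    """
    err = json.loads(error_body).get("error", {})
    if err.get("type") != "exceed_context_size_error":
        return None
    needed = err["n_prompt_tokens"] + headroom  # assumed room for the reply
    size = max(int(err["n_ctx"]), 1)
    while size < needed:
        size *= 2  # 8192 -> 16384 -> 32768
    return size


suggested = suggest_ctx_size(ERROR_BODY)  # 32768 for the error above
```

With 18383 prompt tokens, 16384 is already too small, so 32768 is the first doubling that fits.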

After those two fixes, Codex worked end to end and successfully wrote files through the local Unsloth model.

Why I think this is a docs issue

The current page suggests a direct local Codex setup, but with current Codex CLI that setup does not work as written.

The real situation seems to be:

Direct local llama.cpp + Unsloth model works.
Direct current Codex -> llama.cpp Responses API does not work cleanly.
A compatibility shim or proxy is needed.
Codex also needs a larger context window than a small default server config provides (8192 tokens was too small in my test; 32768 worked).

Suggested doc changes

It would help to update the page to clarify one of these:

Option A: document the current limitation clearly

State that current Codex CLI is not directly compatible with llama.cpp Responses tool payloads without a compatibility layer.

Option B: document a working workaround

Document a tiny compatibility proxy that drops unsupported non-function tool types before forwarding to llama.cpp.

Option C: point users to a tested local provider path

If there is a better currently-supported route than raw llama.cpp Responses mode, document that instead.

Also worth updating:

remove or revise the wire_api = "chat" fallback guidance if it is no longer realistic for current Codex
mention that Codex may require a larger --ctx-size than a minimal local server setup
possibly note that the issue is not specific to one Unsloth model; it is in the Codex/local-provider integration path

Repro summary

Working direct model server:

local llama.cpp server on port 8001
direct /v1/responses requests succeed

Failing direct Codex path:

Codex CLI -> local llama.cpp /v1/responses
fails on non-function tool types

Working final setup:

Codex CLI -> tiny compatibility proxy -> local llama.cpp
llama-server started with larger context window
verified successful file creation through Codex

If useful, I can provide the exact captured tool payload shape and the minimal proxy logic used for the workaround.

Contributor guide

Research direction: Update the documentation to clarify that current Codex CLI is not directly compatible with llama.cpp Responses API without a compatibility proxy, and that a larger ctx size is needed. Include a working proxy example or point to a tested local provider path.
Tech stack: Python, C++
Domain: documentation
Issue type: Documentation
Difficulty: 3
Estimated time: Half day
Activity status: Active
Clarity: Clear
Prerequisites: Python, local LLM setup, API concepts
Newbie friendliness: 70