unslothai/unsloth

Docs: current Codex local llama.cpp guide does not work as written with Responses API

Open

#5141 opened on Apr 23, 2026

good first issue

Description

Hi, I think the "How to Run Local LLMs with OpenAI Codex" doc is outdated or incomplete for current Codex CLI versions:

https://unsloth.ai/docs/basics/codex

Summary

I tested the local Codex + Unsloth + llama.cpp flow on macOS using an under-10GB Unsloth model and was able to get it working, but not with the doc steps as written.

The model itself is not the problem. Current Codex CLI sends a /v1/responses request whose tools[] payload llama.cpp rejects unless a compatibility layer sits in between.

The page already hints at this with:

{"error":{"code":400,"message":"'type' of tool must be 'function'","type":"invalid_request_error"}}

However, the page still reads as if the local Codex flow works directly, and the fallback it mentions (wire_api = "chat") is no longer a practical option with current Codex CLI versions.

What I tested

I used:

  • Codex CLI v0.123.0
  • llama.cpp built locally
  • Unsloth model: unsloth/Qwen3.6-27B-GGUF
  • quant: Qwen3.6-27B-UD-IQ2_XXS.gguf (under 10GB)
  • macOS on Apple Silicon

I verified:

  • llama-server started correctly
  • /v1/models worked
  • /health worked
  • /v1/chat/completions worked
  • /v1/responses worked for direct non-agent requests
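
For anyone repeating these checks, here is a minimal sketch of how the GET endpoints could be probed from a script. It assumes the server listens on http://127.0.0.1:8001 (the port from the repro summary below); the helper name probe and the variable names are hypothetical. The POST endpoints (/v1/chat/completions and /v1/responses) need a JSON body and are not covered here.

```python
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8001"  # assumed llama-server address
GET_ENDPOINTS = ["/health", "/v1/models"]


def probe(base_url, path, timeout=5.0):
    """Return the HTTP status for a GET request, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 404 or 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or another network-level failure.
        return None


def check_server(base_url=BASE_URL):
    """Map each GET endpoint to its status code (None means unreachable)."""
    return {path: probe(base_url, path) for path in GET_ENDPOINTS}
```

Calling check_server() while llama-server is running should return a status per endpoint; a None value means nothing answered at that address.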

What failed

A real codex exec against the local llama.cpp endpoint failed with:

{"error":{"code":400,"message":"'type' of tool must be 'function'","type":"invalid_request_error"}}

I captured the outbound Codex request and found that Codex sent mixed tool types in tools[], including:

  • function
  • web_search
  • image_generation
  • namespace

llama.cpp rejected the non-function entries.
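
To see which tool types a captured request contains, a small helper like the one below can summarize tools[]. This is a sketch only: the sample body is a hypothetical stand-in that mirrors the four type names listed above, not the exact payload Codex sent, and the function name tool_type_counts is made up for illustration.

```python
import json
from collections import Counter


def tool_type_counts(request_body):
    """Count the 'type' value of every entry in tools[] of a JSON request body."""
    payload = json.loads(request_body)
    tools = payload.get("tools", [])
    return Counter(str(tool.get("type")) for tool in tools if isinstance(tool, dict))


# Hypothetical captured body; real Codex tool entries carry more fields.
sample_body = json.dumps({
    "model": "local-model",
    "tools": [
        {"type": "function", "name": "example_function"},
        {"type": "web_search"},
        {"type": "image_generation"},
        {"type": "namespace"},
    ],
})

counts = tool_type_counts(sample_body)
# Every type other than "function" is one that llama.cpp rejected in my test.
unsupported = sorted(t for t in counts if t != "function")
```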

After filtering those non-function tools through a tiny local proxy, Codex got past that error.
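
A minimal sketch of such a proxy, using only the Python standard library, could look like the following. It is not the exact proxy I ran. The assumptions: llama-server listens on http://127.0.0.1:8001, the proxy listens on port 8002 (a hypothetical choice), and filtering is applied only to POST requests whose path ends in /responses. The names keep_function_tools, FilteringProxy, and serve are illustrative.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM = "http://127.0.0.1:8001"  # assumed llama-server address


def keep_function_tools(payload):
    """Return a copy of a request body whose tools[] holds only function tools."""
    tools = payload.get("tools")
    if not isinstance(tools, list):
        return payload
    kept = [t for t in tools if isinstance(t, dict) and t.get("type") == "function"]
    return {**payload, "tools": kept}


class FilteringProxy(BaseHTTPRequestHandler):
    def _forward(self, body=None):
        headers = {"Content-Type": self.headers.get("Content-Type", "application/json")}
        if self.headers.get("Authorization"):
            headers["Authorization"] = self.headers["Authorization"]
        request = urllib.request.Request(
            UPSTREAM + self.path, data=body, headers=headers, method=self.command
        )
        try:
            with urllib.request.urlopen(request) as upstream:
                self._relay(upstream.status, upstream.headers, upstream)
        except urllib.error.HTTPError as err:
            # Pass upstream error responses (such as 400) back to the client.
            self._relay(err.code, err.headers, err)

    def _relay(self, status, headers, stream):
        self.send_response(status)
        self.send_header("Content-Type", headers.get("Content-Type", "application/json"))
        self.end_headers()
        # Copy line by line so streamed responses reach the client promptly.
        while True:
            line = stream.readline()
            if not line:
                break
            self.wfile.write(line)
            self.wfile.flush()

    def do_GET(self):
        self._forward()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if self.path.endswith("/responses"):
            try:
                body = json.dumps(keep_function_tools(json.loads(body))).encode("utf-8")
            except (ValueError, AttributeError):
                pass  # Body is not a JSON object; forward it unchanged.
        self._forward(body)


def serve(port=8002):
    """Run the proxy until interrupted; port 8002 is an arbitrary choice."""
    ThreadingHTTPServer(("127.0.0.1", port), FilteringProxy).serve_forever()
```

Calling serve() starts the proxy; Codex would then be pointed at the proxy address instead of the llama-server address. Dropping tools silently means the model never sees web search or image generation, which is acceptable here because llama.cpp could not accept those entries anyway.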

Then I hit a second issue:

{"error":{"code":400,"message":"request (18383 tokens) exceeds the available context size (8192 tokens), try increasing it","type":"exceed_context_size_error","n_prompt_tokens":18383,"n_ctx":8192}}

That was fixed by starting llama-server with a larger context size (--ctx-size 32768).
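
The error body carries enough information to pick a new size. The sketch below parses it and doubles n_ctx until the prompt fits; the doubling rule and the function name suggest_ctx_size are my own illustration, not something llama.cpp prescribes. The optional headroom parameter is a hypothetical allowance for generated tokens.

```python
import json


def suggest_ctx_size(error_body, headroom=0):
    """Suggest a --ctx-size value from an exceed_context_size_error response body."""
    err = json.loads(error_body)["error"]
    if err.get("type") != "exceed_context_size_error":
        raise ValueError("not a context-size error")
    needed = err["n_prompt_tokens"] + headroom
    size = err["n_ctx"]
    # Double the current context size until the prompt (plus headroom) fits.
    while size < needed:
        size *= 2
    return size


# The error body reported above, reproduced as a JSON string.
error_body = json.dumps({
    "error": {
        "code": 400,
        "message": "request (18383 tokens) exceeds the available context size "
                   "(8192 tokens), try increasing it",
        "type": "exceed_context_size_error",
        "n_prompt_tokens": 18383,
        "n_ctx": 8192,
    }
})

suggested = suggest_ctx_size(error_body)  # 8192 -> 16384 -> 32768
```

For the error above this yields 32768, which matches the value that worked in my test.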

After those two fixes, Codex worked end to end and successfully wrote files through the local Unsloth model.

Why I think this is a docs issue

The current page suggests that a direct local Codex setup works, but with current Codex CLI that setup fails as written.

The real situation seems to be:

  1. Direct local llama.cpp + Unsloth model works.
  2. Direct current Codex -> llama.cpp Responses API does not work cleanly.
  3. A compatibility shim or proxy is needed.
  4. Codex also needs a larger context window than a small default server config.

Suggested doc changes

It would help to update the page in one of these ways:

Option A: document the current limitation clearly

State that current Codex CLI is not directly compatible with llama.cpp Responses tool payloads without a compatibility layer.

Option B: document a working workaround

Document a tiny compatibility proxy that drops unsupported non-function tool types before forwarding to llama.cpp.

Option C: point users to a tested local provider path

If there is a better currently-supported route than raw llama.cpp Responses mode, document that instead.

Also worth updating:

  • remove or revise the wire_api = "chat" fallback guidance if it is no longer realistic for current Codex
  • mention that Codex may require a larger --ctx-size than a minimal local server setup
  • possibly note that the issue is not specific to one Unsloth model; it is in the Codex/local-provider integration path

Repro summary

Working direct model server:

  • local llama.cpp server on port 8001
  • direct /v1/responses requests succeed

Failing direct Codex path:

  • Codex CLI -> local llama.cpp /v1/responses
  • fails on non-function tool types

Working final setup:

  • Codex CLI -> tiny compatibility proxy -> local llama.cpp
  • llama-server started with larger context window
  • verified successful file creation through Codex

If useful, I can provide the exact captured tool payload shape and the minimal proxy logic used for the workaround.
