zeroclaw-labs/zeroclaw

[Feature]: Provider-scoped model fallback chains (not global model fallback only)

Open

#4647 opened on Mar 25, 2026

Labels: config, enhancement, help wanted, priority:p2, provider, provider:reliable, risk: medium, status:accepted, status:no-stale

Description

Summary

Provider-scoped model fallback chains (not global model fallback only)

Problem statement

Today reliability.fallback_providers and reliability.model_fallbacks are configured separately, which makes fallback behavior hard to control when each provider supports a different set of models. I’d like provider-aware fallback chains, so that each fallback provider can define its own model list (predefined or custom).

Current behavior

Fallback order is effectively:

  1. model chain (model_fallbacks)
  2. provider chain (fallback_providers)
  3. retries

Because model fallback is global, a model may be attempted on providers where it is invalid or unavailable before the runtime moves on.
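
To make the problem concrete, here is a minimal Rust sketch of what a global model list implies. It assumes every provider is paired with every globally listed model; the actual nesting order in zeroclaw may differ, and `global_attempts` and the model name `llama3` are hypothetical names used only for illustration.

```rust
/// Builds every (provider, model) attempt when the model list is global.
/// Hypothetical helper for illustration; not zeroclaw's actual code.
fn global_attempts(providers: &[&str], models: &[&str]) -> Vec<(String, String)> {
    let mut attempts = Vec::new();
    for provider in providers {
        for model in models {
            // Nothing here checks that `model` is valid for `provider`,
            // so pairs such as ("ollama", "gpt-4o-mini") are produced too.
            attempts.push((provider.to_string(), model.to_string()));
        }
    }
    attempts
}
```

With providers `["openai", "ollama"]` and models `["gpt-4o-mini", "llama3"]`, this yields four attempts, two of which pair a model with a provider that likely cannot serve it.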

Requested feature

Support provider-scoped model fallback (priority chain of provider+model pairs), for example:

  • predefined provider IDs (openai, ollama, etc.)
  • custom providers (custom:https://...)
  • optional per-entry profile/credential selection
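
As a rough illustration of how the two provider forms above could be told apart, the following Rust sketch parses a provider string into either a predefined ID or a custom endpoint. `ProviderRef` and `parse_provider` are hypothetical names, not existing zeroclaw APIs, and the error handling is an assumption.

```rust
/// A provider reference as it might appear in a chain entry (hypothetical type).
#[derive(Debug, PartialEq)]
enum ProviderRef {
    /// A predefined provider ID such as "openai" or "ollama".
    Predefined(String),
    /// A custom endpoint written as "custom:<base URL>".
    Custom { base_url: String },
}

/// Splits the "custom:" prefix from a provider string, if present.
fn parse_provider(raw: &str) -> Result<ProviderRef, String> {
    match raw.strip_prefix("custom:") {
        Some("") => Err("custom provider is missing a URL".to_string()),
        Some(url) => Ok(ProviderRef::Custom { base_url: url.to_string() }),
        None if raw.is_empty() => Err("provider is empty".to_string()),
        None => Ok(ProviderRef::Predefined(raw.to_string())),
    }
}
```

The sketch does not validate the URL or check the ID against a list of known providers; a real implementation would presumably do both.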

Proposed solution

Possible config design (example)

[reliability]
provider_retries = 2
provider_backoff_ms = 500
[[reliability.fallback_chain1]]
provider = "openrouter"
models = ["anthropic/claude-sonnet-4-6"]
[[reliability.fallback_chain2]]
provider = "custom:http://127.0.0.1:1234/v1"
models = ["model-a", "model-b"]
[[reliability.fallback_chain3]]
provider = "openai"
models = ["gpt-4o-mini", "gpt-4.1-mini"]

Expected behavior

Runtime should try entries in order:

  1. openrouter + anthropic/claude-sonnet-4-6
  2. custom + model-a
  3. custom + model-b
  4. openai + gpt-4o-mini
  5. openai + gpt-4.1-mini

The configured retries and backoff (provider_retries, provider_backoff_ms) apply at each step.
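
The ordering described here can be sketched in Rust as follows. This is a minimal sketch under stated assumptions: `ChainEntry`, `flatten`, and `run_chain` are hypothetical names, `call` stands in for the real provider request, and `retries` is interpreted as the number of extra attempts per provider+model pair, which the proposal does not specify.

```rust
/// One entry of the proposed chain: a provider and its own model list.
struct ChainEntry {
    provider: String,
    models: Vec<String>,
}

/// Expands the chain into (provider, model) pairs in priority order.
fn flatten(chain: &[ChainEntry]) -> Vec<(&str, &str)> {
    chain
        .iter()
        .flat_map(|e| e.models.iter().map(move |m| (e.provider.as_str(), m.as_str())))
        .collect()
}

/// Tries each pair in order, allowing up to `retries` extra attempts per pair.
/// Returns the first (provider, model) pair for which `call` succeeds.
fn run_chain<F>(chain: &[ChainEntry], retries: u32, mut call: F) -> Option<(String, String)>
where
    F: FnMut(&str, &str) -> bool,
{
    for (provider, model) in flatten(chain) {
        for _attempt in 0..=retries {
            if call(provider, model) {
                return Some((provider.to_string(), model.to_string()));
            }
            // A real implementation would wait for the configured backoff here.
        }
    }
    None
}
```

Because each model is only ever paired with the provider that lists it, invalid provider/model combinations never enter the attempt order.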

Why this helps

- avoids invalid model/provider combinations
- predictable failover in production
- easier to reason about outages and costs
- supports mixed cloud/local/custom routing

### Non-goals / out of scope

_No response_

### Alternatives considered

_No response_

### Acceptance criteria

_No response_

### Architecture impact

_No response_

### Risk and rollback

_No response_

### Breaking change?

No

### Data hygiene checks

- [x] I removed personal/sensitive data from examples, payloads, and logs.
- [x] I used neutral, project-focused wording and placeholders.
