agentscope-ai/agentscope-java

[Feature]: Support proactive per-model rate limiting (e.g. max requests per minute with blocking wait)

Open

#974 opened on Mar 16, 2026

Labels: enhancement, help wanted

Feature Description

Add proactive per-model rate limiting to AgentScope Java, so callers can configure a request budget such as "max 20 requests per minute" and have the SDK wait before sending the next model request, instead of failing only after the provider starts throttling.

This is different from retry-after-failure. The need here is client-side throttling before requests are sent.

Motivation

In multi-agent workflows, one top-level agent may trigger several sub-agents and model calls within a short time window. Even when each individual call is valid, the aggregate request rate can exceed the provider's capacity and produce errors like:

  • Too many requests. Your requests are being throttled due to system capacity limits.
  • HTTP 429
  • provider-specific 500/503 responses that actually mean throttling or temporary overload

AgentScope Java already exposes:

  • ExecutionConfig.maxAttempts(...)
  • ExecutionConfig.initialBackoff(...)
  • ExecutionConfig.maxBackoff(...)
  • ReActAgent.Builder.maxIters(...)

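These are useful, but they only govern retries after a failure and the number of agent iterations. None of them delays a request before it is sent, so they do not solve the proactive throttling use case.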

Proposed API Shape

One possible design is to support rate limiting in model execution config or model transport config, for example:

ExecutionConfig.builder()
    .maxAttempts(3)
    .initialBackoff(Duration.ofSeconds(2))
    .maxBackoff(Duration.ofSeconds(30))
    .maxRequestsPerMinute(20)
    .build();

Or a dedicated rate-limit config object:

RateLimitConfig.builder()
    .maxRequests(20)
    .per(Duration.ofMinutes(1))
    .burst(5)
    .waitMode(BLOCK)
    .build();

Either way, the resulting config would then be passed to the agent builder:

ReActAgent.builder()
    .model(model)
    .modelExecutionConfig(modelExecutionConfig)
    .build();
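
To make the intended semantics of maxRequests, per, and burst concrete, here is a minimal token-bucket sketch of one possible interpretation. It is not an existing AgentScope class: the name BlockingTokenBucket, the reserve/acquire methods, and the reading of burst as the bucket capacity are all assumptions made for illustration. Credit is tracked in nanoseconds so the arithmetic stays exact.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not an AgentScope class: token bucket with blocking wait. */
final class BlockingTokenBucket {
    private final long costNanos;     // time needed to earn one request: per / maxRequests
    private final long capacityNanos; // credit cap: burst * costNanos
    private long creditNanos;         // negative credit is time owed by callers already queued
    private long lastNanos;           // timestamp of the most recent reservation

    BlockingTokenBucket(int maxRequests, Duration per, int burst, long startNanos) {
        if (maxRequests <= 0 || burst <= 0) {
            throw new IllegalArgumentException("maxRequests and burst must be positive");
        }
        this.costNanos = per.toNanos() / maxRequests;
        this.capacityNanos = burst * costNanos;
        this.creditNanos = capacityNanos; // start full so an initial burst is allowed
        this.lastNanos = startNanos;
    }

    /**
     * Reserves one request slot at time nowNanos and returns how many
     * nanoseconds the caller should wait before sending (0 if none).
     */
    synchronized long reserve(long nowNanos) {
        long elapsed = Math.max(0L, nowNanos - lastNanos);
        creditNanos = Math.min(capacityNanos, creditNanos + elapsed); // refill, capped at burst
        lastNanos = Math.max(lastNanos, nowNanos);
        creditNanos -= costNanos;                                     // pay for this request
        return creditNanos >= 0 ? 0L : -creditNanos;
    }

    /** Blocking variant (the assumed BLOCK wait mode): sleeps until the slot is due. */
    void acquire() throws InterruptedException {
        long waitNanos = reserve(System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

With new BlockingTokenBucket(20, Duration.ofMinutes(1), 5, System.nanoTime()), the first five acquire() calls return immediately, and later calls are spaced about three seconds apart. Separating reserve from the sleep keeps the accounting testable without real waiting, and would also let a non-blocking caller schedule a delay instead of parking a thread.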

Expected Behavior

  • Support per-model or per-agent model request throttling
  • Support blocking/waiting when the budget is exhausted
  • Work for both non-streaming and streaming model calls
  • Be shared across concurrent sub-agent calls when they use the same underlying model/client
  • Keep retry/backoff behavior as a separate concern
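
The sharing requirement above could be met by keying limiters on the underlying model or client, so that every sub-agent using the same key draws from one budget. The following is a rough sketch under that assumption. SharedModelThrottle, Pacer, and the string model key are hypothetical names, not AgentScope APIs, and the pacer simply spaces requests evenly rather than allowing bursts.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one pacer per model key, shared by all callers in the JVM. */
final class SharedModelThrottle {

    /** Evenly spaces requests: at most one every per / maxRequests. */
    static final class Pacer {
        private final long intervalNanos;
        private long nextFreeNanos = Long.MIN_VALUE; // earliest time the next request may go

        Pacer(int maxRequests, Duration per) {
            this.intervalNanos = per.toNanos() / maxRequests;
        }

        /** Reserves the next slot; returns nanoseconds to wait before sending. */
        synchronized long reserve(long nowNanos) {
            long sendAt = Math.max(nowNanos, nextFreeNanos);
            nextFreeNanos = sendAt + intervalNanos;
            return sendAt - nowNanos;
        }
    }

    private final Map<String, Pacer> pacers = new ConcurrentHashMap<>();
    private final int maxRequests;
    private final Duration per;

    SharedModelThrottle(int maxRequests, Duration per) {
        this.maxRequests = maxRequests;
        this.per = per;
    }

    /** Returns the single pacer for this model key, creating it on first use. */
    Pacer pacerFor(String modelKey) {
        return pacers.computeIfAbsent(modelKey, k -> new Pacer(maxRequests, per));
    }

    /**
     * Waits for the shared budget, then runs the request. Retry and backoff
     * are deliberately left to the caller, so each retry attempt would pass
     * through this method again and consume budget like any other request.
     */
    <T> T call(String modelKey, Callable<T> request) throws Exception {
        long waitNanos = pacerFor(modelKey).reserve(System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
        return request.call();
    }
}
```

For a streaming call, the same wait would presumably happen once, before the stream is opened, since the provider limit in question counts requests rather than chunks.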

Why This Matters

Without proactive rate limiting, users of multi-agent systems have to implement their own global throttling layer outside AgentScope. That is possible, but awkward because the throttling concern belongs very close to model execution.

Built-in support would make AgentScope Java more reliable for provider backends with strict RPM/QPM limits, especially in orchestrated workflows.
