[Feature]: Support proactive per-model rate limiting (e.g. max requests per minute with blocking wait) · agentscope-ai/agentscope-java#974

3 comments · 0 reactions · 0 assignees · Java · 693 forks · user submission

Labels: area/core/model, enhancement, help wanted

Repository metrics

Stars: 3,253
PR merge metrics: average merge time 2d 8h, 58 PRs merged in the last 30 days

Description

Feature Description

Add proactive per-model rate limiting to AgentScope Java, so that callers can configure a request budget such as "max 20 requests per minute" and the SDK waits before sending the next model request, instead of failing only after the provider starts throttling.

This is different from retry-after-failure. The need here is client-side throttling before requests are sent.

Motivation

In multi-agent workflows, one top-level agent may trigger several sub-agents and model calls within a short time window. Even when each individual call is valid, the aggregate request rate can exceed the provider's capacity and produce errors like:

Too many requests. Your requests are being throttled due to system capacity limits.
HTTP 429
provider-specific 500/503 responses that actually mean throttling or temporary overload

AgentScope Java already exposes:

ExecutionConfig.maxAttempts(...)
ExecutionConfig.initialBackoff(...)
ExecutionConfig.maxBackoff(...)
ReActAgent.Builder.maxIters(...)

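These are useful for retrying after a failure and for capping agent iterations, but none of them delays a request before it is sent, so they do not solve the proactive throttling use case.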

Proposed API Shape

One possible design is to support rate limiting in model execution config or model transport config, for example:

ExecutionConfig.builder()
    .maxAttempts(3)
    .initialBackoff(Duration.ofSeconds(2))
    .maxBackoff(Duration.ofSeconds(30))
    .maxRequestsPerMinute(20)
    .build();

Or a dedicated rate-limit config object:

RateLimitConfig.builder()
    .maxRequests(20)
    .per(Duration.ofMinutes(1))
    .burst(5)
    .waitMode(BLOCK)
    .build();

The resulting config would then be passed to the agent builder:

ReActAgent.builder()
    .model(model)
    .modelExecutionConfig(modelExecutionConfig)
    .build();
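
To make the proposed fields concrete, the sketch below works through the arithmetic they imply. `RateLimitSpec` and its methods are hypothetical names used only for illustration, not AgentScope API. The sketch also assumes a token-bucket reading of `burst`, which the proposal above does not define.

```java
import java.time.Duration;

// Hypothetical value object mirroring the proposed RateLimitConfig fields.
// Not part of AgentScope; the names are assumptions for illustration.
final class RateLimitSpec {
    final int maxRequests;
    final Duration per;
    final int burst;

    RateLimitSpec(int maxRequests, Duration per, int burst) {
        if (maxRequests <= 0 || burst <= 0 || per.isZero() || per.isNegative()) {
            throw new IllegalArgumentException("maxRequests, burst and per must be positive");
        }
        this.maxRequests = maxRequests;
        this.per = per;
        this.burst = burst;
    }

    // Steady-state spacing between requests: per / maxRequests.
    Duration refillInterval() {
        return per.dividedBy(maxRequests);
    }

    // Upper bound on the wait for a caller that arrives when the burst
    // allowance is spent and queuedAhead callers are already waiting.
    Duration worstCaseWait(int queuedAhead) {
        return refillInterval().multipliedBy(queuedAhead + 1L);
    }

    // Under a token-bucket reading, a full bucket plus one period of refill
    // can be sent within a single window of length per.
    int windowUpperBound() {
        return maxRequests + burst;
    }
}
```

With `maxRequests(20)`, `per(Duration.ofMinutes(1))` and `burst(5)`, the steady-state spacing is 3 seconds. Under this reading, as many as 25 requests could still fall inside one 60-second window, so a provider with a hard 20 RPM cap may call for `burst(1)` or a lower `maxRequests`.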

Expected Behavior

Support per-model or per-agent model request throttling
Support blocking/waiting when the budget is exhausted
Work for both non-streaming and streaming model calls
Be shared across concurrent sub-agent calls when they use the same underlying model/client
Keep retry/backoff behavior as a separate concern
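
The behavior listed above could be met by a small blocking limiter shared by everything that uses the same model or client. The following is a minimal sketch using only the JDK, not a proposed implementation. `BlockingRateLimiter`, `reserve` and `acquire` are hypothetical names, and the scheduling approach (reserving the next send slot, in the style of the generic cell rate algorithm) is one choice among several.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical shared limiter; not an AgentScope class.
final class BlockingRateLimiter {
    private final long intervalNanos;   // steady-state spacing between requests
    private final long burstSlackNanos; // how far ahead of schedule a burst may run
    private long nextSlotNanos;         // scheduled send time of the next request

    BlockingRateLimiter(int maxRequests, Duration per, int burst, long startNanos) {
        if (maxRequests <= 0 || burst <= 0) {
            throw new IllegalArgumentException("maxRequests and burst must be positive");
        }
        this.intervalNanos = per.toNanos() / maxRequests;
        if (intervalNanos <= 0) {
            throw new IllegalArgumentException("per is too short for maxRequests");
        }
        this.burstSlackNanos = intervalNanos * (burst - 1);
        this.nextSlotNanos = startNanos;
    }

    static BlockingRateLimiter create(int maxRequests, Duration per, int burst) {
        return new BlockingRateLimiter(maxRequests, per, burst, System.nanoTime());
    }

    // Reserves the next slot and returns how long the caller must wait.
    // Only this bookkeeping is synchronized, so waiters do not hold the lock.
    synchronized long reserve(long nowNanos) {
        // Compare by difference, as the System.nanoTime() contract recommends.
        long slot = (nextSlotNanos - nowNanos > 0) ? nextSlotNanos : nowNanos;
        nextSlotNanos = slot + intervalNanos;
        return Math.max(0L, slot - burstSlackNanos - nowNanos);
    }

    // Blocks the calling thread until its reserved slot arrives.
    // If interrupted while sleeping, the reserved slot is still consumed.
    void acquire() throws InterruptedException {
        long waitNanos = reserve(System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

Sharing across concurrent sub-agents then amounts to creating one instance per underlying model or client and calling `acquire()` before each request. For a streaming call, one plausible choice is to acquire a single permit before the stream is opened, since the provider counts the request rather than its chunks.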

Why This Matters

Without proactive rate limiting, users of multi-agent systems have to implement their own global throttling layer outside AgentScope. That is possible, but awkward because the throttling concern belongs very close to model execution.

Built-in support would make AgentScope Java more reliable for provider backends with strict RPM/QPM limits, especially in orchestrated workflows.
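
One way to picture throttling sitting next to model execution while staying separate from retry/backoff is a retry loop that asks a limiter for a permit before every attempt. The sketch below is an assumption about how the two concerns could compose, not a description of AgentScope's executor. `PermitGate`, `Sleeper` and `ThrottledRetry` are hypothetical names, and the backoff simply doubles up to `maxBackoff`.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical hooks; a real limiter and Thread.sleep would back these.
interface PermitGate {
    void acquire() throws InterruptedException;
}

interface Sleeper {
    void sleep(Duration d) throws InterruptedException;
}

final class ThrottledRetry {
    private ThrottledRetry() {}

    static <T> T call(Callable<T> request, PermitGate gate, Sleeper sleeper,
                      int maxAttempts, Duration initialBackoff, Duration maxBackoff)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Duration backoff = initialBackoff;
        for (int attempt = 1; ; attempt++) {
            // Proactive step: every attempt, retries included, spends one permit.
            gate.acquire();
            try {
                return request.call();
            } catch (Exception e) {
                if (e instanceof InterruptedException || attempt >= maxAttempts) {
                    throw e;
                }
                // Reactive step: back off after a failure, capped at maxBackoff.
                sleeper.sleep(backoff);
                Duration doubled = backoff.multipliedBy(2);
                backoff = doubled.compareTo(maxBackoff) > 0 ? maxBackoff : doubled;
            }
        }
    }
}
```

Counting retries against the budget is a design choice: it keeps the aggregate request rate bounded even when the provider is already failing requests, at the cost of slower recovery for an individual call.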

Additional Context

Contributor guide

Research direction: Study existing rate limiting libraries (e.g., Guava RateLimiter, Resilience4j RateLimiter) and identify where to hook into AgentScope's model execution pipeline (likely in `ModelExecutor` or `ExecutionConfig`). Design a configuration object that supports maxRequests, perDuration, burst, and waitMode, then integrate it with the existing retry/backoff logic so throttling happens before calls are sent. Ensure thread safety and compatibility with streaming calls.
Tech stack: java
Domain: backend
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Fresh
Clarity: Clear
Prerequisites: Java, AgentScope
Newbie friendliness: 50