[runtime] WeCom IM channel intermittently gets 401 on runs.stream when Gateway runs with multiple workers
#3149 opened on May 22, 2026
Description
Problem summary
When using the WeCom IM channel with Docker deployment and the default multi-worker Gateway, follow-up messages intermittently fail with 401 Unauthorized during client.runs.stream(...). The same setup becomes stable when GATEWAY_WORKERS=1, which suggests the issue is related to process-local internal authentication not being shared across Uvicorn workers.
Expected behavior
IM channel workers should be able to call Gateway's LangGraph-compatible API reliably regardless of which Gateway worker receives the internal HTTP request.
WeCom messages should not randomly fail with 401 when Gateway is running with multiple workers.
Actual behavior
The first WeCom conversation often succeeds, but later messages intermittently fail.
Gateway logs show the WeCom WebSocket message was received and enqueued, then ChannelManager invoked runs.stream, but the internal POST request returned 401 Unauthorized:
POST /api/threads/<thread_id>/runs/stream HTTP/1.1" 401 Unauthorized
langgraph_sdk.errors.AuthenticationError: 401 Unauthorized
Setting GATEWAY_WORKERS=1 makes the 401 disappear.
Operating system
Linux
Platform details
Docker Compose deployment.
Gateway container: deer-flow-gateway
Frontend container: deer-flow-frontend
Nginx container: deer-flow-nginx
WeCom IM channel enabled via WebSocket.
Python version
Python 3.12 in backend container
Node.js version
Not checked
pnpm version
Not checked
uv version
Not checked
How are you running DeerFlow?
Docker (make docker-dev)
Reproduction steps
- Deploy DeerFlow using docker/docker-compose.yaml.
- Keep the default Gateway worker setting. The Gateway command includes: --workers ${GATEWAY_WORKERS:-4}
- Enable channels.wecom in config.yaml with valid bot_id / bot_secret.
- Configure the channel URLs in config.yaml:
  channels.langgraph_url: http://gateway:8001/api
  channels.gateway_url: http://gateway:8001
- Start the stack.
- Send a first message to the WeCom bot. It usually succeeds.
- Send follow-up messages in the same WeCom conversation.
- Observe intermittent 401 Unauthorized from POST /api/threads/<thread_id>/runs/stream.
- Set GATEWAY_WORKERS=1 and restart the stack.
- Repeat the same WeCom messages. The 401 no longer appears.
Relevant logs
[Bus] inbound enqueued: channel=wecom
[Manager] received inbound
[Manager] invoking runs.stream(thread_id=dda8d36f-6a1d-42c1-b03c-1c3cb42ed20e)
POST /api/threads/dda8d36f-6a1d-42c1-b03c-1c3cb42ed20e/runs/stream HTTP/1.1" 401 Unauthorized
langgraph_sdk.errors.AuthenticationError: 401 Unauthorized
Traceback:
File "/app/backend/app/channels/manager.py", line 858, in _handle_streaming_chat
async for chunk in client.runs.stream(
File "/app/backend/.venv/lib/python3.12/site-packages/langgraph_sdk/_async/http.py", line 238, in stream
await _araise_for_status_typed(res)
File "/app/backend/.venv/lib/python3.12/site-packages/langgraph_sdk/errors.py", line 221, in _araise_for_status_typed
raise err
[Manager] streaming response completed: error=401 Unauthorized
Git state
commit: c810e9f
tag/version context: v2.0-m1-rc1
PR included: #2932
Additional context
This does not look like a WeCom WebSocket disconnect issue, because failed messages still reach Gateway:
[Bus] inbound enqueued
[Manager] received inbound
The failure happens later when ChannelManager calls client.runs.stream(...).
Code-level hypothesis:
- docker/docker-compose.yaml starts Gateway with multiple Uvicorn workers by default: --workers ${GATEWAY_WORKERS:-4}
- app.gateway.internal_auth currently generates the internal auth token process-locally: _INTERNAL_AUTH_TOKEN = secrets.token_urlsafe(32)
- app.channels.manager creates the LangGraph SDK client with create_internal_auth_headers() plus the CSRF cookie/header.
- With multiple Gateway workers, the IM channel worker may create an internal HTTP request using worker A's process-local token, but the HTTP request can be handled by worker B.
- Worker B has a different process-local token, so AuthMiddleware does not accept X-DeerFlow-Internal-Token. Because the request has no browser access_token cookie, it returns 401.
The web UI does not hit this issue because browser requests use access_token cookie/JWT + CSRF, not the process-local internal auth token.
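The hypothesized failure mode can be reproduced in isolation with a small simulation. This is only a sketch: SimulatedWorker and its methods are hypothetical stand-ins for a Gateway worker process, not the actual DeerFlow code. Only the header name and the secrets.token_urlsafe(32) pattern come from the hypothesis above.

```python
import hmac
import secrets

HEADER = "X-DeerFlow-Internal-Token"


class SimulatedWorker:
    """Hypothetical stand-in for one Gateway worker process."""

    def __init__(self) -> None:
        # Each process generates its own token, as in the hypothesis above.
        self.internal_token = secrets.token_urlsafe(32)

    def create_internal_auth_headers(self) -> dict:
        # Headers the in-process IM channel client would attach.
        return {HEADER: self.internal_token}

    def authenticate(self, headers: dict) -> int:
        # Accept the request only if the presented token matches this
        # worker's own token; otherwise answer 401.
        presented = headers.get(HEADER, "")
        ok = hmac.compare_digest(presented.encode(), self.internal_token.encode())
        return 200 if ok else 401


worker_a = SimulatedWorker()
worker_b = SimulatedWorker()

# The channel manager lives in worker A and builds headers there.
headers = worker_a.create_internal_auth_headers()

print(worker_a.authenticate(headers))  # 200: request landed on the same worker
print(worker_b.authenticate(headers))  # 401: request landed on another worker
```

With four workers and roughly even distribution of connections, about three out of four internal requests would land on a worker that rejects the token, which would be consistent with the intermittent pattern described in this report.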
Suggested fix:
Introduce a shared internal auth token, for example DEER_FLOW_INTERNAL_AUTH_TOKEN, and have all Gateway workers use that value. If the env var is absent, fall back to the current process-local token for single-worker local development. Docker/deploy scripts could generate and persist this token similarly to BETTER_AUTH_SECRET / AUTH_JWT_SECRET.
After this fix, IM channels should work with GATEWAY_WORKERS > 1.
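A minimal sketch of the suggested fix, assuming the env var name proposed above. The function names load_internal_auth_token and is_valid_internal_token are hypothetical and do not exist in app.gateway.internal_auth today. The constant-time comparison is an addition for illustration, not something confirmed about the current middleware.

```python
import hmac
import os
import secrets
from typing import Mapping, Optional

ENV_VAR = "DEER_FLOW_INTERNAL_AUTH_TOKEN"  # name proposed in this issue


def load_internal_auth_token(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the shared token from the environment if it is set.

    Falls back to a process-local random token, which is only safe
    when a single Gateway worker is running.
    """
    env = os.environ if environ is None else environ
    token = env.get(ENV_VAR, "").strip()
    if token:
        return token
    return secrets.token_urlsafe(32)


def is_valid_internal_token(presented: Optional[str], expected: str) -> bool:
    """Compare tokens in constant time; a missing header is rejected."""
    if not presented:
        return False
    return hmac.compare_digest(presented.encode(), expected.encode())


# Two workers started with the same environment agree on the token.
shared_env = {ENV_VAR: secrets.token_urlsafe(32)}
token_a = load_internal_auth_token(shared_env)
token_b = load_internal_auth_token(shared_env)
print(is_valid_internal_token(token_a, token_b))  # True

# Without the env var, each call yields a different process-local token.
local_a = load_internal_auth_token({})
local_b = load_internal_auth_token({})
print(is_valid_internal_token(local_a, local_b))  # False
```

Because Uvicorn workers inherit the parent's environment, setting the variable once in the Compose file or deploy script should be enough for every worker to resolve the same value.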