[Bug] SslServerTlsHandler.exceptionCaught() swallows exceptions without closing channel, causing permanent half-open TLS connections
#16201 opened on Apr 9, 2026
Description
Pre-check
- I am sure that all the content I provide is in English.
Search before asking
- I had searched in the issues and found no similar issues.
Apache Dubbo Component
Java SDK (apache/dubbo)
Dubbo Version
Dubbo Java 3.3 (also affects 3.2.x). Netty 4.1.x.
Steps to reproduce this issue
Scenario: Provider has a broken Netty dependency (e.g., incompatible netty-buffer version causing NoClassDefFoundError: Could not initialize class io.netty.buffer.PooledUnsafeDirectByteBuf).
- Consumer connects to Provider. The TCP three-way handshake succeeds (it is handled by the OS kernel and is unaffected by the Netty bug).
- NettyClient.doConnect() only waits for TCP handshake completion, so it considers the connection successful. DubboInvoker is created and added to validInvokers.
- TLS handshake begins asynchronously: Consumer sends ClientHello.
- Provider's Netty read loop tries to allocate a ByteBuf to read the incoming data → NoClassDefFoundError is thrown.
- Netty's NioByteUnsafe.handleReadException() fires pipeline.fireExceptionCaught(cause) but does not close the channel (Netty only auto-closes for IOException or OutOfMemoryError).
- The exception reaches SslServerTlsHandler.exceptionCaught(), which only logs the error; it neither closes the channel nor propagates the exception.
- The channel remains TCP-active but is completely non-functional at the application layer.
- Consumer's DubboInvoker.isAvailable() returns true (it only checks channel.isActive()), so the invoker is never removed from validInvokers.
- All RPC requests routed to this Provider time out after 10 seconds.
What you expected to happen
When SslServerTlsHandler.exceptionCaught() is invoked, the channel should be closed (via ctx.close()), just like the userEventTriggered() method in the same class already does on TLS handshake failure. This would allow:
- The Consumer to detect channelInactive → isConnected()=false → isAvailable()=false
- Dubbo's addInvalidateInvoker mechanism to remove the broken invoker from validInvokers
- The self-healing loop to work as designed
Current behavior of SslServerTlsHandler.exceptionCaught() (lines 60-68):
@Override
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
    logger.error(INTERNAL_ERROR, "unknown error in remoting module", "",
            "TLS negotiation failed when trying to accept new connection.", cause);
    // BUG: no ctx.close() and no ctx.fireExceptionCaught(cause)
    // The exception is silently swallowed, channel stays open but broken
}
Compare with userEventTriggered() in the same class (lines 81-89), which correctly closes the channel:
} else {
    logger.error(INTERNAL_ERROR, "", "",
            "TLS negotiation failed when trying to accept new connection.",
            handshakeEvent.cause());
    ctx.close(); // ← correctly closes the channel
}
Similarly, SslClientTlsHandler.userEventTriggered() on the Consumer side fires ctx.fireExceptionCaught() on TLS failure but does not close the channel, which can also lead to half-open connections.
Anything else
Root cause analysis:
The exception propagation chain breaks at SslServerTlsHandler.exceptionCaught():
Netty read loop: allocate ByteBuf → NoClassDefFoundError
    ↓
NioByteUnsafe.handleReadException() → pipeline.fireExceptionCaught(cause)
    (Netty does NOT auto-close: NoClassDefFoundError is not IOException/OutOfMemoryError)
    ↓
SslServerTlsHandler.exceptionCaught() → logs error, BUT:
    ✗ Does NOT call ctx.close()
    ✗ Does NOT call ctx.fireExceptionCaught(cause)
    → Exception is silently swallowed
    → Channel remains TCP-active but application-dead
    ↓
NettyServerHandler.exceptionCaught() → NEVER reached (exception stopped above)
    ↓
Consumer side: channel still active → isAvailable()=true → invoker never removed
    → Continuous timeout on every RPC call routed to this Provider
This is not limited to NoClassDefFoundError — any non-IOException/non-OutOfMemoryError exception during the Netty read loop would trigger the same behavior, leaving the channel in a zombie state.
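To make the chain above easier to reason about, here is a small stand-alone model in plain Java. It does not use Dubbo or Netty code: FakeChannel, FakeInvoker, ExceptionHandler, LOG_ONLY and LOG_AND_CLOSE are hypothetical names invented for illustration. It also collapses the Provider-side and Consumer-side channels into one object, on the assumption that closing the Provider side eventually makes the Consumer's channel inactive. It is only a sketch of why a log-only handler leaves the invoker available while a closing handler does not.

```java
public class HalfOpenChannelModel {

    /** Hypothetical stand-in for a channel: only tracks whether it is still active. */
    static final class FakeChannel {
        private boolean active = true;
        boolean isActive() { return active; }
        void close() { active = false; }
    }

    /** Hypothetical stand-in for a pipeline handler's exceptionCaught callback. */
    interface ExceptionHandler {
        void exceptionCaught(FakeChannel channel, Throwable cause);
    }

    /** Models the reported behavior: log the error and do nothing else. */
    static final ExceptionHandler LOG_ONLY = (channel, cause) ->
            System.err.println("TLS negotiation failed: " + cause);

    /** Models the proposed behavior: log the error, then close the channel. */
    static final ExceptionHandler LOG_AND_CLOSE = (channel, cause) -> {
        System.err.println("TLS negotiation failed: " + cause);
        channel.close();
    };

    /** Hypothetical stand-in for an invoker whose availability mirrors channel.isActive(). */
    static final class FakeInvoker {
        private final FakeChannel channel;
        FakeInvoker(FakeChannel channel) { this.channel = channel; }
        boolean isAvailable() { return channel.isActive(); }
    }

    /**
     * Delivers a simulated read-loop failure to the handler and reports whether
     * the invoker would still be considered available afterwards.
     */
    static boolean invokerAvailableAfterReadFailure(ExceptionHandler handler) {
        FakeChannel channel = new FakeChannel();
        FakeInvoker invoker = new FakeInvoker(channel);
        handler.exceptionCaught(channel, new NoClassDefFoundError("simulated"));
        return invoker.isAvailable();
    }

    public static void main(String[] args) {
        System.out.println("log only:      available="
                + invokerAvailableAfterReadFailure(LOG_ONLY));
        System.out.println("log and close: available="
                + invokerAvailableAfterReadFailure(LOG_AND_CLOSE));
    }
}
```

In this model the log-only handler leaves the invoker reporting available=true after the failure, whereas the closing handler yields available=false, which is the state a removal mechanism keyed on availability would need in order to act.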
Are you willing to submit a pull request to fix on your own?
- Yes I am willing to submit a pull request on my own!
Code of Conduct
- I agree to follow this project's Code of Conduct