dynamic_modules/tls: support auto_host_sni for runtime hosts and SNI-scoped session reuse
#45962 opened on Jul 3, 2026
Description
Title: dynamic_modules/tls: support auto_host_sni for runtime hosts and SNI-scoped session reuse
Description:
I would like to propose making auto_host_sni usable with hosts added at runtime by dynamic-module clusters, and making upstream TLS session reuse safe when one shared UpstreamTlsContext connects to multiple effective SNI names.
cc @wbpcode for visibility/context.
Today, dynamic-module clusters can add hosts dynamically, but the host-add ABI only carries socket addresses. For HTTPS upstreams using:
auto_host_sni: true
auto_sni_san_validation: true
Envoy needs a logical hostname on the selected HostDescription, separate from the concrete socket address used to connect. Without that, runtime-added hosts cannot cleanly use host-driven SNI/SAN validation without pushing per-host transport socket config through xDS.
There is also a related TLS correctness issue: upstream client TLS sessions are currently cached at ClientContextImpl scope. That is fine when a client TLS context maps to one server name, but when effective SNI varies by selected host, a session established for one SNI must not be offered to another SNI.
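To make the scoping concern concrete, here is a minimal sketch of a session store keyed by effective SNI rather than held once per client context. `Session` and `SniScopedSessionCache` are hypothetical names for illustration only; they are not Envoy or BoringSSL types, and the real change would live inside the client TLS context.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an opaque TLS session (not an Envoy or BoringSSL type).
struct Session {
  std::string ticket;
};

// Sketch: one session slot per effective SNI instead of one per client context.
class SniScopedSessionCache {
public:
  // Remember the most recent session established under the given SNI.
  void store(const std::string& sni, Session session) {
    sessions_[sni] = std::move(session);
  }

  // Offer a session only if it was established under exactly this SNI.
  std::optional<Session> lookup(const std::string& sni) const {
    auto it = sessions_.find(sni);
    if (it == sessions_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

private:
  std::unordered_map<std::string, Session> sessions_;
};
```

With this shape, a handshake toward `b.example.com` never sees a session stored for `a.example.com`, which is the property the proposal asks for.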
The proposed behavior is:
- Extend the dynamic-module cluster host-add API so runtime-added hosts may carry an optional logical hostname.
- Scope upstream client TLS session caching by effective SNI.
- Include the router/async host-selection support needed for this to work after async ChooseHost.
Public API/interface notes:
- Add a dynamic-module cluster ABI path for adding hosts with optional hostnames.
- Preserve existing address-only ABI behavior and compatibility.
- Add or discuss TLS config surface for bounded SNI-scoped client session caching.
- No xDS per-host transport_socket_matches should be required for the target use case.
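The interface notes above could look roughly like the following sketch. Everything here is hypothetical: `RuntimeHost`, `HostRegistry`, `addHostWithHostname`, and `effectiveSni` are illustrative names, not the actual dynamic-module ABI, and the fallback to the context-level SNI when a host has no hostname is an assumption about the desired behavior.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host record: a concrete address plus an optional logical hostname.
struct RuntimeHost {
  std::string address;   // socket address used to connect, e.g. "10.0.0.1:443"
  std::string hostname;  // empty means no logical hostname (legacy behavior)
};

class HostRegistry {
public:
  // Existing-style entry point: address only, behavior unchanged.
  void addHost(const std::string& address) {
    addHostWithHostname(address, nullptr, 0);
  }

  // Hypothetical extended entry point taking a C-ABI-style pointer and length.
  // A null or zero-length hostname is treated exactly like the address-only call.
  void addHostWithHostname(const std::string& address, const char* hostname,
                           std::size_t hostname_len) {
    RuntimeHost host{address, {}};
    if (hostname != nullptr && hostname_len > 0) {
      host.hostname.assign(hostname, hostname_len);
    }
    hosts_.push_back(std::move(host));
  }

  const std::vector<RuntimeHost>& hosts() const { return hosts_; }

private:
  std::vector<RuntimeHost> hosts_;
};

// Assumed rule: use the host's logical hostname when present, otherwise
// fall back to the SNI configured on the shared TLS context.
std::string effectiveSni(const RuntimeHost& host, const std::string& context_sni) {
  return host.hostname.empty() ? context_sni : host.hostname;
}
```

Keeping the address-only call as a thin wrapper over the extended one is one way to guarantee that existing modules see no behavior change.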
The PoC demonstrates one dynamic-module cluster with two HTTPS upstreams. Each runtime-added host has:
- a concrete resolved socket address for connection, and
- a distinct logical hostname for SNI/SAN validation.
The Envoy config uses one shared UpstreamTlsContext with:
auto_host_sni: true
auto_sni_san_validation: true
Expected validation for an upstream implementation:
- dynamic-module cluster tests cover hostnames passed through the ABI, null/empty hostname legacy behavior, and synthesized hostname preservation.
- TLS tests cover session reuse within the same SNI and no reuse across different SNI names.
- TLS tests cover bounded eviction and empty-SNI behavior if SNI-scoped caching is configurable/bounded.
- router/async tests cover worker-local host resolution and transport socket option rebuild after async host selection.
- integration or regression coverage alternates requests between at least two runtime-added HTTPS hosts with distinct hostnames and verifies:
  - both upstreams complete TLS successfully,
  - SAN validation uses the selected host hostname,
  - request order does not affect correctness,
  - session resumption remains possible within the same SNI bucket only.
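For the bounded-eviction and empty-SNI cases in the validation list, a test target might resemble the sketch below: an LRU-bounded map from effective SNI to an opaque session. `BoundedSniSessionCache` and `max_snis` are hypothetical names, sessions are modeled as plain strings, and treating the empty SNI as an ordinary bucket is an assumption rather than settled behavior.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Sketch: LRU-bounded cache keyed by effective SNI (hypothetical, not Envoy code).
class BoundedSniSessionCache {
public:
  explicit BoundedSniSessionCache(std::size_t max_snis) : max_snis_(max_snis) {}

  // Insert or replace the session for this SNI and mark it most recently used.
  void store(const std::string& sni, std::string session) {
    auto it = index_.find(sni);
    if (it != index_.end()) {
      lru_.erase(it->second);
      index_.erase(it);
    }
    lru_.emplace_front(sni, std::move(session));
    index_[sni] = lru_.begin();
    if (lru_.size() > max_snis_) {
      // Evict the least recently used SNI bucket.
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
  }

  // Return the session for exactly this SNI, refreshing its recency on a hit.
  std::optional<std::string> lookup(const std::string& sni) {
    auto it = index_.find(sni);
    if (it == index_.end()) {
      return std::nullopt;
    }
    lru_.splice(lru_.begin(), lru_, it->second);  // list iterators stay valid
    return it->second->second;
  }

  std::size_t size() const { return lru_.size(); }

private:
  using Entry = std::pair<std::string, std::string>;  // (sni, session)
  std::size_t max_snis_;
  std::list<Entry> lru_;
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

A regression test could fill the cache to its bound, touch one SNI, insert a new one, and check that only the untouched bucket was evicted.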
I am happy to split the implementation into reviewable PRs if maintainers prefer, but I think the issue should track the full behavior because the pieces interact.
Relevant Links:
- Minimal runnable example: https://github.com/dio/auto-sni-choose-host
- Prototype patch and notes: https://gist.github.com/dio/965d1e555909c02013ca882a2b3caa78
- Envoy contribution guidance: https://github.com/envoyproxy/envoy/blob/main/CONTRIBUTING.md