[RFC]: Improve environment variable declaration and handling
#31249 opened on Dec 24, 2025
Description
Motivation.
Currently, every environment variable is declared twice: once as a type definition and once as a custom getter lambda (shown below). Given the sheer number of env vars, it is difficult to keep the two in sync, which often leads to incorrect type declarations and defaults. Documentation usually takes the form of comments near the getter rather than a docstring on the type declaration, making it unavailable to the IDE. Finally, apart from strings, most variables are ints, bools, or paths, and each getter reimplements its own parsing logic. While #25700 argues for limiting the use of env vars (which I agree with), we should still handle the env vars we do have in a robust way.
if TYPE_CHECKING:
    VLLM_HOST_IP: str = ""
    VLLM_PORT: Optional[int] = None
    VLLM_CACHE_ROOT: str = os.path.expanduser("~/.cache/vllm")
    ...

environment_variables: dict[str, Callable[[], Any]] = {
    # Root directory for vLLM cache files
    # Defaults to `~/.cache/vllm` unless `XDG_CACHE_HOME` is set
    "VLLM_CACHE_ROOT":
    lambda: os.path.expanduser(
        os.getenv(
            "VLLM_CACHE_ROOT",
            os.path.join(get_default_cache_root(), "vllm"),
        )),
    # used in distributed environment to determine the ip address
    # of the current node, when the node has multiple network interfaces.
    # If you are using multi-node inference, you should set this differently
    # on each node.
    'VLLM_HOST_IP':
    lambda: os.getenv('VLLM_HOST_IP', ""),
    # used in distributed environment to manually set the communication port
    # Note: if VLLM_PORT is set, and some code asks for multiple ports, the
    # VLLM_PORT will be used as the first port, and the rest will be generated
    # by incrementing the VLLM_PORT value.
    'VLLM_PORT':
    get_vllm_port,
    ...
}
Proposed Change.
Instead of duplicating the type annotations and getter dictionary, I propose the following:
- Use the type-hinted declaration as the single source of truth, including defaults and custom parsing, and move them into `envs/_variables.py`
- Import those directly `if TYPE_CHECKING`, otherwise use them as a default store
- Allow lazy defaults
- Standardize argument parsing for `str`/`int`/`float`/`bool`/`Path`/`list[str]` "trivial" types
- Add a pre-commit check to validate `_variables.py` (check lazy default type consistency) and that they are never imported directly
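The "never imported directly" half of the pre-commit check proposed above could be done with a small AST scan. The following is a minimal sketch, not the prototype's code: it assumes the declarations live in `envs/_variables.py`, and the helper names (`find_direct_imports` and the two private functions) are hypothetical. Imports under `if TYPE_CHECKING:` are allowed, since those only exist for IDEs and type checkers.

```python
import ast

# Assumption: declarations live in `envs/_variables.py`, as proposed above.
PACKAGE, SUBMODULE = "envs", "_variables"
FORBIDDEN = f"{PACKAGE}.{SUBMODULE}"


def _is_type_checking_test(test: ast.expr) -> bool:
    """True for `TYPE_CHECKING` and `typing.TYPE_CHECKING`."""
    if isinstance(test, ast.Name):
        return test.id == "TYPE_CHECKING"
    if isinstance(test, ast.Attribute):
        return test.attr == "TYPE_CHECKING"
    return False


def _imports_declarations(node: ast.AST) -> bool:
    """True if `node` is an absolute import of the declaration module."""
    if isinstance(node, ast.Import):
        return any(alias.name == FORBIDDEN for alias in node.names)
    if isinstance(node, ast.ImportFrom):
        if node.level:  # relative import; not resolved in this sketch
            return False
        if node.module == FORBIDDEN:
            return True
        return node.module == PACKAGE and any(
            alias.name == SUBMODULE for alias in node.names)
    return False


def find_direct_imports(source: str) -> list[int]:
    """Return line numbers of unguarded imports of the declaration module."""
    offenders: list[int] = []

    def visit(node: ast.AST, guarded: bool) -> None:
        if not guarded and _imports_declarations(node):
            offenders.append(node.lineno)
        if isinstance(node, ast.If) and _is_type_checking_test(node.test):
            # The body only runs for type checkers; the else branch does not.
            for child in node.body:
                visit(child, True)
            for child in node.orelse:
                visit(child, guarded)
            return
        for child in ast.iter_child_nodes(node):
            visit(child, guarded)

    visit(ast.parse(source), False)
    return offenders
```

A hook built on this would run over every file outside the `envs` package (which has to import `_variables` itself) and fail if the returned list is non-empty. The lazy-default type-consistency check is a separate step and is not covered here.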
I worked on a prototype of this a while ago in #23601 (warning: I couldn't get GH Copilot to clean up the code as intended so it's not nearly finished).
Example declarations:
VLLM_LOGGING_LEVEL: str = env_factory("INFO", lambda x: x.upper())
"""Logging level for vLLM. Default is "INFO"."""
VLLM_HOST_IP: str = ""
"""
Used in distributed environment to determine the ip address of the current node,
when the node has multiple network interfaces.
If you are using multi-node inference, you should set this differently on each node.
"""
def get_vllm_port(env_port: str) -> int:
    ...
VLLM_PORT: Optional[int] = env_factory(None, get_vllm_port)
"""
Used in distributed environment to manually set the communication port.
Note: if VLLM_PORT is set, and some code asks for multiple ports, the VLLM_PORT will be
used as the first port, and the rest will be generated by incrementing the VLLM_PORT value.
"""
VLLM_CACHE_ROOT: Path = env_default_factory(
    lambda: parse_path(os.getenv("XDG_CACHE_HOME", "~/.cache")) / "vllm")
"""Root directory for vLLM cache files. Defaults to `~/.cache/vllm` unless `XDG_CACHE_HOME` is set."""
...
import os
from pathlib import Path
from typing import TYPE_CHECKING, get_type_hints

# Bind the submodule to a local name so get_type_hints() can inspect it
from envs import _variables
from envs._variables import _defaults as _env_defaults
from envs.utils import (EnvFactory, is_type_with_args, parse_list, parse_path,
                        unwrap_optional)

if TYPE_CHECKING:
    # This way IDEs & type checkers get the declarations directly
    from envs._variables import *

# Resolve the annotations once at import time instead of on every lookup
_type_hints = get_type_hints(_variables)


def __getattr__(name):
    if name not in _env_defaults:
        raise AttributeError(f"module {__name__} has no attribute {name}")
    default = _env_defaults[name]
    if (value := os.getenv(name)) is None:
        # Not found, return default value
        if isinstance(default, EnvFactory):
            default = default.default_value
        # Call if lazy-initialized
        return default() if callable(default) else default
    if isinstance(default, EnvFactory):
        # If factory provided, use it to parse string
        return default.parse(value)
    # Automatic parsing of "trivial" data types
    var_type = unwrap_optional(_type_hints[name])
    if var_type is str:
        return value
    if var_type in (int, float):
        return var_type(value)
    if var_type is bool:
        return value.strip().lower() in ("1", "true")
    if var_type is Path:
        return parse_path(value)
    if is_type_with_args(var_type, list, [str]):
        return parse_list(value)
    raise ValueError(f"Unsupported type {var_type} for "
                     f"environment variable {name}")
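The module above leans on helpers from `envs.utils` that are not shown here. Below is a minimal sketch of what they might look like, written only to be consistent with how `__getattr__` uses them. Everything in it is an assumption: that `EnvFactory` simply pairs a default with a parser, that `env_default_factory` returns the bare callable (so a set variable still goes through the automatic type-based parsing), and that lists are comma-separated.

```python
import types
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Optional, Union, get_args, get_origin

# `X | None` has origin types.UnionType on Python 3.10+; fall back to Union.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


@dataclass(frozen=True)
class EnvFactory:
    """Pairs a default (or a zero-arg callable producing it) with a parser."""
    default_value: Any
    parse: Callable[[str], Any]


def env_factory(default: Any, parse: Callable[[str], Any]) -> Any:
    # Typed as Any so `NAME: int = env_factory(...)` passes type checking.
    return EnvFactory(default, parse)


def env_default_factory(default_fn: Callable[[], Any]) -> Any:
    # Assumption: a lazy default is just the callable itself; the lookup
    # calls it only when the variable is unset.
    return default_fn


def unwrap_optional(tp: Any) -> Any:
    """Return T for Optional[T], otherwise return tp unchanged."""
    if get_origin(tp) in _UNION_ORIGINS:
        non_none = [arg for arg in get_args(tp) if arg is not type(None)]
        if len(non_none) == 1:
            return non_none[0]
    return tp


def is_type_with_args(tp: Any, origin: type, args: list) -> bool:
    """True if tp is origin[*args], e.g. list[str]."""
    return get_origin(tp) is origin and list(get_args(tp)) == list(args)


def parse_path(value: str) -> Path:
    return Path(value).expanduser()


def parse_list(value: str, sep: str = ",") -> list[str]:
    # Assumption: comma-separated, whitespace-trimmed, empty items dropped.
    return [item.strip() for item in value.split(sep) if item.strip()]
```

One consequence of treating any callable default as lazy is that a default which is itself callable (a class, for instance) would be invoked rather than returned, so the pre-commit check would need to rule that out.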
Alternatively, we could create a class `Envs` with variables as properties and use function decorators, but the above proposal seems the most scalable: we have a lot of env vars, so the simpler each declaration is, the better.
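For comparison, the class-based alternative might look roughly like the sketch below. The `env_property` decorator and the two example properties are hypothetical and only illustrate the shape of the approach; in particular, plain `int` stands in for the real port parser.

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def env_property(parse: Callable[[str], T]) -> Callable[[Callable[..., T]], property]:
    """Hypothetical decorator: the method body supplies the default and
    `parse` converts the raw string when the variable is set."""

    def decorator(default_fn: Callable[..., T]) -> property:
        name = default_fn.__name__

        def getter(self) -> T:
            raw = os.getenv(name)
            return default_fn(self) if raw is None else parse(raw)

        # property() picks up the getter's docstring, so IDEs can show it.
        getter.__doc__ = default_fn.__doc__
        return property(getter)

    return decorator


class Envs:
    @env_property(str.upper)
    def VLLM_LOGGING_LEVEL(self) -> str:
        """Logging level for vLLM. Default is "INFO"."""
        return "INFO"

    @env_property(int)  # stand-in for a custom port parser
    def VLLM_PORT(self) -> Optional[int]:
        """Communication port for distributed setups."""
        return None


envs = Envs()
```

Each variable costs a decorator, a method signature, a docstring, and a return statement, versus one annotated assignment plus a docstring in the proposal above.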
Feedback Period.
12/23 - 1/5
CC List.
@hmellor @aarnphm @simon-mo @zhuohan123 @ywang96 @DarkLight1337
Any Other Things.
No response