configuration: global options settable via environment variables natively
#9766 opened on Oct 22, 2021
Description
Current Vector Version
vector 0.17.3 (x86_64-unknown-linux-gnu d72c6e7 2021-10-21)
Use-cases
We found ourselves very frequently writing this in our Vector pipeline files:
```yaml
data_dir: ${VECTOR_DATA_DIR:-/var/lib/vector}
sources: ...
transforms: ...
sinks: ...
```
We do this because our deployment scenarios differ slightly in where Vector's data needs to go. For example, in containers the path doesn't matter because we bind-mount it, but on bare metal we need to place the data in non-standard custom directories.
Attempted Solutions
As the snippet above shows, we currently use a template variable (with a default) to set the data_dir global option in the pipeline file itself.
Note that we believe none of these data_dir variations are really the business of the pipeline itself; they are an operations concern, i.e. the value depends on where you deploy the pipeline.
Proposal
Inspired by other containerised software out there, we think a nice Quality-of-Life™ improvement for Vector would be to allow global configuration options to be set natively via environment variables, without needing template variables unless you want them. This would cleanly separate pipeline concerns from deployment concerns and make pipeline files cleaner and more concise.
For our use-case, given the following generic pipeline file:
```yaml
sources: ...
transforms: ...
sinks: ...
```
These would be the desired behaviours of the same pipeline file when Vector is executed with:
| Env | Data Dir Value Used | Note |
|---|---|---|
| none | /var/lib/vector | The default in Vector itself |
| VECTOR_DATA_DIR=/some/path | /some/path | Taken from the env variable |
The same can be done for other global configuration options in Vector. The ones I think might be useful as environment variables (because they are very deployment-dependent) are:
- `healthchecks.enabled` => `VECTOR_HEALTHCHECKS_ENABLED`
- `healthchecks.require_healthy` => `VECTOR_HEALTHCHECKS_REQUIRE_HEALTHY`
- `proxy.*` => `VECTOR_PROXY_*`
- `timezone` => `VECTOR_TIMEZONE`
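The naming scheme in this list looks mechanical: take the dotted option path, replace the dots with underscores, upper-case it, and prefix it with `VECTOR_`. A minimal Python sketch of that rule follows. The `env_var_name` helper is hypothetical and only illustrates the proposal; it is not anything Vector provides.

```python
def env_var_name(option_path: str, prefix: str = "VECTOR") -> str:
    """Map a dotted global option path to the proposed environment variable name.

    Hypothetical helper illustrating the naming rule from this proposal;
    it is not part of Vector.
    """
    # "healthchecks.enabled" -> ["VECTOR", "healthchecks", "enabled"]
    parts = [prefix, *option_path.split(".")]
    # Join with underscores and upper-case the whole name.
    return "_".join(parts).upper()


for option in ("data_dir", "healthchecks.enabled",
               "healthchecks.require_healthy", "timezone"):
    print(f"{option} => {env_var_name(option)}")
```

Wildcard entries such as `proxy.*` would presumably expand per sub-option using the same rule.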
Another thought: if an environment variable is given and the same option is also set in the pipeline file, I think the environment variable should take precedence.
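To make the intended precedence concrete, here is a rough Python sketch of the lookup order I have in mind: environment variable first, then the pipeline file, then Vector's built-in default. Everything in it (`resolve_global_option`, `BUILTIN_DEFAULTS`, the flat `file_config` dict) is a hypothetical illustration of the proposal, not how Vector loads configuration today.

```python
import os

# Built-in default for data_dir, as listed in the table above.
# Other options are omitted from this sketch.
BUILTIN_DEFAULTS = {"data_dir": "/var/lib/vector"}


def resolve_global_option(name, file_config, environ=os.environ):
    """Resolve one global option using the proposed precedence.

    Hypothetical helper: `file_config` stands in for the parsed pipeline
    file as a flat dict keyed by dotted option path.
    """
    env_name = "VECTOR_" + name.replace(".", "_").upper()
    # 1. An environment variable wins when it is set.
    if env_name in environ:
        return environ[env_name]  # always a string; typed options would need parsing
    # 2. Otherwise use the value from the pipeline file, if present.
    if name in file_config:
        return file_config[name]
    # 3. Otherwise fall back to the built-in default (None if unknown here).
    return BUILTIN_DEFAULTS.get(name)


# The two rows of the table, plus the conflict case described above.
print(resolve_global_option("data_dir", {}, environ={}))
print(resolve_global_option("data_dir", {}, environ={"VECTOR_DATA_DIR": "/some/path"}))
print(resolve_global_option("data_dir", {"data_dir": "/from/file"},
                            environ={"VECTOR_DATA_DIR": "/some/path"}))
```

The three calls print `/var/lib/vector`, `/some/path` and `/some/path` respectively, so the pipeline file only matters when the environment variable is absent.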