influxdata/telegraf

Consider allowing win_eventlog xpath queries with invalid channels

Open

#13,951 opened on Sep 20, 2023

Labels: feature request, help wanted, size/l

Description

Use Case

For Windows event logging, you often know exactly which channels you need and can adjust the config per node/role/etc. In other scenarios, though, you want "every channel in this list/xpath query I give you, if it exists". Rather than erroring out and refusing to run, telegraf could get a validated list of the logs that exist, cross-reference it with what was requested, log a warning for any channel the user specified that does not exist, adjust the xpath query that ultimately runs, and collect input from the channels that do exist.

Expected behavior

Honestly, the current behavior is just how xpath queries work, so it is expected. But if telegraf were to interpret, validate, and rewrite the xpath query, or provide another format that goes through validation before being translated to xpath, that would be very helpful.

Example:

On a system without the Microsoft-Windows-TPM-WMI channel:

PS > $Channels = Get-WinEvent -ListLog * -ErrorAction SilentlyContinue | Select -ExpandProperty LogName

PS > $Channels -contains 'System'
True

PS > $Channels -contains 'Microsoft-Windows-TPM-WMI'
False

Allow specifying either...

An xpath query with invalid channels:

  xpath_query = '''
  <QueryList>
  <Query Id="0">
    <Select Path="System">*</Select>
    <Select Path="Microsoft-Windows-TPM-WMI">*</Select>
  </Query>
  </QueryList>
  '''

Telegraf could parse that XML-ish xpath data to extract the channel paths, run some 'Get-Me-All-Event-Channels-On-This-Computer' code, drop any paths not in that list, write a warning for each one dropped, and rewrite the xpath query without them so that the query does not error out.
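The parse, filter, and rewrite steps described above could be sketched in Go roughly as follows. This is only an illustration under some assumptions: `filterChannels` and the struct types are hypothetical names and not part of the plugin, the list of existing channels is passed in by the caller instead of being enumerated from Windows, channel names are assumed to compare case-insensitively, and only `<Select Path="...">` elements are modelled, so anything else in the query (for example `<Suppress>` elements) would be lost on rewrite.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Minimal model of the query document. Only <Select Path="..."> elements are
// modelled; other elements or attributes would be dropped when rewriting.
type selectElem struct {
	Path string `xml:"Path,attr"`
	Expr string `xml:",chardata"`
}

type query struct {
	ID      string       `xml:"Id,attr"`
	Selects []selectElem `xml:"Select"`
}

type queryList struct {
	XMLName xml.Name `xml:"QueryList"`
	Queries []query  `xml:"Query"`
}

// filterChannels (hypothetical helper) removes every <Select> whose Path is
// not in existing and returns the rewritten query plus the dropped paths.
func filterChannels(queryXML string, existing []string) (string, []string, error) {
	known := make(map[string]bool, len(existing))
	for _, name := range existing {
		// Assumption: channel names are compared case-insensitively.
		known[strings.ToLower(name)] = true
	}

	var ql queryList
	if err := xml.Unmarshal([]byte(queryXML), &ql); err != nil {
		return "", nil, fmt.Errorf("parsing query: %w", err)
	}

	var missing []string
	var queries []query
	for _, q := range ql.Queries {
		var selects []selectElem
		for _, s := range q.Selects {
			if known[strings.ToLower(s.Path)] {
				selects = append(selects, s)
			} else {
				missing = append(missing, s.Path)
			}
		}
		if len(selects) > 0 { // drop queries left with nothing to select
			q.Selects = selects
			queries = append(queries, q)
		}
	}
	ql.Queries = queries

	out, err := xml.MarshalIndent(ql, "", "  ")
	if err != nil {
		return "", nil, fmt.Errorf("rewriting query: %w", err)
	}
	return string(out), missing, nil
}

func main() {
	const q = `<QueryList>
  <Query Id="0">
    <Select Path="System">*</Select>
    <Select Path="Microsoft-Windows-TPM-WMI">*</Select>
  </Query>
</QueryList>`

	// The second argument stands in for the channels found on the host.
	out, missing, err := filterChannels(q, []string{"Application", "System"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range missing {
		fmt.Printf("warning: channel %q not found, skipping\n", m)
	}
	fmt.Println(out)
}
```

If every requested channel is missing, this sketch returns an empty `<QueryList>`, so a caller would still need to decide whether that is an error or just a warning.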

Another option is presumably what winlogbeat does. They have a DSL for defining what to collect. As an example:

winlogbeat.event_logs:
  - name: Security
    ignore_older: 168h
  - name: 'Microsoft-Windows-DNS-Client/Operational'
    ignore_older: 168h
    processors:
      - drop_event.when.not:
          and:
          - equals.event_id: 3008
          - and:
            - not.equals.event_data.QueryOptions: '140737488355328'
            - not.equals.event_data.QueryResults: ''

Presumably, they interpret the YAML and build an xpath query from it. This allows them to validate that each event channel exists; if one does not, winlogbeat continues processing the other logs and logs a warning naming whichever channels it could not find.
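The list-driven alternative could look something like the Go sketch below: the query is built from a list of channel entries, and any channel that fails an existence check is skipped and reported. Everything here is assumed for illustration: `channelSpec`, `buildQuery`, and the `exists` callback are hypothetical names, not winlogbeat's or telegraf's actual implementation, and the existence check is injected so the example does not depend on Windows.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// channelSpec is a hypothetical per-channel entry, loosely modelled on the
// winlogbeat example above. It is not an existing telegraf option.
type channelSpec struct {
	Name   string // channel name, e.g. "System"
	Filter string // XPath filter; empty means "*"
}

type selectElem struct {
	Path string `xml:"Path,attr"`
	Expr string `xml:",chardata"`
}

type query struct {
	ID      int          `xml:"Id,attr"`
	Selects []selectElem `xml:"Select"`
}

type queryList struct {
	XMLName xml.Name `xml:"QueryList"`
	Queries []query  `xml:"Query"`
}

// buildQuery (hypothetical helper) builds a query from the specs whose
// channel passes the exists check and returns the names it skipped.
func buildQuery(specs []channelSpec, exists func(string) bool) (string, []string, error) {
	var skipped []string
	q := query{ID: 0}
	for _, s := range specs {
		if !exists(s.Name) {
			skipped = append(skipped, s.Name)
			continue
		}
		expr := s.Filter
		if expr == "" {
			expr = "*"
		}
		q.Selects = append(q.Selects, selectElem{Path: s.Name, Expr: expr})
	}
	if len(q.Selects) == 0 {
		return "", skipped, fmt.Errorf("none of the %d requested channels exist", len(specs))
	}
	out, err := xml.MarshalIndent(queryList{Queries: []query{q}}, "", "  ")
	if err != nil {
		return "", skipped, fmt.Errorf("building query: %w", err)
	}
	return string(out), skipped, nil
}

func main() {
	specs := []channelSpec{
		{Name: "System"},
		{Name: "Microsoft-Windows-TPM-WMI"},
	}
	// Stand-in for a real lookup of the channels present on the host.
	exists := func(name string) bool { return name == "System" }

	out, skipped, err := buildQuery(specs, exists)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range skipped {
		fmt.Printf("warning: channel %q not found, skipping\n", s)
	}
	fmt.Println(out)
}
```

Compared with rewriting a user-supplied query, this approach never has to parse arbitrary xpath input, at the cost of introducing a second configuration format.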

To be honest, both options seem complicated to implement, and it would be nice if everyone knew exactly what to collect from each node/role/etc., but for some users I could see this being valuable.

Actual behavior

If the query references any channel that does not exist, telegraf just relays the generic error Windows hands back and does not start.

For example, using the system without Microsoft-Windows-TPM-WMI from the Expected behavior section, and the xpath query including that channel:

2023-09-20T11:38:59Z E! [telegraf] Error running agent: starting input inputs.win_eventlog::win_eventlog: subscription of Windows Event Log failed: The specified channel could not be found.

Additional info

Apologies, there are likely low-level APIs or methods to list the valid channels on a system. I've simply been using $Channels = Get-WinEvent -ListLog * -ErrorAction SilentlyContinue | Select -ExpandProperty LogName in PowerShell, which is definitely not going to be performant.

Without this, the alternatives are for users to spend the time identifying exactly what is needed on everything they run, or to create some Frankenstein code that does this for them and adjusts the conf dynamically (really not ideal!).

Cheers!
