follow-up: preserve more structure than parsed_cmd.type="unknown" for shell-wrapper telemetry · rtk-ai/rtk#1460

1 comment · 0 reactions · 0 assignees · Rust · 2,914 forks

Labels: area:ci, effort-medium, enhancement, filter-quality, help wanted, priority:medium

Repository metrics

Stars: 48,085
PR merge metrics: average time to merge 11 days 1 hour; 45 PRs merged in the last 30 days

Description

Summary

Follow-up to #1459.

The lower-level exec telemetry appears to collapse complex shell wrappers and multi-line shell programs into a single fallback record:

"parsed_cmd": [{ "type": "unknown", "cmd": "...entire shell program..." }]

That fallback is understandable as a safety valve, but it throws away structure that downstream analytics need in order to distinguish:

real top-level commands,
wrapper-generated internals,
shell syntax/builtins,
and orchestrator invocations.

Once everything becomes type="unknown", higher layers have no reliable way to avoid false positives.

Why this deserves its own issue

#1459 is about the user-facing false positives in top-unhandled-command reporting.

This issue is narrower: the telemetry/parser layer itself is losing too much structure before analytics even run.

Concrete examples

Two representative cases from local Codex session JSONL logs:

1) Multi-line shell wrapper recorded as one unknown blob

A wrapper script containing multiple commands (git rev-parse, gstack-slug, find, tail, etc.) was stored as:

"parsed_cmd": [
  {
    "type": "unknown",
    "cmd": "set -e\nROOT=$(git rev-parse --show-toplevel)\nBRANCH=$(git branch --show-current)\n..."
  }
]

2) Single higher-level command still recorded as unknown

Even a single orchestrator command like:

omx explore --prompt 'Find repo files/symbols related to trk caching context ...'

was stored as:

"parsed_cmd": [{ "type": "unknown", "cmd": "omx explore --prompt '...'" }]

Expected

Telemetry should preserve more structure when possible, even if full semantic parsing is not available.

For example, instead of flattening everything to unknown, it could emit something like:

source_kind: single_command | shell_script | wrapper | builtin | syntax_fragment
argv0: omx (or argv0: bash / argv0: zsh when the record is a shell wrapper)
top_level_command: omx explore
is_multiline: true
contains_shell_control_flow: true
confidence: low

Even partial structure would let analytics avoid treating the entire blob as one missing command.
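
To make the field list above concrete, here is a minimal Rust sketch of an enriched fallback record and a purely lexical classifier that fills it in. Everything named here is hypothetical and not taken from the rtk codebase: the `FallbackMeta` struct, the `describe_fallback` function, the `Medium` and `High` confidence levels, and the small keyword tables are all assumptions. The classifier only looks at whitespace-separated tokens and does no quote handling or real shell parsing.

```rust
/// Coarse origin of the raw text; variants mirror the proposed `source_kind` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    SingleCommand,
    ShellScript,
    Wrapper,
    Builtin,
    SyntaxFragment,
}

/// Only `low` appears in the issue; `Medium` and `High` are assumed extra levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    Low,
    Medium,
    High, // reserved for a real parser; never produced by this sketch
}

/// Hypothetical metadata attached to a `type = "unknown"` record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FallbackMeta {
    pub source_kind: SourceKind,
    pub argv0: Option<String>,
    pub top_level_command: Option<String>,
    pub is_multiline: bool,
    pub contains_shell_control_flow: bool,
    pub confidence: Confidence,
}

// Illustrative, deliberately incomplete keyword tables.
const SHELLS: &[&str] = &["sh", "bash", "zsh"];
const BUILTINS: &[&str] = &["set", "cd", "export", "source", "unset"];
const CONTROL_FLOW: &[&str] = &["if", "then", "else", "fi", "for", "while", "do", "done", "case", "esac"];

/// Derive coarse structure from raw shell text using token-level heuristics only.
pub fn describe_fallback(cmd: &str) -> FallbackMeta {
    let trimmed = cmd.trim();
    let is_multiline = trimmed.contains('\n');
    // Only the first token of each line is checked, so keywords inside arguments are ignored.
    let contains_shell_control_flow = trimmed
        .lines()
        .filter_map(|line| line.split_whitespace().next())
        .any(|first| CONTROL_FLOW.contains(&first));

    let mut tokens = trimmed.split_whitespace();
    let argv0 = tokens.next();
    let second = tokens.next();

    let source_kind = match argv0 {
        None => SourceKind::SyntaxFragment,
        Some(_) if is_multiline => SourceKind::ShellScript,
        Some(a) if SHELLS.contains(&a) => SourceKind::Wrapper,
        Some(a) if CONTROL_FLOW.contains(&a) => SourceKind::SyntaxFragment,
        Some(a) if BUILTINS.contains(&a) => SourceKind::Builtin,
        Some(_) => SourceKind::SingleCommand,
    };

    // For a single command, guess "argv0 subcommand" when the second token is not a flag.
    let top_level_command = match (source_kind, argv0, second) {
        (SourceKind::SingleCommand, Some(a), Some(sub)) if !sub.starts_with('-') => {
            Some(format!("{} {}", a, sub))
        }
        (SourceKind::SingleCommand, Some(a), _) => Some(a.to_string()),
        _ => None,
    };

    let confidence = if source_kind == SourceKind::SingleCommand {
        Confidence::Medium
    } else {
        Confidence::Low
    };

    FallbackMeta {
        source_kind,
        argv0: argv0.map(str::to_string),
        top_level_command,
        is_multiline,
        contains_shell_control_flow,
        confidence,
    }
}
```

With these heuristics, `omx explore --prompt '...'` would be described as a single command whose `top_level_command` is `omx explore`, while the multi-line wrapper from the first example would be marked as a low-confidence shell script with no top-level command. Leading environment assignments such as `FOO=bar cmd` are not handled and would be misread as `argv0`.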

Actual

The fallback record only preserves the raw shell text and type="unknown", which makes downstream classification much harder than it needs to be.

Why this matters

Without structured fallback metadata, downstream consumers are forced to infer meaning from raw shell text. That leads directly to issues like:

false “top unhandled commands”
command-count inflation from wrapper internals
inability to separate shell syntax from real executable gaps
brittle heuristics in analytics/reporting layers
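
As an illustration of what structured fallback metadata would buy a downstream consumer, the sketch below ranks unhandled commands while skipping multi-line blobs and low-confidence records. The `UnknownRecord` type, the `rank_unhandled` function, and the `Medium` / `High` confidence levels are hypothetical; the field names follow the ones proposed under Expected, and none of this reflects how rtk's reporting layer is actually written.

```rust
use std::collections::HashMap;

/// Assumed confidence scale; the issue itself only mentions `low`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    Low,
    Medium,
    High,
}

/// Hypothetical view of an enriched `type = "unknown"` record as analytics would see it.
#[derive(Debug, Clone)]
pub struct UnknownRecord {
    pub top_level_command: Option<String>,
    pub is_multiline: bool,
    pub confidence: Confidence,
}

/// Count unhandled commands by `top_level_command`, most frequent first.
/// Multi-line scripts and low-confidence fallbacks are excluded from the ranking
/// instead of being counted as one giant "missing command".
pub fn rank_unhandled(records: &[UnknownRecord]) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for record in records {
        if record.is_multiline || record.confidence == Confidence::Low {
            continue;
        }
        if let Some(cmd) = record.top_level_command.as_deref() {
            *counts.entry(cmd).or_insert(0) += 1;
        }
    }

    let mut ranked: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(cmd, n)| (cmd.to_string(), n))
        .collect();
    // Highest count first; ties broken alphabetically so the output is deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

The point of the design is that the consumer never inspects raw shell text: wrapper internals cannot inflate the counts, because a record without a trustworthy `top_level_command` simply does not take part in the ranking.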

Potential fixes

A few possible directions:

Keep unknown, but enrich it with source-kind / top-level token / multiline flags.
Special-case simple single-command invocations so that omx explore ... can be told apart from a 20-line shell script.
Emit a parser confidence level so analytics can exclude low-confidence fallbacks from user-facing ranking.
Represent shell wrappers hierarchically (shell -> top-level exec -> nested tokens) instead of flattening to one opaque string.
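
The hierarchical option could look roughly like the following Rust sketch: a small tree with a shell node at the root, top-level statements beneath it, and executables nested inside command substitutions. The `Node` enum, `parse_script`, `executables`, and the `NON_EXEC` keyword table are hypothetical names introduced for illustration. The parser is line-based and whitespace-tokenized, recognizes only the `NAME=$(cmd ...)` form of command substitution, and is not a real shell grammar.

```rust
/// Hypothetical tree shape: shell -> top-level statements -> nested tokens.
#[derive(Debug, PartialEq, Eq)]
pub enum Node {
    /// A shell program; children are its top-level statements.
    Shell { statements: Vec<Node> },
    /// An executable invocation with its remaining tokens.
    Exec { argv0: String, args: Vec<String> },
    /// `NAME=$(...)`: an assignment whose value comes from a nested command.
    Capture { var: String, inner: Box<Node> },
    /// Builtins, control flow, or anything else this sketch does not understand.
    Opaque { text: String },
}

// Illustrative list of leading tokens that are shell syntax or builtins, not executables.
const NON_EXEC: &[&str] = &["set", "cd", "export", "if", "then", "else", "fi", "for", "while", "do", "done"];

/// Split a script into one node per non-empty line.
pub fn parse_script(script: &str) -> Node {
    let statements = script
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(parse_statement)
        .collect();
    Node::Shell { statements }
}

fn parse_statement(line: &str) -> Node {
    // Recognize only the whole-line form `NAME=$(cmd ...)`.
    if let Some((var, rest)) = line.split_once("=$(") {
        if let Some(inner) = rest.strip_suffix(')') {
            let is_name = !var.is_empty()
                && var.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
            if is_name {
                return Node::Capture {
                    var: var.to_string(),
                    inner: Box::new(parse_exec(inner)),
                };
            }
        }
    }
    parse_exec(line)
}

fn parse_exec(text: &str) -> Node {
    let mut tokens = text.split_whitespace().map(String::from);
    match tokens.next() {
        Some(argv0) if !NON_EXEC.contains(&argv0.as_str()) => Node::Exec {
            argv0,
            args: tokens.collect(),
        },
        _ => Node::Opaque { text: text.to_string() },
    }
}

/// Collect `argv0` of every executable in the tree, in source order.
pub fn executables(node: &Node) -> Vec<&str> {
    match node {
        Node::Shell { statements } => statements.iter().flat_map(|s| executables(s)).collect(),
        Node::Exec { argv0, .. } => vec![argv0.as_str()],
        Node::Capture { inner, .. } => executables(inner),
        Node::Opaque { .. } => Vec::new(),
    }
}
```

Applied to the visible lines of the wrapper in the first example, this would keep `set -e` as an opaque statement and surface the two `git` invocations as nested executables, so analytics could decide per node what counts as a real command gap instead of receiving one opaque string.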

Relation to existing issues

#1459 — user-facing false positives from shell-wrapper fallbacks
#1441 — false positives caused by pre-hook vs post-hook command text

This issue sits below both: preserve better telemetry structure so later layers can reason correctly.

Environment

Observed on: April 22, 2026
Context: Codex / OMX / gstack-heavy workflows
OS: macOS

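Contributor guide

Research direction: Examine the telemetry parser in the codebase, in particular the part that produces parsed_cmd records with type "unknown". Understand how shell wrappers collapse multiple commands into a single unknown record. Then implement one of the suggested fixes, such as adding source_kind and top_level_command fields. Test with the examples from the issue to confirm that structure is preserved.
Tech stack: Rust
Domain: CLI, developer experience
Issue type: feature
Difficulty: 3
Estimated time: 1-2 days
Activity: active
Clarity: clear
Prerequisites: Rust
Beginner friendliness: 65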