trycua/cua

cua-driver get_window_state omits per-element geometry

Open

Opened on 18 May 2026

2 comments, 0 reactions, 0 assignees, HTML, 16,722 stars, 1,051 forks
enhancement, good first issue

Description

Summary

get_window_state returns tree_markdown and screenshot dimensions, but no per-element geometry (bounds, frame, x/y/width/height, AXPosition, AXSize, etc.).

This makes it impossible for downstream clients to map element indexes/refs to real coordinates.

Reproduction

list_windows includes window bounds

cua-driver call list_windows '{"on_screen_only":false}'

Example result:

{
  "app_name": "Safari浏览器",
  "bounds": { "x": 0, "y": 0, "width": 1920, "height": 30 },
  "pid": 2395,
  "window_id": 3882
}

get_window_state does not include element geometry

cua-driver call get_window_state '{"pid":2395,"window_id":140}' --raw

Observed structuredContent:

{
  "bundle_id": "com.apple.Safari",
  "element_count": 474,
  "name": "Safari浏览器",
  "pid": 2395,
  "screenshot_height": 304,
  "screenshot_scale_factor": 2,
  "screenshot_width": 388,
  "tree_markdown": "...",
  "turn_id": 8
}

Missing fields:

  • bounds
  • frame
  • elements
  • AXPosition
  • AXSize
  • per-element x/y/width/height
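For anyone who wants to confirm this against their own output, here is a minimal sketch that checks a parsed structuredContent payload for the top-level geometry keys listed above. The helper name `missing_geometry_keys` is hypothetical and not part of cua-driver; the payload is the observed result copied from this report, with tree_markdown elided as in the original. It only inspects top-level keys, so it does not cover geometry that might be embedded inside tree_markdown.

```python
import json

# Observed structuredContent copied from this report (tree_markdown is elided there too).
OBSERVED = json.loads("""
{
  "bundle_id": "com.apple.Safari",
  "element_count": 474,
  "name": "Safari浏览器",
  "pid": 2395,
  "screenshot_height": 304,
  "screenshot_scale_factor": 2,
  "screenshot_width": 388,
  "tree_markdown": "...",
  "turn_id": 8
}
""")

# Top-level keys a client might look for; taken from the "Missing fields" list.
GEOMETRY_KEYS = ("bounds", "frame", "elements", "AXPosition", "AXSize")


def missing_geometry_keys(payload: dict) -> list[str]:
    """Return the geometry keys that are absent from the payload (hypothetical helper)."""
    return [key for key in GEOMETRY_KEYS if key not in payload]


print(missing_geometry_keys(OBSERVED))
# → ['bounds', 'frame', 'elements', 'AXPosition', 'AXSize']
```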

Expected

Either:

  1. return a structured element list with geometry, or
  2. expose a geometry map keyed by element index, or
  3. document explicitly that get_window_state does not provide per-element geometry.
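To illustrate what option 2 would enable, here is a rough sketch of how a client might turn a geometry map into a click target. Everything in it is an assumption made for illustration: the map shape, the `Rect` type, the `center_in_screenshot` helper, and the sample values do not come from cua-driver. It also assumes element rects are reported in screen points in the same coordinate space as the `bounds` from list_windows, and that screenshot pixels equal window-relative points multiplied by screenshot_scale_factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical rectangle in screen points (x, y is the top-left corner)."""
    x: float
    y: float
    width: float
    height: float


def center_in_screenshot(element: Rect, window: Rect, scale_factor: float) -> tuple[float, float]:
    """Map an element's center to screenshot pixel coordinates.

    Assumes the screenshot covers exactly the window and that
    pixels = window-relative points * scale_factor.
    """
    center_x = element.x + element.width / 2
    center_y = element.y + element.height / 2
    return ((center_x - window.x) * scale_factor,
            (center_y - window.y) * scale_factor)


# Illustrative values only, not real cua-driver output.
window_bounds = Rect(x=50, y=20, width=800, height=600)
geometry_by_index = {12: Rect(x=100, y=40, width=80, height=20)}

print(center_in_screenshot(geometry_by_index[12], window_bounds, scale_factor=2))
# → (180.0, 60.0)
```

With option 1 (a structured element list), the same helper would apply; only the lookup by index would change.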

Environment

  • macOS 26.5
  • CuaDriver.app version 0.2.0

Question

Is this intentional API design, or should get_window_state expose per-element geometry?
