enhancement, good first issue
Description
Summary
get_window_state returns tree_markdown and screenshot dimensions, but no per-element geometry (bounds, frame, x/y/width/height, AXPosition, AXSize, etc.).
Without this geometry, downstream clients cannot map element indexes/refs to on-screen coordinates.
Reproduction
list_windows includes window bounds
cua-driver call list_windows '{"on_screen_only":false}'
Example result:
{
"app_name": "Safari浏览器",
"bounds": { "x": 0, "y": 0, "width": 1920, "height": 30 },
"pid": 2395,
"window_id": 3882
}
get_window_state does not include element geometry
cua-driver call get_window_state '{"pid":2395,"window_id":140}' --raw
Observed structuredContent:
{
"bundle_id": "com.apple.Safari",
"element_count": 474,
"name": "Safari浏览器",
"pid": 2395,
"screenshot_height": 304,
"screenshot_scale_factor": 2,
"screenshot_width": 388,
"tree_markdown": "...",
"turn_id": 8
}
Missing fields:
- bounds
- frame
- elements
- AXPosition
- AXSize
- per-element x/y/width/height
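The absence can be checked mechanically instead of by eye. The following is a minimal client-side sketch, not part of cua-driver: `find_geometry_keys` and `GEOMETRY_KEYS` are hypothetical names, and the key list simply mirrors the fields named in this issue. It walks a JSON payload and reports every geometry-like key it finds.

```python
import json

# Key names treated as geometry. This set mirrors the fields named in this
# issue; it is an assumption, not a list taken from the cua-driver API.
GEOMETRY_KEYS = {"bounds", "frame", "elements", "AXPosition", "AXSize",
                 "x", "y", "width", "height"}


def find_geometry_keys(value, path=""):
    """Return the paths of all geometry-like keys anywhere in a JSON value."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if key in GEOMETRY_KEYS:
                found.append(child_path)
            found.extend(find_geometry_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_geometry_keys(child, f"{path}[{index}]"))
    return found


# structuredContent exactly as observed above (tree_markdown stays elided).
observed_state = json.loads("""
{
  "bundle_id": "com.apple.Safari",
  "element_count": 474,
  "name": "Safari浏览器",
  "pid": 2395,
  "screenshot_height": 304,
  "screenshot_scale_factor": 2,
  "screenshot_width": 388,
  "tree_markdown": "...",
  "turn_id": 8
}
""")

# One list_windows entry, as shown in the first reproduction step.
window_entry = {
    "app_name": "Safari浏览器",
    "bounds": {"x": 0, "y": 0, "width": 1920, "height": 30},
    "pid": 2395,
    "window_id": 3882,
}

print(find_geometry_keys(observed_state))  # → []
print(find_geometry_keys(window_entry))
# → ['bounds', 'bounds.x', 'bounds.y', 'bounds.width', 'bounds.height']
```

Note that `screenshot_width` and `screenshot_height` are deliberately not matched: they describe the image, not any element.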
Expected
Either:
- return a structured element list with geometry, or
- expose a geometry map keyed by element index, or
- document explicitly that get_window_state does not provide per-element geometry.
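To illustrate how a client might consume the second option, here is a rough sketch under stated assumptions. The `element_geometry` field, the `Rect` type, and `element_center_in_screenshot` are all hypothetical and do not exist in the current API. The sketch also assumes that element and window bounds share one screen coordinate space in points, and that the screenshot covers exactly the window at `screenshot_scale_factor` pixels per point. None of this is confirmed by cua-driver, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def element_center_in_screenshot(element: Rect, window: Rect,
                                 scale_factor: float) -> tuple[float, float]:
    """Map an element's screen-space bounds to its center in screenshot pixels.

    Assumes both rects are in the same screen coordinate space (points) and
    that the screenshot shows exactly the window at scale_factor px per point.
    """
    center_x = element.x + element.width / 2 - window.x
    center_y = element.y + element.height / 2 - window.y
    return (center_x * scale_factor, center_y * scale_factor)


# Hypothetical response fragment: a geometry map keyed by element index.
hypothetical_state = {
    "screenshot_scale_factor": 2,
    "element_geometry": {  # hypothetical field, not in the current API
        "12": {"x": 110, "y": 60, "width": 80, "height": 20},
    },
}

# Made-up window bounds, in the shape list_windows already returns.
window_bounds = Rect(x=100, y=50, width=194, height=152)

element_rect = Rect(**hypothetical_state["element_geometry"]["12"])
click_point = element_center_in_screenshot(
    element_rect, window_bounds, hypothetical_state["screenshot_scale_factor"]
)
print(click_point)  # → (100.0, 40.0)
```

A map keyed by element index keeps `tree_markdown` unchanged for existing clients, while a structured element list would let geometry travel with each element's other attributes. Either shape would support the conversion above.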
Environment
- macOS 26.5
- CuaDriver.app version 0.2.0
Question
Is this intentional API design, or should get_window_state expose per-element geometry?