trycua/cua

cua-driver get_window_state omits per-element geometry

Open

#1564 opened on May 18, 2026

Labels: enhancement, good first issue

Description

Summary

get_window_state returns tree_markdown and screenshot dimensions, but no per-element geometry (bounds, frame, x/y/width/height, AXPosition, AXSize, etc.).

As a result, downstream clients cannot map element indexes/refs to screen coordinates from this call alone.

Reproduction

list_windows includes window bounds

cua-driver call list_windows '{"on_screen_only":false}'

Example result:

{
  "app_name": "Safari浏览器",
  "bounds": { "x": 0, "y": 0, "width": 1920, "height": 30 },
  "pid": 2395,
  "window_id": 3882
}

get_window_state does not include element geometry

cua-driver call get_window_state '{"pid":2395,"window_id":140}' --raw

Observed structuredContent:

{
  "bundle_id": "com.apple.Safari",
  "element_count": 474,
  "name": "Safari浏览器",
  "pid": 2395,
  "screenshot_height": 304,
  "screenshot_scale_factor": 2,
  "screenshot_width": 388,
  "tree_markdown": "...",
  "turn_id": 8
}

Missing fields:

  • bounds
  • frame
  • elements
  • AXPosition
  • AXSize
  • per-element x/y/width/height
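
The check above can be automated. The following is a minimal sketch, not part of cua-driver: it walks a decoded structuredContent payload and reports any geometry-related key at any depth. The helper name `find_geometry_keys` and the `GEOMETRY_KEYS` set are illustrative assumptions; the key names are the ones listed as missing above, and both sample payloads are copied from the outputs in this report.

```python
from typing import Any

# Key names listed as missing in this issue; "x"/"y"/"width"/"height" cover
# the per-element case. This set is an assumption, not a cua-driver contract.
GEOMETRY_KEYS = {
    "bounds", "frame", "elements", "AXPosition", "AXSize",
    "x", "y", "width", "height",
}


def find_geometry_keys(payload: Any, path: str = "") -> list[str]:
    """Return dotted paths of every geometry-related key found in payload."""
    found: list[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else key
            if key in GEOMETRY_KEYS:
                found.append(child)
            found.extend(find_geometry_keys(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            found.extend(find_geometry_keys(item, f"{path}[{index}]"))
    return found


# Observed structuredContent from the get_window_state call above.
observed = {
    "bundle_id": "com.apple.Safari",
    "element_count": 474,
    "name": "Safari浏览器",
    "pid": 2395,
    "screenshot_height": 304,
    "screenshot_scale_factor": 2,
    "screenshot_width": 388,
    "tree_markdown": "...",
    "turn_id": 8,
}

# One list_windows entry from above, for comparison.
window = {
    "app_name": "Safari浏览器",
    "bounds": {"x": 0, "y": 0, "width": 1920, "height": 30},
    "pid": 2395,
    "window_id": 3882,
}

print(find_geometry_keys(observed))  # no geometry keys in this payload
print(find_geometry_keys(window))    # the window-level bounds are reported
```

The match is on exact key names, so `screenshot_width` and `screenshot_height` are deliberately not counted as element geometry.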

Expected

Either:

  1. return a structured element list with geometry, or
  2. expose a geometry map keyed by element index, or
  3. document explicitly that get_window_state does not provide per-element geometry.
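
To make option 2 concrete, here is a hypothetical sketch of how a client might consume a geometry map keyed by element index. Nothing in it exists in cua-driver today: the `geometry` field, the `Rect` type, the helper `center_in_screenshot`, and the frame values are all invented for illustration. It also assumes that frames would be reported in points relative to the window origin and that screenshot_scale_factor is the screenshot's pixels-per-point ratio; neither is confirmed by the current output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical element frame in points, relative to the window origin."""
    x: float
    y: float
    width: float
    height: float


def center_in_screenshot(rect: Rect, scale_factor: float) -> tuple[int, int]:
    """Map a frame's center from points to screenshot pixels."""
    cx = (rect.x + rect.width / 2) * scale_factor
    cy = (rect.y + rect.height / 2) * scale_factor
    return round(cx), round(cy)


# Hypothetical response: three observed fields plus an invented "geometry" map
# keyed by element index (as strings, since JSON object keys are strings).
response = {
    "screenshot_scale_factor": 2,
    "screenshot_width": 388,
    "screenshot_height": 304,
    "geometry": {
        "0": {"x": 0, "y": 0, "width": 194, "height": 152},
        "7": {"x": 10, "y": 20, "width": 80, "height": 24},
    },
}

# Convert string keys back to integer element indexes.
frames = {int(index): Rect(**frame) for index, frame in response["geometry"].items()}

# Pixel coordinates of element 7's center within the screenshot.
print(center_in_screenshot(frames[7], response["screenshot_scale_factor"]))
```

A map keyed by index would leave tree_markdown unchanged for existing clients, whereas option 1 would duplicate the tree in structured form; which trade-off fits the driver is for the maintainers to decide.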

Environment

  • macOS 26.5
  • CuaDriver.app version 0.2.0

Question

Is this intentional API design, or should get_window_state expose per-element geometry?
