hkust-nlp/Toolathlon (Python)
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
(384 stars) (38 forks) (0 indexed issues) (0 open good first issues)