Repositories

Repositories by open-compass

[ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2 & [ICLR 2025] Mask-DPO

Last commit: April 30, 2025

(65 stars) (3 forks) (0 indexed issues) (0 open good first issues)

Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.

Last commit: September 8, 2025

(111 stars) (5 forks) (0 indexed issues) (0 open good first issues)

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Last commit: May 28, 2026

(7,047 stars) (780 forks) (1 indexed issue) (1 open good first issue)