sMiNT0S/AIBugBench
From prompt to paste: evaluate AI / LLM output under a strict Python sandbox and get actionable scores across 7 categories, including security, correctness and upkeep.