▌ GitHub radar
The AI Agent Eval Bible Is Now Open
443+ verified resources on AI agent evaluation — papers, tools, benchmarks — with deep reading notes and a runnable playbook. The gap-audited list that the field was missing.
A curated, actively maintained library of 443+ links on evaluating AI agents: papers, frameworks, talks with timestamps, and a runnable PATTERNS.md playbook with code examples. It stands out because it was built with depth-4 citation analysis of 11,600 papers and adversarial gap audits rather than by simply collecting links. It went viral as frontier labs published major eval-focused posts and RL training turned evaluation infrastructure into the real bottleneck.
Why a vibe-coder should care
If you build anything with AI agents, or just want to understand why some tools feel reliable while others hallucinate constantly, this library explains how to measure that difference. It goes beyond theory: there's runnable code showing how to set up evaluation for your own projects.
More finds