▌ GitHub radar

The AI Agent Eval Bible Is Now Open

2026-06-26

443+ verified resources on AI agent evaluation (papers, tools, benchmarks), with deep reading notes and a runnable playbook: the gap-audited list the field was missing.


01. benchflow-ai/awesome-evals ★ 364

A curated, actively maintained library of 443+ links on evaluating AI agents: papers, frameworks, talks with timestamps, and a runnable PATTERNS.md playbook with code examples. What sets it apart from a plain link dump is how it was built, using depth-4 citation analysis across 11,600 papers plus adversarial gap audits. It went viral just as frontier labs published major eval-focused posts and RL training turned evaluation infrastructure into the real bottleneck.

Why a vibe-coder should care

If you build anything with AI agents, or just want to understand why some tools feel reliable while others hallucinate constantly, this library explains how to measure that difference. It goes beyond theory: the runnable code shows how to set up evaluation for your own projects.

