AI-powered code review assistant for GitHub pull requests. Differentiated by a reproducible evaluation harness — 100 hand-labeled PRs with precision/recall/F1 per comment category, regression-gated in CI.
Every AI portfolio project in 2026 wraps an LLM with LangChain and calls it done. What's missing — and what production AI teams actually care about — is evaluation. How do you know your system works? How do you catch regressions when you change a prompt? Sentinel exists to answer those questions with numbers, not vibes.
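The core of such a harness is per-category precision/recall/F1 over hand labels. Here is a minimal, illustrative sketch assuming comments are reduced to `(pr_id, category)` pairs; the function name and data shape are hypothetical, not Sentinel's actual API:

```python
from collections import Counter

def score_comments(labeled, predicted):
    """Per-category precision/recall/F1 for review comments.

    `labeled` and `predicted` are lists of (pr_id, category) pairs.
    This is an illustrative sketch, not Sentinel's real scoring code.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    gold, pred = set(labeled), set(predicted)
    for item in pred:
        (tp if item in gold else fp)[item[1]] += 1
    for item in gold - pred:
        fn[item[1]] += 1
    scores = {}
    for cat in {c for _, c in gold | pred}:
        p = tp[cat] / (tp[cat] + fp[cat]) if tp[cat] + fp[cat] else 0.0
        r = tp[cat] / (tp[cat] + fn[cat]) if tp[cat] + fn[cat] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[cat] = {"precision": p, "recall": r, "f1": f1}
    return scores
```

Regression gating then reduces to a CI step that fails when any category's F1 drops below the last accepted baseline.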
| Decision | Choice | Why |
|---|---|---|
| Retrieval | Hybrid BM25 + dense | Pure vector search misses exact identifiers (function names, config keys). Combining lexical BM25 with dense embeddings recovers both exact and semantic matches. |
| LLM layer | Custom gateway, not LangChain | A ~200-line gateway with retries, cost tracking, and model fallback is easier to debug than a framework abstraction, and demonstrates the underlying mechanics. |
| Eval dataset | Hand-labeled, not LLM-labeled | Avoids the echo chamber of grading one LLM's output with another LLM. 100 PRs from 5 OSS repos, labeled by hand. |
| Output format | Pydantic + JSON mode | Type-safe structured output enables automated eval scoring. |
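The retrieval row above can be made concrete with reciprocal rank fusion, a common way to merge a lexical and a dense ranking without tuning score scales (this is an illustrative sketch with the conventional k=60 constant; Sentinel's actual fusion may differ):

```python
def rrf_merge(bm25_ranked, dense_ranked, k=60):
    """Reciprocal rank fusion of two ranked lists of doc ids.

    Each list is ordered best-first; a doc's fused score is the sum
    of 1/(k + rank) over the lists it appears in. Illustrative only.
    """
    scores = {}
    for ranked in (bm25_ranked, dense_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank well in either list float to the top, which is exactly what rescues exact-identifier hits that dense retrieval alone would bury.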
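The gateway row boils down to a retry loop with exponential backoff and model fallback. A hedged sketch, where `send(model, prompt)` stands in for the provider call and the model names, retry counts, and exception handling are all assumptions:

```python
import time

def call_with_fallback(prompt, models, send, max_retries=2, backoff=1.0):
    """Try each model in order, retrying transient failures with backoff.

    `send` is a caller-supplied provider call; this sketch omits the
    cost-tracking hook a real gateway would add around each attempt.
    """
    last_err = None
    for model in models:
        for attempt in range(max_retries + 1):
            try:
                return model, send(model, prompt)
            except Exception as err:  # in practice: catch provider-specific errors
                last_err = err
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"all models failed: {last_err}")
```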
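The output-format row is what makes automated scoring possible: if every comment validates against a schema, the eval harness can compare typed objects instead of scraping free text. A minimal sketch using Pydantic, where the field names and category set are hypothetical, not Sentinel's actual schema:

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError

class ReviewComment(BaseModel):
    # Illustrative schema; Sentinel's real fields may differ.
    file: str
    line: int
    category: Literal["bug", "style", "security", "performance"]
    message: str

def parse_response(raw_json: str) -> Optional[list]:
    """Validate an LLM JSON-mode response into typed comments.

    Returns None on malformed JSON or schema violations so the
    caller can retry, rather than letting bad output reach the eval.
    """
    try:
        return [ReviewComment(**c) for c in json.loads(raw_json)]
    except (ValidationError, ValueError):
        return None
```

Pairing this with the provider's JSON mode keeps the failure surface small: the model is constrained to emit JSON, and Pydantic rejects anything that drifts from the schema.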