
Sentinel

AI-powered code review assistant for GitHub pull requests. Differentiated by a reproducible evaluation harness — 100 hand-labeled PRs with precision/recall/F1 per comment category, regression-gated in CI.

The problem

Every AI portfolio project in 2026 wraps an LLM with LangChain and calls it done. What's missing — and what production AI teams actually care about — is evaluation. How do you know your system works? How do you catch regressions when you change a prompt? Sentinel exists to answer those questions with numbers, not vibes.
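What "regression-gated in CI" means concretely: every eval run is compared against a baseline metrics file committed to the repo, and the build fails if any category's F1 drops. A minimal sketch, assuming per-category F1 scores in a JSON baseline; the function name and file layout are illustrative, not Sentinel's actual harness:

```python
import json
import sys


def check_regression(baseline_path, current, tolerance=0.02):
    """Fail CI when any category's F1 drops more than `tolerance` below baseline.

    baseline_path: JSON file of {category: f1} committed to the repo.
    current: {category: f1} from the eval run on this branch.
    Returns a process exit code (0 = pass, 1 = regression detected).
    """
    with open(baseline_path) as f:
        baseline = json.load(f)

    failures = [
        f"{cat}: F1 {current.get(cat, 0.0):.3f} < baseline {f1:.3f}"
        for cat, f1 in baseline.items()
        if current.get(cat, 0.0) < f1 - tolerance
    ]
    for line in failures:
        print("REGRESSION", line, file=sys.stderr)
    return 1 if failures else 0
```

A prompt change that quietly tanks security recall then shows up as a red build instead of a surprise in production.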

Architecture

GitHub Webhook → FastAPI → Diff Parser → Hybrid Retriever → LLM Gateway → Structured Output → GitHub Check Runs

Structured output also feeds the Eval Harness (run in CI) and the Dashboard (Next.js).

Key decisions

Decision | Choice | Why
Retrieval | Hybrid BM25 + dense | Pure vector search misses exact identifiers; hybrid retrieval is the state of the art for code search.
LLM layer | Custom gateway, not LangChain | A ~200-line gateway with retries, cost tracking, and fallback is far easier to debug than a framework dependency.
Eval dataset | Hand-labeled, not LLM-labeled | Avoids the LLM-grading-LLM echo chamber. 100 PRs from 5 OSS repos, labeled by hand.
Output format | Pydantic + JSON mode | Type-safe structured output enables automated eval scoring.
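One standard way to combine BM25 and dense rankings is reciprocal rank fusion, which needs no score normalization across the two retrievers. A minimal sketch; Sentinel's exact fusion scheme isn't specified here, so treat this as illustrative:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranking.

    rankings: e.g. [bm25_ranking, dense_ranking], each a list of doc ids,
    best first. Each list contributes 1 / (k + rank) to a doc's score,
    so a doc ranked well by both retrievers rises to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

An exact-identifier hit from BM25 that the dense index misses still surfaces, which is the failure mode the hybrid choice guards against.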

Results

Security F1: 0.62 across 100 labeled PRs
Avg latency: 8.3 s per review
Daily cost: $1.40 at 10 PRs/day
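The per-category scores come from comparing Sentinel's comments against the hand labels. A minimal sketch of that scoring, assuming each comment is reduced to a (pr_id, location, category) tuple and a prediction counts only on an exact match; the function name is illustrative, not Sentinel's actual API:

```python
def per_category_scores(labeled, predicted):
    """Precision/recall/F1 per comment category.

    labeled, predicted: iterables of (pr_id, location, category) tuples.
    A predicted comment is a true positive only if the exact tuple
    appears in the hand labels.
    """
    labeled, predicted = set(labeled), set(predicted)
    categories = {cat for *_, cat in labeled | predicted}
    scores = {}
    for cat in categories:
        gold = {x for x in labeled if x[-1] == cat}
        pred = {x for x in predicted if x[-1] == cat}
        tp = len(gold & pred)
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[cat] = {"precision": precision, "recall": recall, "f1": f1}
    return scores
```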