Quorum
Task-aware model routing (DeepSeek to Haiku to Sonnet to Opus) with K=3 adversarial skeptic verification and full trace UI.
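The skeptic step can be pictured with a minimal sketch. It assumes a finding is kept only if none of the K=3 skeptics refutes it; the actual acceptance rule is not stated here. `Finding`, `Skeptic`, `survives`, and `filter_findings` are hypothetical names for illustration, and the stub skeptics stand in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one reported bug."""
    snippet_id: str
    claim: str


# A skeptic returns True when it manages to refute the finding.
Skeptic = Callable[[Finding], bool]


def survives(finding: Finding, skeptics: Sequence[Skeptic]) -> bool:
    """Assumed rule: a finding survives only if no skeptic refutes it."""
    return not any(skeptic(finding) for skeptic in skeptics)


def filter_findings(findings: Sequence[Finding],
                    skeptics: Sequence[Skeptic]) -> List[Finding]:
    """Drop every finding that at least one skeptic refutes."""
    return [f for f in findings if survives(f, skeptics)]


# Stub skeptics standing in for K=3 independent model calls.
def never_refutes(finding: Finding) -> bool:
    return False


def refutes_vague_claims(finding: Finding) -> bool:
    return "maybe" in finding.claim


skeptics = [never_refutes, never_refutes, refutes_vague_claims]  # K = 3
findings = [
    Finding("s1", "off-by-one in loop bound"),
    Finding("s2", "maybe a race condition"),
]
kept = filter_findings(findings, skeptics)
```

Under this unanimous rule a single refutation removes a finding, which trades recall for a lower false positive rate, the same direction as the numbers in Results.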
Results
| Metric | Baseline | K=3 Verified |
|---|---|---|
| False Positive Rate | 27.8% | 0.0% |
| 95% CI (FPR) | [11.1%, 50.0%] | [0.0%, 0.0%] |
| Recall | 100% | 77.8% |
| Labeled set | 36 snippets (incl. prompt-injection traps) | |
| Held-out bugs | 3/3 found, 0 surviving FP | |
| Cost per run | approx. $0.25 | |
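For reference, the rates in the table can be recomputed from per-snippet labels as sketched below. The 18 clean / 18 buggy split is an assumption chosen because it reproduces the reported percentages (5/18 = 27.8%, 14/18 = 77.8%); only the total of 36 is stated. The percentile bootstrap is also an assumption about how the interval was obtained, and `rates` and `bootstrap_fpr_ci` are hypothetical helper names.

```python
import random
from typing import Sequence, Tuple


def rates(labels: Sequence[bool], preds: Sequence[bool]) -> Tuple[float, float]:
    """Return (false positive rate, recall). True = bug present / bug flagged."""
    tp = sum(l and p for l, p in zip(labels, preds))
    fp = sum((not l) and p for l, p in zip(labels, preds))
    fn = sum(l and (not p) for l, p in zip(labels, preds))
    tn = sum((not l) and (not p) for l, p in zip(labels, preds))
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, recall


def bootstrap_fpr_ci(clean_preds: Sequence[bool], n_resamples: int = 10_000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the FPR over the clean snippets only."""
    rng = random.Random(seed)
    n = len(clean_preds)
    stats = sorted(sum(rng.choices(clean_preds, k=n)) / n
                   for _ in range(n_resamples))
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Assumed split: 18 buggy snippets followed by 18 clean ones.
labels = [True] * 18 + [False] * 18
baseline = [True] * 18 + [True] * 5 + [False] * 13                 # 5 clean flagged
verified = [True] * 14 + [False] * 4 + [False] * 18                # 4 bugs dropped

base_fpr, base_recall = rates(labels, baseline)
ver_fpr, ver_recall = rates(labels, verified)
zero_ci = bootstrap_fpr_ci(verified[18:])   # no false positives observed
base_ci = bootstrap_fpr_ci(baseline[18:])
```

With zero observed false positives every resample also has zero, so the percentile bootstrap collapses to [0, 0]. That interval reflects the method and the sample size, not proof that the true rate is zero.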
58 tests. ruff + mypy + CI green. `make eval-dry` reproduces the results offline.