Teardowns of systems we've built. Opinions on where the AI consulting space is wrong. The occasional rant. No "what is an agent" listicles.
A 4,200-word walk-through of the eval suite we built for a fintech client — 600 cases, weighted toward the edge ones, run on every commit. It's the unsexy part of AI work that nobody writes about because it isn't a model launch.
if (confidence < 0.8) { escalate(); }
$ npm i @your-life-savings
SELECT * FROM honesty
Twice a month, max. Long-form teardowns and the occasional opinion. Unsubscribe with one click — we won't act offended.