Tag

Evals

6 articles tagged with Evals.


AI Product Building

AI Software Quality Needs a Factory Again

AI software quality is a production discipline. Code got cheap, but review, evals, rollback, and observability did not.

22 May · 8 min read

Read →

AI Product Building

Tests Pass. Does It Think?

When AI writes the code, green CI isn't enough. The new discipline is understanding and defending the choices the model made — not just the ones you made.

2 Mar · 8 min read

Read →

AI Product Building · Deep dive

How to Measure an AI Product (When Traditional Metrics Lie)

DAU, time-in-app, and NPS were built for a world where humans do the work. AI products need different metrics. A framework for what to measure and why.

9 Feb · 10 min read

Read →

AI Product Building · Deep dive

Your Agent Evals Are Vibes. Here's How to Make Them Infrastructure.

Most teams evaluate agents with manual chats and gut feel. A practical framework for eval suites that let you ship, starting with 20 examples, not 20,000.

5 Jan · 13 min read

Read →

AI Product Building

Stop Building AI Agents. Start Building SOPs Wrapped in Code.

A 5-step agent at 95% accuracy per step is only 77% reliable. The path forward isn't better agents, it's narrower ones. Three rules for workflows that ship.
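The 77% figure follows from multiplying the per-step success rates, which holds if each step succeeds or fails independently of the others (an assumption made here for illustration; the teaser does not state it). A minimal sketch of that arithmetic, where `chain_reliability` is a hypothetical helper name:

```python
def chain_reliability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent, so the success rates multiply.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# Reliability drops quickly as the chain gets longer.
for steps in (1, 3, 5, 10):
    print(f"{steps:>2} steps: {chain_reliability(0.95, steps):.1%}")
#  1 steps: 95.0%
#  3 steps: 85.7%
#  5 steps: 77.4%
# 10 steps: 59.9%
```

Under the same assumption, cutting the chain from 5 steps to 3 raises end-to-end reliability from about 77% to about 86% without changing per-step accuracy.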

15 Dec · 9 min read

Read →

AI Product Building

Stop Picking Winners in the Model Race. Build the Router Instead.

Building for a single model is technical debt with a short shelf life. The winning strategy is orchestration, evals, and governance, not leaderboard loyalty.

8 Dec · 8 min read

Read →