Healthcare: The Notes That Looked Fine
A Series B clinical AI company's ambient scribe had been in production for months. Evals were passing. We analysed 847 notes and found 127 failures, 23 of them severity-critical.
How teams use Composo to catch failures and ship with confidence.
A Series B SaaS company's AI support agent was making unauthorised financial commitments, inventing product features, and ignoring escalation requests. We found 189 failures in 1,243 responses.
A financial planning platform had 20+ AI agents across customer operations. A model update caused subtle drift that went undetected for three weeks, with hundreds of slightly-off decisions compounding.
How an enterprise SaaS platform achieved 99.7% AI agent reliability, reduced QA costs by $1.2M, and closed $4.5M in deals using Composo evaluation.
How a seed-stage legal tech startup shipped their AI lease review MVP in 4 weeks, cut evaluation costs by 96%, and passed enterprise pilots with Composo.