Today's approaches to LLM evaluation are severely limited. Automated methods like "LLM as a judge" fall short, especially on complex, real-world outputs, so most teams ultimately resort to manual human review.
Most real-world LLM applications require a nuanced approach that goes beyond simple checklists. Judging criteria such as a "compelling legal argument," "appropriate empathy," "alignment with medical guidelines," or "clarity and engagement" in educational content demands flexibility, expertise, and context.
At Composo, we've developed Composo Align specifically for the most complex, nuanced use cases. Using a best-in-class foundation model built for evaluation, Composo Align learns to align LLM outputs with expert-level judgment, providing precise, scalable, and repeatable evaluations that meet the high demands of intricate applications.
Composo Align combines a reward model with a language model architecture, making it highly specialized for determining LLM output quality. Trained on a large dataset of expert evaluations, Composo Align is dedicated to reliably assessing quality across a range of criteria.
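As a rough illustration of what "a reward model on a language model architecture" generally means, and not a description of Composo Align's internals (which this post does not detail): a language model produces a hidden-state vector per token, and a small scoring head maps those vectors to a single quality score. The sketch below uses hand-written vectors in place of a real model; `mean_pool`, `reward_head`, and every number in it are hypothetical.

```python
import math
from typing import List


def mean_pool(token_states: List[List[float]]) -> List[float]:
    """Average per-token hidden states into one vector for the whole output."""
    n_tokens = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[i] for tok in token_states) / n_tokens for i in range(dim)]


def reward_head(pooled: List[float], weights: List[float], bias: float) -> float:
    """Linear projection followed by a sigmoid, giving a score in (0, 1)."""
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Stand-in for language-model hidden states (2 tokens, 2 dimensions).
# In a real system these would come from the model, not be typed by hand.
token_states = [[1.0, 0.0], [0.0, 1.0]]
head_weights = [2.0, -2.0]  # hypothetical learned parameters
head_bias = 0.0

quality = reward_head(mean_pool(token_states), head_weights, head_bias)
print(f"quality score: {quality:.2f}")
```

The point of the design is that the output is a single number per response rather than free text, which is what makes scores repeatable and comparable across runs.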
Composo Align's extensive training on diverse datasets makes it highly capable across various use cases, from creative generation to highly structured outputs. This general capability enables it to excel in specialized, complex fields like healthcare, finance, and legal.
While Composo Align is generally capable, we know that many applications require highly customized evaluation. Composo Align's framework allows for easy personalization by incorporating human preference data to adapt its core model to your unique requirements.
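To make "incorporating human preference data" concrete, here is a minimal sketch of one common way preference pairs are used to adapt a scoring model: a Bradley-Terry style pairwise objective, where the scorer is nudged so that the output an expert preferred scores higher than the one they rejected. This is a generic technique shown under assumptions, not Composo's actual training procedure; the feature vectors, `fit_preferences`, and all hyperparameters are hypothetical.

```python
import math
from typing import List, Tuple

Pair = Tuple[List[float], List[float]]  # (preferred features, rejected features)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score(weights: List[float], features: List[float]) -> float:
    """Linear quality score for one output."""
    return sum(w * f for w, f in zip(weights, features))


def fit_preferences(pairs: List[Pair], n_features: int,
                    lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit weights by gradient ascent on log sigmoid(score(preferred) - score(rejected))."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = score(weights, preferred) - score(weights, rejected)
            # Gradient of log sigmoid(margin) w.r.t. the weights is
            # (1 - sigmoid(margin)) * (preferred - rejected).
            step = lr * (1.0 - sigmoid(margin))
            for i in range(n_features):
                weights[i] += step * (preferred[i] - rejected[i])
    return weights


# Toy expert judgments. Hypothetical features: [empathy, verbosity].
# In every pair the expert preferred the more empathetic reply.
pairs = [
    ([0.9, 0.2], [0.1, 0.8]),
    ([0.8, 0.5], [0.3, 0.5]),
    ([0.7, 0.9], [0.2, 0.1]),
]

weights = fit_preferences(pairs, n_features=2)
for preferred, rejected in pairs:
    print(score(weights, preferred) > score(weights, rejected))
```

After fitting, the scorer ranks each preferred output above its rejected counterpart, which is the sense in which a general model can be adapted to one team's notion of quality.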
Evaluation is not static—quality standards evolve as applications and user expectations shift. Composo Align is designed to continually learn and update in real time, refining its measure of quality with more data over time.
Today we're launching a public API for Composo Align, giving you direct access to our highly capable evaluation model for your own use cases. Try out the API for free here.
Get in touch here to hear more, or to get access to your own personalized version built on our most powerful evaluation models.