The Generative AI evaluation platform

Replace manual vibe checks with effortless, automated evaluation that works.

We get it: evaluating the performance of GenAI apps is tough.

Teams are spending a lot of time on manual testing by vibes, only to find that most automated evaluation approaches don’t really work.

Composo makes GenAI evaluation easy.

We have developed the highest-quality evaluation methods and put them in a platform that’s extremely easy to use.

Effortless to use

Link an app in 5 minutes and run evaluations with no code.

Powerful evaluation

Proprietary, research-backed methods that are better than a human.

Prototype to production

Use Composo whether you’re at the earliest stage of prototyping, or in production at scale.

Why companies choose Composo

Composo is the most powerful GenAI tool you don't have to be a developer to use.

Simple setup

Setup is simple: linking an app takes just three lines of code, and you can run evaluations straight out of the box.

No code evaluation & iteration

There’s no need to leave Composo to run evaluations or iterate on app parameters such as prompts, models, or RAG settings. Your app continues to live in code with our unique two-way app integration.

Collaborate between engineering & product

Engineers can keep modifying code in their environment, while others easily iterate and evaluate within Composo, enabling smooth collaboration between engineering and product teams.

Any application

Our solution works with any application, from chatbots to copilots, and supports complex setups including agents, RAG, and tool integrations.

Industry leading, research-backed approach

We go beyond using LLMs for judgment and ground-truth comparisons, incorporating state-of-the-art hallucination detection and custom-trained evaluation models to deliver the best performance.

Evaluation that learns with your app

Our custom models learn to emulate human judgment, handling the complexity and subjectivity of LLM outputs with precision.

A smooth yet powerful workflow

Our Team

Sebastian Fox

CEO

Ex-McKinsey & QuantumBlack
Oxford University

Luke Markham

CTO

Ex-Graphcore ML Engineer
Oxford University

Ready to try Composo?

No additional steps, complexity, or changes to how you already work. Start using Composo today and build the future of AI.

FAQs

How does Composo work?

Composo simplifies & automates the process of testing and optimizing LLM-based applications to achieve high quality, accuracy, and safety.

Composo links to an application with just a few lines of code, and then enables you to access & iterate on all the core components of a GenAI system (e.g. prompts, models, and RAG settings).

You can use Composo both for quick, ad-hoc iteration and for rigorous, systematic testing that evaluates how different configurations of an application affect the quality of its generated outputs.

One of the key benefits of Composo is that it requires no code to use once it is set up. This means anyone can run tests and optimize an application while an engineer continues to build out the codebase as normal. This is especially important when someone other than the developer needs to test, evaluate & provide feedback on an application (e.g. in legal, HR, medical, or customer service applications).

How do you determine accuracy & detect hallucinations?

To determine accuracy and detect hallucinations, we employ a range of methods including:

1. Comparison to ground-truth or gold standard answers: We evaluate the model's output against verified, correct responses to assess its accuracy.

2. RAG metrics: The performance of retrieval-augmented generation (RAG) can be quantified with a range of measures, such as faithfulness to the source material (i.e. the proportion of claims in the answer that are supported by the underlying source material), relevance to the query, and the precision & recall of the context provided.

3. AI critic: We use language models to assess the quality and accuracy of the generated output. This is powered by the Composo AI critic, which is built to identify hallucinations or inconsistencies.
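To make the RAG metrics in item 2 concrete, here is a minimal Python sketch of how faithfulness and context precision & recall can be computed once the labels exist. It assumes the hard part has already been done: the answer has been split into claims, each claim has been judged as supported or not, and the relevant source chunks are known (for example via an LLM critic or human annotation). The `Claim` type, the function names, and the toy data are hypothetical and are not Composo's implementation. The sketch uses simple set-based precision and recall; some tools use rank-weighted variants instead.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """A single factual statement extracted from a generated answer (hypothetical type)."""
    text: str
    supported: bool  # True if the retrieved source material backs this claim


def faithfulness(claims: list[Claim]) -> float:
    """Fraction of answer claims that are supported by the source material."""
    if not claims:
        return 0.0
    return sum(c.supported for c in claims) / len(claims)


def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Share of retrieved chunks that are actually relevant to the query."""
    if not retrieved:
        return 0.0
    return sum(chunk in relevant for chunk in retrieved) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Share of the relevant chunks that the retriever actually returned."""
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


if __name__ == "__main__":
    # Toy, illustrative labels: one of four claims is unsupported (a hallucination).
    claims = [
        Claim("claim A", supported=True),
        Claim("claim B", supported=True),
        Claim("claim C", supported=True),
        Claim("claim D", supported=False),
    ]
    retrieved = ["doc1", "doc2", "doc5"]
    relevant = {"doc1", "doc2", "doc3", "doc4"}

    print(f"faithfulness:      {faithfulness(claims):.2f}")
    print(f"context precision: {context_precision(retrieved, relevant):.2f}")
    print(f"context recall:    {context_recall(retrieved, relevant):.2f}")
```

With the toy data above, 3 of 4 claims are supported (faithfulness 0.75), 2 of the 3 retrieved chunks are relevant (precision 0.67), and 2 of the 4 relevant chunks were retrieved (recall 0.50).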

How secure is my data?

At Composo, data security is our top priority. We serve many enterprise customers in regulated industries and employ robust measures to safeguard your information.

Your application runs entirely on your own server, and Composo only has access to inputs and outputs, never your underlying data or systems.

For highly sensitive data, we recommend anonymization. However, we also offer end-to-end encryption and dedicated instances where absolutely necessary.

Can Composo deploy on my company's infra?

Yes. Reach out to us to discuss further!

How easy is it to use?

Composo is designed to be really easy to use.

The initial setup for an engineer is very quick, requiring only a few lines of code.

After this setup, using Composo is intuitive & requires no code. This makes it perfect for anyone, technical or not (e.g. a business user, domain expert, or product manager).

Does Composo work for complex applications such as agentic systems?

Yes. Composo is designed to be flexible enough to handle applications of any complexity, whether they involve agents, retrieval-augmented generation (RAG), tool use, or anything else.

It lets you test and optimize full end-to-end performance, as well as isolate and evaluate the performance of individual components or agents within a GenAI system.

Still have questions?

Let’s get in touch. We’d love to learn about what you’re building!

Contact Us

Start using Composo today

With evaluations built for complex, highly specialized domains, we make it easy to take your GenAI app testing to the next level.