Eval API
This page covers writing offline evaluations. For online evaluations, see Online evaluations.
Prerequisites
- Follow the procedure in Quickstart to set up Axiom AI SDK in your TypeScript project.
- Wrap your AI model with `wrapAISDKModel` for automatic tracing. See Instrumentation with Axiom AI SDK for details.
Authenticate with OAuth
The Axiom AI SDK includes a CLI for authenticating and running offline evaluations. Authenticate so that evaluation runs are recorded in Axiom and attributed to your user account.

Login

This opens your browser and prompts you to authorize the CLI with your Axiom account. Once authorized, the CLI stores your credentials locally.

Check authentication status

Switch organizations

If you belong to multiple Axiom organizations:

Logout
Anatomy of an offline evaluation
The `Eval` function defines a complete test suite for your capability. Here’s the basic structure:
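The shape can be sketched as follows. This is a self-contained illustration, not the SDK’s actual type definitions: the local `TestCase`, `ScorerArgs`, `Score`, and `EvalConfig` types are stand-ins so the example runs on its own, and the heuristic inside `task` replaces a real model call.

```typescript
// Local stand-in types; in a real project these shapes come from the Axiom AI SDK.
type TestCase = { input: string; expected: string };
type ScorerArgs = { input: string; output: string; expected: string };
type Score = { name: string; score: number };

interface EvalConfig {
  data: TestCase[] | (() => TestCase[] | Promise<TestCase[]>);
  task: (input: string) => Promise<string>;
  scorers: Array<(args: ScorerArgs) => Score>;
  metadata?: { description?: string; tags?: string[] };
}

const spamEval: EvalConfig = {
  // data: the test cases, each with an input and an expected ground truth.
  data: [
    { input: "WIN A FREE IPHONE!!!", expected: "spam" },
    { input: "Can you reset my password?", expected: "not_spam" },
  ],
  // task: runs your capability; a trivial heuristic stands in for a model call.
  task: async (input) => (input.includes("!!!") ? "spam" : "not_spam"),
  // scorers: compare the output against the expected result.
  scorers: [
    ({ output, expected }) => ({
      name: "exact-match",
      score: output === expected ? 1 : 0,
    }),
  ],
  // metadata: optional description and tags.
  metadata: { description: "Spam classification", tags: ["spam"] },
};
```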
Key parameters
- `data`: An array of test cases, or a function that returns an array of test cases. Each test case has an `input` (what you send to your capability) and an `expected` output (the ground truth).
- `task`: An async function that executes your capability for a given input and returns the output.
- `scorers`: An array of scorer functions that evaluate the output against the expected result.
- `metadata`: Optional metadata like a description or tags.
Create collections
The `data` parameter defines your collection of test cases. Start with a small set of examples and grow it over time as you discover edge cases.
Inline collections
For small collections, define test cases directly in the offline evaluation:

External collections

For larger collections, load test cases from external files or databases:

Define tasks
The `task` function executes your AI capability for each test case. It receives the input from the test case and should return the output your capability produces.
The task function should generally be the same code you use in your actual capability. This ensures your evaluations accurately reflect real-world behavior.
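For example, a task can be a thin wrapper around the capability function your production code already calls. `classifyTicket` below is a hypothetical capability; its body is a self-contained stand-in for what would normally be a model call.

```typescript
// Hypothetical capability; in a real project this is the same function your
// production code invokes (typically wrapping an LLM call).
async function classifyTicket(text: string): Promise<string> {
  // Stand-in logic so the sketch runs on its own.
  return /refund|invoice|billing/i.test(text) ? "billing" : "general";
}

// The task simply delegates to the capability, so evaluations exercise the
// exact code path that runs in production.
const task = async (input: string): Promise<string> => classifyTicket(input);
```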
Create scorers
Scorers evaluate your capability’s output. In offline evaluations, scorers receive `input`, `output`, and `expected` (ground truth), and return a score. For the full Scorer API reference including return types, patterns, and third-party integrations, see Scorers.
Here’s a quick example of an offline scorer that compares output to expected values:
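A minimal sketch of such a scorer, assuming the argument and return shapes described above (binary exact match; consult the Scorer API reference for the exact types the SDK expects):

```typescript
// Binary exact-match scorer: 1 if the output equals the expected value, else 0.
function exactMatch(args: { input: string; output: string; expected: string }): {
  name: string;
  score: number;
} {
  return { name: "exact-match", score: args.output === args.expected ? 1 : 0 };
}
```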
Complete example
Here’s a complete evaluation for a support ticket classification system:

src/lib/capabilities/classify-ticket/evaluations/spam-classification.eval.ts
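The following sketch shows what such a file could contain. To keep it self-contained, `Eval` is declared locally as a stand-in (its real signature comes from the Axiom AI SDK, so treat the shape here as an assumption), and `classifySpam` replaces the actual model-backed capability.

```typescript
// spam-classification.eval.ts (sketch). In a real project, import Eval from
// the Axiom AI SDK instead of declaring this local stand-in.
type Case = { input: string; expected: "spam" | "not_spam" };
type Args = { input: string; output: string; expected: string };

function Eval(
  name: string,
  config: {
    data: Case[];
    task: (input: string) => Promise<string>;
    scorers: Array<(args: Args) => { name: string; score: number }>;
    metadata?: { description?: string };
  }
) {
  return { name, ...config };
}

// Stand-in for the real capability, which would call a model.
async function classifySpam(text: string): Promise<string> {
  return /free|winner|!!!/i.test(text) ? "spam" : "not_spam";
}

const suite = Eval("spam-classification", {
  data: [
    { input: "You are a WINNER! Claim your FREE prize!!!", expected: "spam" },
    { input: "My invoice for March is missing.", expected: "not_spam" },
  ],
  // Reuse the capability directly so the evaluation matches production behavior.
  task: (input) => classifySpam(input),
  scorers: [
    ({ output, expected }) => ({
      name: "exact-match",
      score: output === expected ? 1 : 0,
    }),
  ],
  metadata: { description: "Classify support tickets as spam or not" },
});
```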
File naming conventions
Name your evaluation files with the `.eval.ts` extension so they’re automatically discovered by the Axiom CLI. The CLI matches files against `**/*.eval.{ts,js,mts,mjs,cts,cjs}`, based on your axiom.config.ts configuration.
What’s next?
- To parameterize your capabilities and run experiments, see Flags and experiments.
- To run offline evaluations using the CLI, see Run offline evaluations.