Experiments let you systematically evaluate your agent by running multiple strategies against a fixed set of test cases. Whether you're comparing prompt variations, model choices, or architectural changes, experiments provide a structured framework for measuring what actually works.
Each experiment contains test cases (the inputs to evaluate) and results (traces linked to strategies). This structure makes it easy to compare how different approaches perform on identical inputs.
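For orientation, the sketch below shows roughly how those pieces relate. The field names (`id`, `inputData`, `runId`, `testCaseId`, `strategyName`) are taken from the examples later in this guide; the actual API responses may carry additional metadata.

// Illustrative shapes only; field names mirror the snippets in this guide,
// and real responses may include extra fields.
interface TestCase {
  id: string;
  inputData: Record<string, any>; // the input your agent receives
}

interface ExperimentResult {
  runId: string;        // the traced agent run
  testCaseId?: string;  // which test case produced it
}

interface RecordResultsRequest {
  strategyName: string; // e.g. "concise" or "detailed"
  results: ExperimentResult[];
}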
Prerequisites
Before running experiments, you need:
- An experiment created in your Lemma project (find the experiment ID in your dashboard)
- Tracing set up in your agent (see Tracing Your Agent)
- Your API key from your project settings (the snippets in this guide read it from the LEMMA_API_KEY environment variable)
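A small guard at the top of your experiment script surfaces a missing key early:

// Fail fast if the API key is missing; the examples below read it from process.env.
if (!process.env.LEMMA_API_KEY) {
  throw new Error("LEMMA_API_KEY is not set");
}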
Workflow Overview
A typical experiment workflow looks like this:
- Get test cases — Fetch the inputs defined for your experiment
- Run your agent — Execute each strategy against the test cases
- Record results — Link each trace to the experiment with its strategy name
- Analyze — Compare performance across strategies in the dashboard
Get Test Cases
Retrieve all test cases for an experiment to iterate over them:
async function getTestCases(experimentId: string, projectId: string) {
  const response = await fetch(
    `https://api.uselemma.ai/experiments/${experimentId}/test-cases?project_id=${projectId}`,
    {
      headers: {
        Authorization: `Bearer ${process.env.LEMMA_API_KEY}`,
      },
    }
  );

  if (!response.ok) {
    throw new Error(`Failed to get test cases: ${response.statusText}`);
  }

  return response.json();
}
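As a quick sanity check, you can call the helper directly. The snippet below assumes the response is a JSON array where each entry exposes `id` and `inputData`, which is how the later examples consume it:

// Quick check (run inside an async context); the IDs are placeholders for your own.
const testCases = await getTestCases("your-experiment-id", "your-project-id");
console.log(`Fetched ${testCases.length} test cases`);
// Each entry is expected to expose `id` and `inputData`, as used below.
console.log(testCases[0]?.id, testCases[0]?.inputData);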
Run Your Agent and Record Results
For each strategy you want to test, run your agent against all test cases and record the results:
import { wrapAgent } from "@uselemma/tracing";
import { tracerProvider } from "./tracer"; // Your tracer setup

async function runExperiment(
  experimentId: string,
  projectId: string,
  strategyName: string,
  runAgent: (input: Record<string, any>) => Promise<{ result: any; runId: string }>
) {
  // 1. Get test cases
  const testCases = await getTestCases(experimentId, projectId);

  // 2. Run agent on each test case and collect results
  const results = [];
  for (const testCase of testCases) {
    const { result, runId } = await runAgent(testCase.inputData);
    results.push({
      runId,
      testCaseId: testCase.id,
    });
  }

  await tracerProvider.forceFlush(); // ensure all spans are sent to Lemma

  // 3. Record all results for this strategy
  await recordResults(experimentId, projectId, strategyName, results);
}
Calling tracerProvider.forceFlush() ensures all spans are sent to Lemma before recording results. This is important because the RunBatchSpanProcessor batches all spans for an agent run and exports them together when the run ends.
Record Results
Link traces to your experiment with a strategy name:
async function recordResults(
  experimentId: string,
  projectId: string,
  strategyName: string,
  results: Array<{ runId: string; testCaseId?: string }>
) {
  const response = await fetch(
    `https://api.uselemma.ai/experiments/${experimentId}/results?project_id=${projectId}`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.LEMMA_API_KEY}`,
      },
      body: JSON.stringify({
        strategyName,
        results,
      }),
    }
  );

  if (!response.ok) {
    throw new Error(`Failed to record results: ${response.statusText}`);
  }

  return response.json();
}
Including testCaseId lets you compare how different strategies performed on the exact same input in the dashboard.
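For example, once the "concise" strategy has finished, its results might be recorded like this (the IDs shown are placeholders; use the run IDs returned by your traced agent runs):

// Placeholder IDs for illustration only.
await recordResults("your-experiment-id", "your-project-id", "concise", [
  { runId: "run-abc123", testCaseId: "tc-001" },
  { runId: "run-def456", testCaseId: "tc-002" },
]);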
Example: Comparing Prompt Strategies
Here’s a complete example comparing two prompt strategies:
import { wrapAgent } from "@uselemma/tracing";
import { tracerProvider } from "./tracer";

const EXPERIMENT_ID = "your-experiment-id";
const PROJECT_ID = "your-project-id";

// Define your strategies
const strategies = {
  concise: {
    systemPrompt: "You are a helpful assistant. Be brief and direct.",
  },
  detailed: {
    systemPrompt:
      "You are a helpful assistant. Provide thorough explanations with examples when relevant.",
  },
};

// Agent runner for a specific strategy
function createAgentRunner(strategyConfig: { systemPrompt: string }) {
  return async (input: Record<string, any>) => {
    const wrappedFn = wrapAgent(
      "support-agent",
      async ({ onComplete }, agentInput) => {
        // callLLM is a placeholder for your own model call
        const result = await callLLM(strategyConfig.systemPrompt, agentInput.query);
        onComplete(result);
        return result;
      },
      { isExperiment: true }
    );

    const { result, runId } = await wrappedFn(input);
    return { result, runId };
  };
}

// Run experiment for each strategy
async function main() {
  for (const [strategyName, config] of Object.entries(strategies)) {
    console.log(`Running strategy: ${strategyName}`);
    const agentRunner = createAgentRunner(config);
    await runExperiment(EXPERIMENT_ID, PROJECT_ID, strategyName, agentRunner);
  }
}

main().catch(console.error);
Experiment Mode
Instead of passing isExperiment: true to each wrapAgent call, you can enable experiment mode globally. When enabled, all wrapAgent calls are automatically tagged as experiment runs:
import { enableExperimentMode, disableExperimentMode } from "@uselemma/tracing";

enableExperimentMode();

// All agent runs in this block are tagged as experiments
for (const testCase of testCases) {
  await runAgent(testCase.inputData);
}

disableExperimentMode();
This is useful in experiment scripts where every agent run should be tagged as an experiment.
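If the script can fail partway through, a try/finally block keeps experiment mode from staying enabled after an error:

enableExperimentMode();
try {
  for (const testCase of testCases) {
    await runAgent(testCase.inputData);
  }
} finally {
  // Always restore normal tracing, even if a run throws.
  disableExperimentMode();
}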
Viewing Results
Once you’ve recorded results, head to your experiment in the Lemma dashboard to:
- Compare strategies side-by-side — See how each approach performed on the same inputs
- Analyze traces — Drill into individual executions to understand behavior differences
- Track metrics — If your experiment has an associated metric, view aggregated feedback per strategy
- Identify patterns — Find which inputs cause problems for certain strategies