## Prerequisites
Before running experiments, you need:
- An experiment created in your Lemma project (find the experiment ID in your dashboard)
- Tracing set up in your agent (see Tracing Your Agent)
- Your API key and project ID (set `LEMMA_API_KEY` and `LEMMA_PROJECT_ID` in your environment)
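For example, in a POSIX shell you might export both before running your script (the values here are placeholders, not real credentials):

```bash
# Placeholder values: substitute your real key and project ID
export LEMMA_API_KEY="your-api-key"
export LEMMA_PROJECT_ID="your-project-id"
```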
## Installation

```bash
npm install @uselemma/experiments @uselemma/tracing
```

```bash
pip install uselemma-experiments uselemma-tracing
```
## Quick Start

```typescript
import { LemmaExperimentRunner } from "@uselemma/experiments";
import { wrapAgent } from "@uselemma/tracing";

const runner = new LemmaExperimentRunner();

const wrappedAgent = wrapAgent(
  "support-agent",
  async ({ onComplete }, input) => {
    const result = await callLLM(input.query);
    onComplete(result);
    return result;
  },
  { autoEndRoot: true }
);

const summary = await runner.runExperiment({
  experimentId: "your-experiment-id",
  strategyName: "baseline",
  agent: async (input) => {
    const { runId } = await wrappedAgent(input);
    return { runId };
  },
});

console.log(`${summary.successful}/${summary.total} completed`);
```
```python
from uselemma_experiments import LemmaExperimentRunner
from uselemma_tracing import wrap_agent

runner = LemmaExperimentRunner()

async def run_agent(ctx, input):
    result = await call_llm(input["query"])
    ctx.on_complete(result)
    return result

wrapped = wrap_agent("support-agent", run_agent, auto_end_root=True)

async def agent(input):
    result, run_id, span = await wrapped(input)
    return {"runId": run_id}

summary = await runner.run_experiment(
    experiment_id="your-experiment-id",
    strategy_name="baseline",
    agent=agent,
)

print(f"{summary['successful']}/{summary['total']} completed")
```
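Both snippets assume an LLM-calling helper (`callLLM` / `call_llm`) that you supply yourself; it is not part of the SDK. A minimal TypeScript stand-in for a local dry run might look like this (the echo behavior is purely illustrative):

```typescript
// Hypothetical stand-in for a real model call. The runner only requires
// that the agent function eventually resolves, so any async function
// works for a local dry run.
async function callLLM(query: string): Promise<string> {
  // A real implementation would call your LLM provider here.
  return `echo: ${query}`;
}

// Quick check:
callLLM("How do I reset my password?").then((reply) => console.log(reply));
```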
The runner calls `enableExperimentMode()` and sets up the OTel exporter internally. You don't need to pass `isExperiment: true` to `wrapAgent` or configure tracing separately.
## Example: Comparing Prompt Strategies
Here’s a complete example comparing two prompt strategies against the same test cases:
```typescript
import { LemmaExperimentRunner } from "@uselemma/experiments";
import { wrapAgent } from "@uselemma/tracing";

const runner = new LemmaExperimentRunner();
const EXPERIMENT_ID = "your-experiment-id";

const strategies = {
  concise: {
    systemPrompt: "You are a helpful assistant. Be brief and direct.",
  },
  detailed: {
    systemPrompt:
      "You are a helpful assistant. Provide thorough explanations with examples when relevant.",
  },
};

for (const [strategyName, config] of Object.entries(strategies)) {
  const wrappedAgent = wrapAgent(
    "support-agent",
    async ({ onComplete }, input) => {
      const result = await callLLM(config.systemPrompt, input.query);
      onComplete(result);
      return result;
    },
    { autoEndRoot: true }
  );

  const summary = await runner.runExperiment({
    experimentId: EXPERIMENT_ID,
    strategyName,
    agent: async (input) => {
      const { runId } = await wrappedAgent(input);
      return { runId };
    },
  });

  console.log(`${strategyName}: ${summary.successful}/${summary.total}`);
}
```
```python
from uselemma_experiments import LemmaExperimentRunner
from uselemma_tracing import wrap_agent

runner = LemmaExperimentRunner()
EXPERIMENT_ID = "your-experiment-id"

strategies = {
    "concise": {"system_prompt": "You are a helpful assistant. Be brief and direct."},
    "detailed": {"system_prompt": "You are a helpful assistant. Provide thorough explanations with examples when relevant."},
}

for strategy_name, config in strategies.items():
    # Bind the current config as a default argument: Python closures are
    # late-binding, so referencing `config` directly could pick up a later
    # loop value by the time the coroutine runs.
    async def run_agent(ctx, input, cfg=config):
        result = await call_llm(cfg["system_prompt"], input["query"])
        ctx.on_complete(result)
        return result

    wrapped = wrap_agent("support-agent", run_agent, auto_end_root=True)

    async def agent(input):
        result, run_id, span = await wrapped(input)
        return {"runId": run_id}

    summary = await runner.run_experiment(
        experiment_id=EXPERIMENT_ID,
        strategy_name=strategy_name,
        agent=agent,
    )

    print(f"{strategy_name}: {summary['successful']}/{summary['total']}")
```
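If you want a side-by-side view in the terminal after the loop, you can collect each strategy's summary into a plain object as the runs finish and format it yourself. The accumulator type and formatter below are our own sketch (mirroring the `successful`/`total` fields used above), not an SDK feature:

```typescript
// Shape of the fields we read off each run summary (our own type,
// not exported by the SDK).
type RunSummary = { successful: number; total: number };

// Render collected summaries as aligned "name  successful/total" rows.
function formatComparison(summaries: Record<string, RunSummary>): string {
  return Object.entries(summaries)
    .map(([name, s]) => `${name.padEnd(12)} ${s.successful}/${s.total}`)
    .join("\n");
}

// Example with made-up numbers:
const collected: Record<string, RunSummary> = {
  concise: { successful: 18, total: 20 },
  detailed: { successful: 17, total: 20 },
};
console.log(formatComparison(collected));
```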
## Concurrency and Progress
By default, the runner executes all test cases in parallel and shows a terminal progress bar. You can limit concurrency or disable the progress bar:
```typescript
await runner.runExperiment({
  experimentId: "exp-1",
  strategyName: "baseline",
  agent,
  concurrency: 5, // max 5 parallel runs
  progress: false, // no progress bar (useful in CI)
});
```

```python
await runner.run_experiment(
    experiment_id="exp-1",
    strategy_name="baseline",
    agent=agent,
    concurrency=5,  # max 5 parallel runs
    progress=False,  # no progress bar (useful in CI)
)
```
## Viewing Results
Once you’ve recorded results, head to your experiment in the Lemma dashboard to:
- **Compare strategies side-by-side**: See how each approach performed on the same inputs
- **Analyze traces**: Drill into individual executions to understand behavior differences
- **Track metrics**: If your experiment has an associated metric, view aggregated feedback per strategy
- **Identify patterns**: Find which inputs cause problems for certain strategies
## Next Steps

- **SDK source**: Full types, constructor options, and implementation in the uselemma/sdk repo
- **Understand experiments**: Experiments Overview covers the conceptual model