## Prerequisites
Before running experiments, you need:
- An experiment created in your Lemma project (find the experiment ID in your dashboard)
- Tracing set up in your agent (see Tracing Your Agent)
- Your API key and project ID (set `LEMMA_API_KEY` and `LEMMA_PROJECT_ID` in your environment)
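For example, in a POSIX shell you might export both before running your script (the values here are placeholders, not real credentials):

```bash
# Placeholder values: substitute your real key and project ID
export LEMMA_API_KEY="your-api-key"
export LEMMA_PROJECT_ID="your-project-id"
```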
## Installation

```bash
npm install @uselemma/experiments @uselemma/tracing
```

```bash
pip install uselemma-experiments uselemma-tracing
```
## Quick Start

```typescript
import { LemmaExperimentRunner } from "@uselemma/experiments";
import { wrapAgent } from "@uselemma/tracing";

const runner = new LemmaExperimentRunner();

const wrappedAgent = wrapAgent(
  "support-agent",
  async ({ onComplete }, input) => {
    const result = await callLLM(input.query);
    onComplete(result);
    return result;
  },
  { autoEndRoot: true }
);

const summary = await runner.runExperiment({
  experimentId: "your-experiment-id",
  strategyName: "baseline",
  agent: async (input) => {
    const { runId } = await wrappedAgent(input);
    return { runId };
  },
});

console.log(`${summary.successful}/${summary.total} completed`);
```
```python
from uselemma_experiments import LemmaExperimentRunner
from uselemma_tracing import wrap_agent

runner = LemmaExperimentRunner()

async def run_agent(ctx, input):
    result = await call_llm(input["query"])
    ctx.on_complete(result)
    return result

wrapped = wrap_agent("support-agent", run_agent, auto_end_root=True)

async def agent(input):
    result, run_id, span = await wrapped(input)
    return {"runId": run_id}

summary = await runner.run_experiment(
    experiment_id="your-experiment-id",
    strategy_name="baseline",
    agent=agent,
)

print(f"{summary['successful']}/{summary['total']} completed")
```
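Both snippets assume an LLM-calling helper (`callLLM` / `call_llm`) that you supply yourself; it is not part of the SDK. A minimal TypeScript stand-in for a local dry run might look like this (the echo behavior is purely illustrative):

```typescript
// Hypothetical stand-in for a real model call. The runner only requires
// that the agent function eventually resolves, so any async function
// works for a local dry run.
async function callLLM(query: string): Promise<string> {
  // A real implementation would call your LLM provider here.
  return `echo: ${query}`;
}

// Quick check:
callLLM("How do I reset my password?").then((reply) => console.log(reply));
```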
The runner calls `enableExperimentMode()` and sets up the OTel exporter internally. You don't need to pass `isExperiment: true` to `wrapAgent` or configure tracing separately.
## Example: Comparing Prompt Strategies
Here’s a complete example comparing two prompt strategies against the same test cases:
```typescript
import { LemmaExperimentRunner } from "@uselemma/experiments";
import { wrapAgent } from "@uselemma/tracing";

const runner = new LemmaExperimentRunner();
const EXPERIMENT_ID = "your-experiment-id";

const strategies = {
  concise: {
    systemPrompt: "You are a helpful assistant. Be brief and direct.",
  },
  detailed: {
    systemPrompt:
      "You are a helpful assistant. Provide thorough explanations with examples when relevant.",
  },
};

for (const [strategyName, config] of Object.entries(strategies)) {
  const wrappedAgent = wrapAgent(
    "support-agent",
    async ({ onComplete }, input) => {
      const result = await callLLM(config.systemPrompt, input.query);
      onComplete(result);
      return result;
    },
    { autoEndRoot: true }
  );

  const summary = await runner.runExperiment({
    experimentId: EXPERIMENT_ID,
    strategyName,
    agent: async (input) => {
      const { runId } = await wrappedAgent(input);
      return { runId };
    },
  });

  console.log(`${strategyName}: ${summary.successful}/${summary.total}`);
}
```
```python
from uselemma_experiments import LemmaExperimentRunner
from uselemma_tracing import wrap_agent

runner = LemmaExperimentRunner()
EXPERIMENT_ID = "your-experiment-id"

strategies = {
    "concise": {"system_prompt": "You are a helpful assistant. Be brief and direct."},
    "detailed": {"system_prompt": "You are a helpful assistant. Provide thorough explanations with examples when relevant."},
}

for strategy_name, config in strategies.items():
    # Bind the current config as a default argument: Python closures are
    # late-binding, so referencing `config` directly could pick up a later
    # loop value by the time the coroutine runs.
    async def run_agent(ctx, input, cfg=config):
        result = await call_llm(cfg["system_prompt"], input["query"])
        ctx.on_complete(result)
        return result

    wrapped = wrap_agent("support-agent", run_agent, auto_end_root=True)

    async def agent(input):
        result, run_id, span = await wrapped(input)
        return {"runId": run_id}

    summary = await runner.run_experiment(
        experiment_id=EXPERIMENT_ID,
        strategy_name=strategy_name,
        agent=agent,
    )

    print(f"{strategy_name}: {summary['successful']}/{summary['total']}")
```
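If you want a side-by-side view in the terminal after the loop, you can collect each strategy's summary into a plain object as the runs finish and format it yourself. The accumulator type and formatter below are our own sketch (mirroring the `successful`/`total` fields used above), not an SDK feature:

```typescript
// Shape of the fields we read off each run summary (our own type,
// not exported by the SDK).
type RunSummary = { successful: number; total: number };

// Render collected summaries as aligned "name  successful/total" rows.
function formatComparison(summaries: Record<string, RunSummary>): string {
  return Object.entries(summaries)
    .map(([name, s]) => `${name.padEnd(12)} ${s.successful}/${s.total}`)
    .join("\n");
}

// Example with made-up numbers:
const collected: Record<string, RunSummary> = {
  concise: { successful: 18, total: 20 },
  detailed: { successful: 17, total: 20 },
};
console.log(formatComparison(collected));
```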
## Concurrency and Progress
By default, the runner executes all test cases in parallel and shows a terminal progress bar. You can limit concurrency or disable the progress bar:
```typescript
await runner.runExperiment({
  experimentId: "exp-1",
  strategyName: "baseline",
  agent,
  concurrency: 5, // max 5 parallel runs
  progress: false, // no progress bar (useful in CI)
});
```

```python
await runner.run_experiment(
    experiment_id="exp-1",
    strategy_name="baseline",
    agent=agent,
    concurrency=5,  # max 5 parallel runs
    progress=False,  # no progress bar (useful in CI)
)
```
## Viewing Results
Once you’ve recorded results, head to your experiment in the Lemma dashboard to:
- **Compare strategies side-by-side**: See how each approach performed on the same inputs
- **Analyze traces**: Drill into individual executions to understand behavior differences
- **Track metrics**: If your experiment has an associated metric, view aggregated feedback per strategy
- **Identify patterns**: Find which inputs cause problems for certain strategies
## Next Steps

- **SDK source**: Full types, constructor options, and implementation in the uselemma/sdk repo
- **Understand experiments**: Experiments Overview covers the conceptual model