Prerequisites

Before running experiments, you need:
  1. An experiment created in your Lemma project (find the experiment ID in your dashboard)
  2. Tracing set up in your agent (see Tracing Your Agent)
  3. Your API key and project ID (set LEMMA_API_KEY and LEMMA_PROJECT_ID in your environment)
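Since a missing API key or project ID typically only surfaces as a failure at run time, it can help to validate the environment variables from step 3 up front. A minimal sketch — the helper is ours, not part of the SDK:

```typescript
// Hypothetical startup check: fail fast if required variables are unset.
// Not part of @uselemma/experiments; adapt to your own bootstrap code.
function assertEnv(names: string[]): void {
  const missing = names.filter((name) => !process.env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}

// Usage at startup:
// assertEnv(["LEMMA_API_KEY", "LEMMA_PROJECT_ID"]);
```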

Installation

npm install @uselemma/experiments @uselemma/tracing

Quick Start

import { LemmaExperimentRunner } from "@uselemma/experiments";
import { wrapAgent } from "@uselemma/tracing";

const runner = new LemmaExperimentRunner();

const wrappedAgent = wrapAgent(
  "support-agent",
  async ({ onComplete }, input) => {
    const result = await callLLM(input.query);
    onComplete(result);
    return result;
  },
  { autoEndRoot: true }
);

const summary = await runner.runExperiment({
  experimentId: "your-experiment-id",
  strategyName: "baseline",
  agent: async (input) => {
    const { runId } = await wrappedAgent(input);
    return { runId };
  },
});

console.log(`${summary.successful}/${summary.total} completed`);
The runner calls enableExperimentMode() and sets up the OTel exporter internally. You don’t need to pass isExperiment: true to wrapAgent or configure tracing separately.
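The callLLM function in the Quick Start is a placeholder for your own model call. A minimal stub to make the example self-contained — the name and signature are illustrative, not part of the SDK:

```typescript
// Illustrative stand-in for a real model call. Swap in your provider's
// SDK (OpenAI, Anthropic, etc.); the runner only cares that the agent
// eventually calls onComplete and returns a result.
async function callLLM(query: string): Promise<string> {
  // A real implementation would await a chat-completion request here.
  return `stub answer for: ${query}`;
}
```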

Example: Comparing Prompt Strategies

Here’s a complete example comparing two prompt strategies against the same test cases:
import { LemmaExperimentRunner } from "@uselemma/experiments";
import { wrapAgent } from "@uselemma/tracing";

const runner = new LemmaExperimentRunner();
const EXPERIMENT_ID = "your-experiment-id";

const strategies = {
  concise: {
    systemPrompt: "You are a helpful assistant. Be brief and direct.",
  },
  detailed: {
    systemPrompt:
      "You are a helpful assistant. Provide thorough explanations with examples when relevant.",
  },
};

for (const [strategyName, config] of Object.entries(strategies)) {
  const wrappedAgent = wrapAgent(
    "support-agent",
    async ({ onComplete }, input) => {
      const result = await callLLM(config.systemPrompt, input.query);
      onComplete(result);
      return result;
    },
    { autoEndRoot: true }
  );

  const summary = await runner.runExperiment({
    experimentId: EXPERIMENT_ID,
    strategyName,
    agent: async (input) => {
      const { runId } = await wrappedAgent(input);
      return { runId };
    },
  });

  console.log(`${strategyName}: ${summary.successful}/${summary.total}`);
}
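If you want to pick a winner programmatically rather than eyeballing the logs, you can compare the summaries collected in the loop. A sketch that relies only on the successful and total fields shown above — the helper and interface names are ours:

```typescript
// Hypothetical helper: given per-strategy summaries, return the name
// with the highest completion rate. Uses only fields shown in this guide.
interface RunSummary {
  successful: number;
  total: number;
}

function bestStrategy(results: Record<string, RunSummary>): string {
  let best = "";
  let bestRate = -1;
  for (const [name, { successful, total }] of Object.entries(results)) {
    const rate = total > 0 ? successful / total : 0;
    if (rate > bestRate) {
      bestRate = rate;
      best = name;
    }
  }
  return best;
}
```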

Concurrency and Progress

By default, the runner executes all test cases in parallel and shows a terminal progress bar. You can limit concurrency or disable the progress bar:
await runner.runExperiment({
  experimentId: "exp-1",
  strategyName: "baseline",
  agent,
  concurrency: 5,   // max 5 parallel runs
  progress: false,  // no progress bar (useful in CI)
});
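Conceptually, concurrency: 5 behaves like a small worker pool over the test cases: at most five runs are in flight, and a new one starts as soon as a slot frees up. A standalone sketch of the pattern — our own illustration, not the SDK's actual implementation:

```typescript
// Generic concurrency limiter: run tasks with at most `limit` in flight.
// Mirrors the behavior of the runner's concurrency option conceptually;
// not code from @uselemma/experiments.
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each "worker" repeatedly claims the next unclaimed index until done.
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const index = next++;
        results[index] = await task(items[index]);
      }
    }
  );
  await Promise.all(workers);
  return results;
}
```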

Viewing Results

Once you’ve recorded results, head to your experiment in the Lemma dashboard to:
  • Compare strategies side-by-side — See how each approach performed on the same inputs
  • Analyze traces — Drill into individual executions to understand behavior differences
  • Track metrics — If your experiment has an associated metric, view aggregated feedback per strategy
  • Identify patterns — Find which inputs cause problems for certain strategies

Next Steps

  • SDK source — Full types, constructor options, and implementation in the uselemma/sdk repo
  • Understand experiments — Experiments Overview covers the conceptual model