Understanding these core concepts will help you get the most out of Lemma’s observability and evaluation platform.

Traces

A trace represents a single execution of your agent from start to finish. It captures:
  • Inputs — The initial state and parameters passed to your agent
  • Outputs — The final result produced by your agent
  • Spans — Nested operations within the execution (LLM calls, tool invocations, etc.)
  • Timing — Duration and timing of each operation
  • Metadata — Additional context like model names, token counts, and error states
Traces are built on OpenTelemetry, the industry-standard observability framework. Each trace has a unique trace ID and contains one or more spans organized hierarchically.
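Conceptually, a trace is a tree of timed spans plus the run's input and output. The sketch below shows that shape in TypeScript; the field names are illustrative, not Lemma's actual wire format:

```typescript
// Illustrative shape of a trace and its spans
// (hypothetical fields, not Lemma's wire format).
interface Span {
  spanId: string;
  name: string;
  startMs: number;
  endMs: number;
  attributes: Record<string, string | number>; // model names, token counts, etc.
  children: Span[]; // spans organized hierarchically
}

interface Trace {
  traceId: string; // unique per execution
  input: unknown;  // initial state passed to the agent
  output: unknown; // final result
  root: Span;
}

// The trace's total duration is just the root span's elapsed time.
function durationMs(trace: Trace): number {
  return trace.root.endMs - trace.root.startMs;
}
```
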

Run ID

The run ID is Lemma’s identifier for a specific agent execution. It’s returned by wrapAgent and used to:
  • Link metric events to specific traces
  • Associate experiment results with test cases
  • Query and filter traces in the dashboard
Think of the run ID as the primary key for a trace in Lemma’s system.

Spans

Spans are the building blocks of a trace. Each span represents a single operation within your agent’s execution, such as:
  • An LLM generation call
  • A tool or function invocation
  • A database query
  • A custom operation you want to track
Spans are hierarchical — a parent span can contain multiple child spans, creating a tree structure that represents your agent’s execution flow. This hierarchy makes it easy to understand:
  • Which operations happened in what order
  • How long each operation took
  • Where errors occurred in the execution path
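The three questions above fall out of a depth-first walk over the span tree: the visit order is the execution order, each span carries its own duration, and the path to a failing span is the error's location. A minimal sketch, with an illustrative Span shape rather than the SDK's:

```typescript
// A minimal span tree (fields are illustrative).
interface Span {
  name: string;
  startMs: number;
  endMs: number;
  error?: string;
  children: Span[];
}

// Depth-first walk: recovers execution order, per-span duration,
// and the path to any span that errored.
function walk(span: Span, path: string[] = []): string[] {
  const here = [...path, span.name];
  const lines = [
    `${here.join(" > ")}: ${span.endMs - span.startMs}ms` +
      (span.error ? ` [error: ${span.error}]` : ""),
  ];
  for (const child of span.children) lines.push(...walk(child, here));
  return lines;
}
```
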

Metrics

A metric in Lemma is a named feedback signal that you can record against traces. Metrics capture qualitative or quantitative assessments of your agent’s performance, such as:
  • User satisfaction (thumbs up/down)
  • Content moderation results
  • Factual accuracy scores
  • Task completion success
Each metric has:
  • A metric ID — Used when recording metric events
  • A name — Displayed in the dashboard
  • A type — The structure of values it accepts (boolean, number, string, etc.)
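As a sketch, a metric definition is a small record with those three fields, and a client could use the type to reject malformed values before recording an event. The shape and the check are hypothetical, not Lemma's API:

```typescript
// Hypothetical metric definition mirroring the fields above.
type MetricType = "boolean" | "number" | "string";

interface Metric {
  metricId: string; // used when recording metric events
  name: string;     // displayed in the dashboard
  type: MetricType; // the structure of values it accepts
}

// Illustrative pre-flight check: does a value match the metric's type?
function accepts(metric: Metric, value: unknown): boolean {
  return typeof value === metric.type;
}
```
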

Metric Events

A metric event is a specific instance of feedback recorded against a trace. It connects:
  • A metric — What you’re measuring
  • A run ID — Which agent execution you’re measuring
  • A value — The feedback or assessment (e.g., { feedback: true, description: "Helpful response" })
Metric events power Lemma’s analysis features — you can filter traces by feedback, aggregate metrics across strategies, and track how changes affect user satisfaction over time.
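The three-way link above can be sketched as a plain record, and the kind of aggregation Lemma's dashboard performs reduces to simple arithmetic over those records. Field names are illustrative, not the actual API:

```typescript
// A metric event ties a metric, a run, and a value together
// (hypothetical shape, not Lemma's API).
interface MetricEvent {
  metricId: string; // what you're measuring
  runId: string;    // which agent execution you're measuring
  value: { feedback: boolean; description?: string };
}

// Example analysis: fraction of runs with positive feedback.
function positiveRate(events: MetricEvent[]): number {
  if (events.length === 0) return 0;
  return events.filter((e) => e.value.feedback).length / events.length;
}
```
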

Experiments

An experiment is a structured framework for evaluating your agent by running multiple strategies against a fixed set of test cases. Experiments help you answer questions like:
  • Which prompt performs better on customer support queries?
  • Does increasing temperature improve creativity without hurting accuracy?
  • How does GPT-4 compare to Claude on our specific use case?

Test Cases

Test cases are the inputs used to evaluate your agent in an experiment. Each test case contains:
  • Input data — The parameters to pass to your agent (e.g., user message, context)
  • Test case ID — A unique identifier used to link results across strategies
  • Expected output (optional) — A reference answer for comparison
You define test cases once, then run multiple strategies against the same set.
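A test case set might look like the following sketch: a fixed array, defined once, with the expected output left off where no reference answer exists. The field names and example inputs are hypothetical:

```typescript
// Illustrative test case shape (hypothetical fields).
interface TestCase {
  testCaseId: string;         // links results across strategies
  input: { message: string }; // parameters passed to the agent
  expectedOutput?: string;    // optional reference answer
}

// Defined once, then run against every strategy.
const testCases: TestCase[] = [
  {
    testCaseId: "tc_1",
    input: { message: "How do I reset my password?" },
    expectedOutput: "Walks the user through the password-reset flow.",
  },
  { testCaseId: "tc_2", input: { message: "Cancel my order" } },
];
```
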

Strategies

A strategy is a specific configuration or approach you’re testing in an experiment. Examples include:
  • Different system prompts
  • Different models
  • Different temperature settings
  • Different agent architectures
When recording experiment results, you tag each trace with a strategy name. Lemma then groups results by strategy, making it easy to compare performance side-by-side.
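In practice a strategy is little more than a named configuration, with each entry varying one dimension under test. The fields below are hypothetical, chosen to mirror the examples above:

```typescript
// Strategies as plain configuration objects (hypothetical fields);
// the name is the tag recorded with each trace.
interface Strategy {
  name: string;
  model: string;
  temperature: number;
  systemPrompt: string;
}

const strategies: Strategy[] = [
  { name: "baseline", model: "gpt-4", temperature: 0.2, systemPrompt: "You are a support agent." },
  { name: "creative", model: "gpt-4", temperature: 0.9, systemPrompt: "You are a support agent." },
];
```
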

Results

Results link your agent’s traces to the experiment and strategy. Each result contains:
  • Run ID — The trace for this execution
  • Test case ID — Which input was used
  • Strategy name — Which approach was tested
Results are recorded after running your agent on the experiment’s test cases. Once recorded, you can analyze them in the dashboard to see:
  • How each strategy performed on specific test cases
  • Aggregate metrics across all test cases
  • Patterns in failures or edge cases
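The side-by-side comparison reduces to grouping result records by their strategy name, as in this simplified sketch (the record shape is illustrative, and the grouping only stands in for what the dashboard computes):

```typescript
// A result links one trace to one test case under one strategy
// (illustrative shape).
interface ExperimentResult {
  runId: string;
  testCaseId: string;
  strategy: string;
}

// Group results by strategy so approaches can be compared side-by-side.
function byStrategy(results: ExperimentResult[]): Map<string, ExperimentResult[]> {
  const groups = new Map<string, ExperimentResult[]>();
  for (const r of results) {
    const bucket = groups.get(r.strategy) ?? [];
    bucket.push(r);
    groups.set(r.strategy, bucket);
  }
  return groups;
}
```
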

Projects

A project is the top-level container in Lemma. It groups:
  • All traces from your agent(s)
  • Metrics you’ve defined
  • Experiments you’re running
Each project has:
  • A project ID — Used when sending traces and making API calls
  • An API key — For authentication
  • A dashboard for viewing and analyzing data
Most organizations use one project per application or product, but you can create multiple projects to separate environments (dev/staging/prod) or different agent types.

Tracer Provider

The tracer provider is the OpenTelemetry component responsible for:
  • Creating and managing spans
  • Exporting trace data to Lemma’s OTLP endpoint
  • Handling batching and retries
You configure the tracer provider once in your application with:
  • Lemma’s OTLP endpoint URL
  • Your API key and project ID
  • Span processors (RunBatchSpanProcessor groups all spans for an agent run and exports them together when the run completes)
Once registered, the tracer provider automatically captures spans from your agent and supported frameworks.

Next Steps

Now that you understand Lemma's core concepts, you're ready to put them into practice.