Understanding these core concepts will help you get the most out of Lemma’s observability and evaluation platform.

Traces

A trace represents a single execution of your agent from start to finish. It captures:
  • Inputs — The initial state and parameters passed to your agent
  • Outputs — The final result produced by your agent
  • Spans — Nested operations within the execution (LLM calls, tool invocations, etc.)
  • Timing — Duration and timing of each operation
  • Metadata — Additional context like model names, token counts, and error states
Traces are built on OpenTelemetry, the industry-standard observability framework. Each trace has a unique trace ID and contains one or more spans organized hierarchically.
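Conceptually, a trace is a tree of timed spans plus the run's input and output. The sketch below shows that shape in TypeScript; the field names are illustrative, not Lemma's actual wire format:

```typescript
// Illustrative shape of a trace and its spans
// (hypothetical fields, not Lemma's wire format).
interface Span {
  spanId: string;
  name: string;
  startMs: number;
  endMs: number;
  attributes: Record<string, string | number>; // model names, token counts, etc.
  children: Span[]; // spans organized hierarchically
}

interface Trace {
  traceId: string; // unique per execution
  input: unknown;  // initial state passed to the agent
  output: unknown; // final result
  root: Span;
}

// The trace's total duration is just the root span's elapsed time.
function durationMs(trace: Trace): number {
  return trace.root.endMs - trace.root.startMs;
}
```
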

Run ID

The run ID is Lemma’s identifier for a specific agent execution. It’s returned by wrapAgent and used to:
  • Link metric events to specific traces
  • Associate experiment results with test cases
  • Query and filter traces in the dashboard
Think of the run ID as the primary key for a trace in Lemma’s system.

Spans

Spans are the building blocks of a trace. Each span represents a single operation within your agent’s execution, such as:
  • An LLM generation call
  • A tool or function invocation
  • A database query
  • A custom operation you want to track
Spans are hierarchical — a parent span can contain multiple child spans, creating a tree structure that represents your agent’s execution flow. This hierarchy makes it easy to understand:
  • Which operations happened in what order
  • How long each operation took
  • Where errors occurred in the execution path
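The three questions above fall out of a depth-first walk over the span tree: the visit order is the execution order, each span carries its own duration, and the path to a failing span is the error's location. A minimal sketch, with an illustrative Span shape rather than the SDK's:

```typescript
// A minimal span tree (fields are illustrative).
interface Span {
  name: string;
  startMs: number;
  endMs: number;
  error?: string;
  children: Span[];
}

// Depth-first walk: recovers execution order, per-span duration,
// and the path to any span that errored.
function walk(span: Span, path: string[] = []): string[] {
  const here = [...path, span.name];
  const lines = [
    `${here.join(" > ")}: ${span.endMs - span.startMs}ms` +
      (span.error ? ` [error: ${span.error}]` : ""),
  ];
  for (const child of span.children) lines.push(...walk(child, here));
  return lines;
}
```
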

Metrics

A metric in Lemma is a named feedback signal that you can record against traces. Metrics capture qualitative or quantitative assessments of your agent’s performance, such as:
  • User satisfaction (thumbs up/down)
  • Content moderation results
  • Factual accuracy scores
  • Task completion success
Each metric has:
  • A metric ID — Used when recording metric events
  • A name — Displayed in the dashboard
  • A type — The structure of values it accepts (boolean, number, string, etc.)
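As a sketch, a metric definition is a small record with those three fields, and a client could use the type to reject malformed values before recording an event. The shape and the check are hypothetical, not Lemma's API:

```typescript
// Hypothetical metric definition mirroring the fields above.
type MetricType = "boolean" | "number" | "string";

interface Metric {
  metricId: string; // used when recording metric events
  name: string;     // displayed in the dashboard
  type: MetricType; // the structure of values it accepts
}

// Illustrative pre-flight check: does a value match the metric's type?
function accepts(metric: Metric, value: unknown): boolean {
  return typeof value === metric.type;
}
```
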

Metric Events

A metric event is a specific instance of feedback recorded against a trace. It connects:
  • A metric — What you’re measuring
  • A run ID — Which agent execution you’re measuring
  • A value — The feedback or assessment (e.g., { feedback: true, description: "Helpful response" })
Metric events power Lemma’s analysis features — you can filter traces by feedback, aggregate metrics across strategies, and track how changes affect user satisfaction over time.
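The three-way link above can be sketched as a plain record, and the kind of aggregation Lemma's dashboard performs reduces to simple arithmetic over those records. Field names are illustrative, not the actual API:

```typescript
// A metric event ties a metric, a run, and a value together
// (hypothetical shape, not Lemma's API).
interface MetricEvent {
  metricId: string; // what you're measuring
  runId: string;    // which agent execution you're measuring
  value: { feedback: boolean; description?: string };
}

// Example analysis: fraction of runs with positive feedback.
function positiveRate(events: MetricEvent[]): number {
  if (events.length === 0) return 0;
  return events.filter((e) => e.value.feedback).length / events.length;
}
```
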

Experiments

An experiment is a structured framework for evaluating your agent by running multiple strategies against a fixed set of test cases. Experiments help you answer questions like:
  • Which prompt performs better on customer support queries?
  • Does increasing temperature improve creativity without hurting accuracy?
  • How does GPT-4 compare to Claude on our specific use case?

Test Cases

Test cases are the inputs used to evaluate your agent in an experiment. Each test case contains:
  • Input data — The parameters to pass to your agent (e.g., user message, context)
  • Test case ID — A unique identifier used to link results across strategies
  • Expected output (optional) — A reference answer for comparison
You define test cases once, then run multiple strategies against the same set.
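A test case set might look like the following sketch: a fixed array, defined once, with the expected output left off where no reference answer exists. The field names and example inputs are hypothetical:

```typescript
// Illustrative test case shape (hypothetical fields).
interface TestCase {
  testCaseId: string;         // links results across strategies
  input: { message: string }; // parameters passed to the agent
  expectedOutput?: string;    // optional reference answer
}

// Defined once, then run against every strategy.
const testCases: TestCase[] = [
  {
    testCaseId: "tc_1",
    input: { message: "How do I reset my password?" },
    expectedOutput: "Walks the user through the password-reset flow.",
  },
  { testCaseId: "tc_2", input: { message: "Cancel my order" } },
];
```
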

Strategies

A strategy is a specific configuration or approach you’re testing in an experiment. Examples include:
  • Different system prompts
  • Different models
  • Different temperature settings
  • Different agent architectures
When recording experiment results, you tag each trace with a strategy name. Lemma then groups results by strategy, making it easy to compare performance side-by-side.
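In practice a strategy is little more than a named configuration, with each entry varying one dimension under test. The fields below are hypothetical, chosen to mirror the examples above:

```typescript
// Strategies as plain configuration objects (hypothetical fields);
// the name is the tag recorded with each trace.
interface Strategy {
  name: string;
  model: string;
  temperature: number;
  systemPrompt: string;
}

const strategies: Strategy[] = [
  { name: "baseline", model: "gpt-4", temperature: 0.2, systemPrompt: "You are a support agent." },
  { name: "creative", model: "gpt-4", temperature: 0.9, systemPrompt: "You are a support agent." },
];
```
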

Results

Results link your agent’s traces to the experiment and strategy. Each result contains:
  • Run ID — The trace for this execution
  • Test case ID — Which input was used
  • Strategy name — Which approach was tested
Results are recorded after running your agent on the experiment’s test cases. Once recorded, you can analyze them in the dashboard to see:
  • How each strategy performed on specific test cases
  • Aggregate metrics across all test cases
  • Patterns in failures or edge cases
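The side-by-side comparison reduces to grouping result records by their strategy name, as in this simplified sketch (the record shape is illustrative, and the grouping only stands in for what the dashboard computes):

```typescript
// A result links one trace to one test case under one strategy
// (illustrative shape).
interface ExperimentResult {
  runId: string;
  testCaseId: string;
  strategy: string;
}

// Group results by strategy so approaches can be compared side-by-side.
function byStrategy(results: ExperimentResult[]): Map<string, ExperimentResult[]> {
  const groups = new Map<string, ExperimentResult[]>();
  for (const r of results) {
    const bucket = groups.get(r.strategy) ?? [];
    bucket.push(r);
    groups.set(r.strategy, bucket);
  }
  return groups;
}
```
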

Projects

A project is the top-level container in Lemma. It groups:
  • All traces from your agent(s)
  • Metrics you’ve defined
  • Experiments you’re running
Each project has:
  • A project ID — Used when sending traces and making API calls
  • An API key — For authentication
  • A dashboard for viewing and analyzing data
Most organizations use one project per application or product, but you can create multiple projects to separate environments (dev/staging/prod) or different agent types.

Tracer Provider

The tracer provider is the OpenTelemetry component responsible for:
  • Creating and managing spans
  • Exporting trace data to Lemma’s OTLP endpoint
  • Handling batching and retries
You configure the tracer provider once in your application with:
  • Lemma’s OTLP endpoint URL
  • Your API key and project ID
  • Span processors (RunBatchSpanProcessor groups all spans for an agent run and exports them together when the run completes)
Once registered, the tracer provider automatically captures spans from your agent and supported frameworks.

Next Steps

Now that you understand Lemma's core concepts, you're ready to put them into practice.