Traces
A trace represents a single execution of your agent from start to finish. It captures:
- Inputs — The initial state and parameters passed to your agent
- Outputs — The final result produced by your agent
- Spans — Nested operations within the execution (LLM calls, tool invocations, etc.)
- Timing — Duration and timing of each operation
- Metadata — Additional context like model names, token counts, and error states
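
Conceptually, a trace looks something like the structure below. This is a hypothetical sketch for illustration only; the field names are assumptions, not Lemma's actual schema.

```typescript
// Hypothetical shape of a trace, for illustration only.
// Field names are assumptions, not Lemma's actual schema.
interface Span {
  name: string;                      // e.g. "llm.generate" or "tool.search"
  startedAt: string;                 // ISO timestamp
  durationMs: number;                // how long the operation took
  attributes: Record<string, unknown>;
  children: Span[];                  // nested child operations
}

interface Trace {
  runId: string;                     // Lemma's identifier for this execution
  inputs: Record<string, unknown>;   // initial state and parameters
  outputs: unknown;                  // final result produced by the agent
  spans: Span[];                     // nested operations (LLM calls, tools, ...)
  durationMs: number;                // total execution time
  metadata: Record<string, string>;  // model names, token counts, error states
}
```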
Run ID
The run ID is Lemma’s identifier for a specific agent execution. It’s returned by wrapAgent and used to:
- Link metric events to specific traces
- Associate experiment results with test cases
- Query and filter traces in the dashboard
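
For example, a run ID can be captured when the wrapped agent executes. This is a minimal sketch: wrapAgent is named in these docs, but its import path, options, and return shape here are assumptions.

```typescript
// Sketch only: the import path, options, and return shape of
// wrapAgent are assumptions, not Lemma's documented API.
import { wrapAgent } from "lemma"; // hypothetical import path

const myAgent = async (input: { message: string }) => {
  // ... your agent logic ...
  return { reply: `You said: ${input.message}` };
};

const tracedAgent = wrapAgent(myAgent, { name: "support-agent" });

// The run ID links this execution's trace to metric events
// and experiment results later on.
const { result, runId } = await tracedAgent({ message: "Hi" });
```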
Spans
Spans are the building blocks of a trace. Each span represents a single operation within your agent’s execution, such as:
- An LLM generation call
- A tool or function invocation
- A database query
- A custom operation you want to track
Together, the spans in a trace show:
- Which operations happened in what order
- How long each operation took
- Where errors occurred in the execution path
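
Because Lemma's tracing is built on OpenTelemetry (see Tracer Provider below), one way to track a custom operation is with a standard OpenTelemetry span. A minimal sketch, with runQuery standing in for your own code:

```typescript
// Tracking a custom operation with the standard OpenTelemetry API.
// runQuery is a placeholder for your own database call.
import { trace, SpanStatusCode } from "@opentelemetry/api";

declare function runQuery(sql: string): Promise<unknown[]>; // placeholder

const tracer = trace.getTracer("my-agent");

async function queryDatabase(sql: string): Promise<unknown[]> {
  return tracer.startActiveSpan("db.query", async (span) => {
    try {
      span.setAttribute("db.statement", sql);
      return await runQuery(sql);
    } catch (err) {
      // Record the failure so it shows up in the execution path.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```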
Metrics
A metric in Lemma is a named feedback signal that you can record against traces. Metrics capture qualitative or quantitative assessments of your agent’s performance, such as:
- User satisfaction (thumbs up/down)
- Content moderation results
- Factual accuracy scores
- Task completion success
Each metric has:
- A metric ID — Used when recording metric events
- A name — Displayed in the dashboard
- A type — The structure of values it accepts (boolean, number, string, etc.)
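
For example, defining a metric might look like the following. This is a hypothetical sketch: LemmaClient and the metrics.create method are assumptions, not Lemma's documented API.

```typescript
// Hypothetical sketch of defining a metric; the client and method
// names are assumptions, not Lemma's documented API.
import { LemmaClient } from "lemma"; // hypothetical import path

const lemma = new LemmaClient({ apiKey: process.env.LEMMA_API_KEY! });

const metric = await lemma.metrics.create({
  name: "User Satisfaction", // displayed in the dashboard
  type: "boolean",           // the structure of values it accepts
});
// metric.id is the metric ID, used when recording metric events
```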
Metric Events
A metric event is a specific instance of feedback recorded against a trace. It connects:
- A metric — What you’re measuring
- A run ID — Which agent execution you’re measuring
- A value — The feedback or assessment (e.g., { feedback: true, description: "Helpful response" })
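
Continuing the hypothetical sketch from the Metrics and Run ID sections above (the record method and its fields are assumptions):

```typescript
// Hypothetical: records one piece of feedback against one trace.
await lemma.metricEvents.record({
  metricId: metric.id, // what you're measuring
  runId,               // which agent execution you're measuring
  value: { feedback: true, description: "Helpful response" },
});
```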
Experiments
An experiment is a structured framework for evaluating your agent by running multiple strategies against a fixed set of test cases. Experiments help you answer questions like:
- Which prompt performs better on customer support queries?
- Does increasing temperature improve creativity without hurting accuracy?
- How does GPT-4 compare to Claude on our specific use case?
Test Cases
Test cases are the inputs used to evaluate your agent in an experiment. Each test case contains:
- Input data — The parameters to pass to your agent (e.g., user message, context)
- Test case ID — A unique identifier used to link results across strategies
- Expected output (optional) — A reference answer for comparison
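
For example, test cases for a support agent might look like this. The object shape is a hypothetical sketch based on the fields described above.

```typescript
// Hypothetical test cases; the shape is an assumption for illustration.
const testCases = [
  {
    id: "refund-policy", // links results for this input across strategies
    input: { message: "How do I request a refund?" },
    expectedOutput: "Refunds are available within 30 days of purchase.", // optional
  },
  {
    id: "shipping-time",
    input: { message: "How long does shipping take?" },
  },
];
```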
Strategies
A strategy is a specific configuration or approach you’re testing in an experiment. Examples include:
- Different system prompts
- Different models
- Different temperature settings
- Different agent architectures
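
Strategies might be expressed as plain configuration objects, as in this hypothetical sketch (the fields are assumptions for illustration):

```typescript
// Hypothetical strategies varying the model and system prompt.
const strategies = [
  { name: "concise-gpt4", model: "gpt-4", systemPrompt: "Answer concisely." },
  { name: "detailed-claude", model: "claude-3-opus", systemPrompt: "Answer thoroughly, with examples." },
];
```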
Results
Results link your agent’s traces to the experiment and strategy. Each result contains:
- Run ID — The trace for this execution
- Test case ID — Which input was used
- Strategy name — Which approach was tested
Analyzing results lets you see:
- How each strategy performed on specific test cases
- Aggregate metrics across all test cases
- Patterns in failures or edge cases
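
A sketch of an experiment loop that ties the pieces together, reusing the hypothetical tracedAgent, testCases, strategies, and lemma client from the earlier sketches. The per-run overrides and recordResult call are assumptions, not Lemma's documented API.

```typescript
// Hypothetical: run every strategy against every test case, then link
// each trace (via its run ID) back to the experiment.
for (const strategy of strategies) {
  for (const testCase of testCases) {
    const { runId } = await tracedAgent(testCase.input, {
      model: strategy.model,               // hypothetical per-run override
      systemPrompt: strategy.systemPrompt, // hypothetical per-run override
    });

    await lemma.experiments.recordResult({
      experimentId: "exp_123",     // hypothetical experiment identifier
      runId,                       // the trace for this execution
      testCaseId: testCase.id,     // which input was used
      strategyName: strategy.name, // which approach was tested
    });
  }
}
```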
Projects
A project is the top-level container in Lemma. It groups:
- All traces from your agent(s)
- Metrics you’ve defined
- Experiments you’re running
Each project has:
- A project ID — Used when sending traces and making API calls
- An API key — For authentication
- A dashboard for viewing and analyzing data
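
In code, the project credentials typically scope the client, as in this hypothetical sketch (the option names are assumptions for illustration):

```typescript
// Hypothetical: scoping the client to a project.
import { LemmaClient } from "lemma"; // hypothetical import path

const lemma = new LemmaClient({
  projectId: process.env.LEMMA_PROJECT_ID!, // used when sending traces and making API calls
  apiKey: process.env.LEMMA_API_KEY!,       // for authentication
});
```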
Tracer Provider
The tracer provider is the OpenTelemetry component responsible for:
- Creating and managing spans
- Exporting trace data to Lemma’s OTLP endpoint
- Handling batching and retries
It is configured with:
- Lemma’s OTLP endpoint URL
- Your API key and project ID
- Span processors (RunBatchSpanProcessor groups all spans for an agent run and exports them together when the run completes)
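
Wiring this up manually might look like the sketch below, using standard OpenTelemetry packages. The endpoint URL, header names, and the RunBatchSpanProcessor import path are assumptions; Lemma's SDK normally configures this for you.

```typescript
// Sketch of manual tracer provider setup. Endpoint, headers, and the
// RunBatchSpanProcessor import path are assumptions.
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { RunBatchSpanProcessor } from "lemma"; // hypothetical import path

const exporter = new OTLPTraceExporter({
  url: "https://otlp.lemma.example/v1/traces", // assumed endpoint URL
  headers: {
    "x-api-key": process.env.LEMMA_API_KEY!,       // assumed header name
    "x-project-id": process.env.LEMMA_PROJECT_ID!, // assumed header name
  },
});

const provider = new NodeTracerProvider();
// Groups all spans for an agent run and exports them together
// when the run completes (as described above).
provider.addSpanProcessor(new RunBatchSpanProcessor(exporter));
provider.register();
```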
Next Steps
Now that you understand Lemma’s core concepts:
- Explore Tracing Integrations to start sending traces
- Learn about Recording Metric Events to capture feedback
- Discover Running Experiments to evaluate your agent systematically

