A tool call is a single tool invocation inside a trace: its name, arguments, result, and success or failure. Typing a span as a tool tells Lemma to show it as a tool execution with its input and output. Tool execution usually happens in your application code, so framework auto-instrumentation often cannot see it, and recording tool calls explicitly is one of the highest-value things you can do. Create tool spans inside the trace root callback so they nest under the trace.

Record a tool call

import { startActiveObservation } from "@langfuse/tracing";

const docs = await startActiveObservation(
  "search_docs",
  async (tool) => {
    // Record the arguments first so they are kept even if the tool throws.
    tool.update({ input: { query } });
    const result = await searchDocs(query);
    tool.update({ output: result });
    return result;
  },
  { asType: "tool" },
);
The observation name is the tool name. Record the arguments as input and the result as output.

Record failures

Failed tool calls are the ones you will most want to debug later. Mark the failure on the span, then rethrow so callers still see the error:
await startActiveObservation(
  "lookup_customer",
  async (tool) => {
    tool.update({ input: { customerId } });
    try {
      const customer = await lookupCustomer(customerId);
      tool.update({ output: customer });
      return customer;
    } catch (error) {
      tool.update({
        level: "ERROR",
        statusMessage: error instanceof Error ? error.message : String(error),
      });
      throw error;
    }
  },
  { asType: "tool" },
);

Wrapping a tool registry

If your agent runs many tools, wrap the executor once so every tool call is traced consistently:
import { startActiveObservation } from "@langfuse/tracing";

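async function runTool<T>(name: string, args: unknown, fn: () => Promise<T>): Promise<T> {
  return startActiveObservation(
    name,
    async (tool) => {
      tool.update({ input: args });
      try {
        const output = await fn();
        tool.update({ output });
        return output;
      } catch (error) {
        // Mark the span as failed, then rethrow so the caller still sees the error.
        tool.update({
          level: "ERROR",
          statusMessage: error instanceof Error ? error.message : String(error),
        });
        throw error;
      }
    },
    { asType: "tool" },
  );
}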

// usage
const docs = await runTool("search_docs", { query }, () => searchDocs(query));
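
If the agent picks tools by name at runtime, one possible shape for the registry itself is sketched below. This is an illustration under assumptions, not part of the SDK: `createToolRegistry`, `ToolRunner`, and `ToolFn` are hypothetical names, and the runner is passed in as a parameter so the sketch stays self-contained.

```typescript
// Hypothetical registry sketch; none of these names come from the SDK.
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

// Same shape as the runTool wrapper: (name, args, fn) => result of fn.
type ToolRunner = <T>(name: string, args: unknown, fn: () => Promise<T>) => Promise<T>;

function createToolRegistry(run: ToolRunner) {
  const tools = new Map<string, ToolFn>();

  return {
    // Make a tool available under the name the agent will request.
    register(name: string, fn: ToolFn): void {
      tools.set(name, fn);
    },

    // Dispatch by name. The lookup happens inside the runner callback,
    // so a request for an unknown tool goes through the same wrapper
    // as any other tool failure.
    call(name: string, args: Record<string, unknown>): Promise<unknown> {
      return run(name, args, async () => {
        const fn = tools.get(name);
        if (!fn) throw new Error(`Unknown tool: ${name}`);
        return fn(args);
      });
    },
  };
}
```

Passing the `runTool` wrapper above as the runner, for example `createToolRegistry(runTool)`, would route every registered tool through one traced code path.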
Capture arguments and results only when safe. Redact secrets, credentials, and sensitive user data before passing them to input / output.
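
One way to apply that redaction is a small helper that masks sensitive-looking keys before the value is recorded. The sketch below is a hypothetical helper, not an SDK feature. The key pattern is an assumption that you should adapt to your own data, and it matches on key names only, so secrets embedded in free-text values would still pass through.

```typescript
// Hypothetical helper: masks values whose key name looks sensitive.
// The pattern below is an assumption; extend it for your own payloads.
const SENSITIVE_KEY = /pass(word)?|secret|token|api[-_]?key|authorization|credential/i;

// Returns a redacted copy of a plain JSON-like value (objects, arrays, primitives).
// Does not handle cyclic references or class instances such as Date.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

A call such as `tool.update({ input: redact(args) })` would then record the masked copy and leave the original arguments untouched for the tool itself.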

Next steps

Spans

Trace retrieval, ranking, and app logic.

Threads & context

Group conversations and attach users and metadata.