---
sidebar_position: 55
---
Promptfoo supports OpenTelemetry (OTLP) tracing to help you understand the internal operations of your LLM providers during evaluations.
This feature allows you to collect detailed performance metrics and debug complex provider implementations.
Promptfoo acts as an OpenTelemetry receiver, collecting traces from your providers and displaying them in the web UI. This eliminates the need for external observability infrastructure during development and testing.
Tracing provides visibility into the individual steps inside a provider call, such as document retrieval, context preparation, and LLM generation in a RAG pipeline.
Add tracing configuration to your `promptfooconfig.yaml`:
```yaml
tracing:
  enabled: true # Required to send OTLP telemetry
  otlp:
    http:
      enabled: true # Required to start the built-in OTLP receiver
```
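To see what the receiver accepts, here is a hypothetical smoke test that posts one span as OTLP/HTTP JSON to the default endpoint. It assumes the receiver is running (with the config above, it starts when an evaluation runs), and note that promptfoo only links traces to test results when providers use the trace context promptfoo supplies:

```javascript
// Hypothetical smoke test: POST a minimal OTLP/HTTP JSON payload to the
// built-in receiver on the default port 4318.
const payload = {
  resourceSpans: [
    {
      resource: {
        attributes: [{ key: 'service.name', value: { stringValue: 'smoke-test' } }],
      },
      scopeSpans: [
        {
          spans: [
            {
              traceId: '5b8aa5a2d2c872e8321cf37308d69df2', // 32 hex chars
              spanId: '051581bf3cb55c13', // 16 hex chars
              name: 'smoke-test-span',
              kind: 1, // SPAN_KIND_INTERNAL
              startTimeUnixNano: `${Date.now() - 1000}000000`,
              endTimeUnixNano: `${Date.now()}000000`,
            },
          ],
        },
      ],
    },
  ],
};

fetch('http://localhost:4318/v1/traces', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
}).then((res) => console.log('receiver responded with HTTP', res.status));
```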
Promptfoo passes a W3C trace context to providers via the `traceparent` field.
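The `traceparent` value follows the W3C Trace Context format, `version-traceId-spanId-flags`. A purely illustrative sketch of pulling the IDs apart by hand (in practice the SDK's propagation API below handles this for you):

```javascript
// Example value: '00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01'
function parseTraceparent(traceparent) {
  const [version, traceId, spanId, flags] = traceparent.split('-');
  return { version, traceId, spanId, flags };
}
```

Use this context to create child spans: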
```javascript
const { trace, context, propagation, SpanStatusCode } = require('@opentelemetry/api');
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');

// Initialize tracer
const provider = new NodeTracerProvider();
const exporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces',
});
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
provider.register();

const tracer = trace.getTracer('my-provider');

module.exports = {
  async callApi(prompt, promptfooContext) {
    // Parse trace context from Promptfoo
    if (promptfooContext.traceparent) {
      const activeContext = propagation.extract(context.active(), {
        traceparent: promptfooContext.traceparent,
      });
      return context.with(activeContext, async () => {
        // startSpan picks up the extracted context as the parent
        const span = tracer.startSpan('provider.call');
        try {
          // Your provider logic here
          span.setAttribute('prompt.length', prompt.length);
          const result = await yourLLMCall(prompt);
          span.setStatus({ code: SpanStatusCode.OK });
          return { output: result };
        } catch (error) {
          span.recordException(error);
          span.setStatus({
            code: SpanStatusCode.ERROR,
            message: error.message,
          });
          throw error;
        } finally {
          span.end();
        }
      });
    }
    // Fallback for when tracing is disabled
    return { output: await yourLLMCall(prompt) };
  },
};
```
After running an evaluation, view traces in the web UI:

1. Run your evaluation:

   ```bash
   promptfoo eval
   ```

2. Open the web UI:

   ```bash
   promptfoo view
   ```

3. Click the magnifying glass (🔎) icon on any test result
4. Scroll to the "Trace Timeline" section
The complete set of tracing options:

```yaml
tracing:
  enabled: true # Enable/disable tracing
  otlp:
    http:
      enabled: true # Required to start the OTLP receiver
      # port: 4318 # Optional - defaults to 4318 (standard OTLP HTTP port)
      # host: '0.0.0.0' # Optional - defaults to '0.0.0.0'
```
You can also configure tracing via environment variables:
```bash
# Enable tracing
export PROMPTFOO_TRACING_ENABLED=true

# Configure OTLP endpoint (for providers)
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"

# Set service name
export OTEL_SERVICE_NAME="my-rag-application"

# Authentication headers (if needed)
export OTEL_EXPORTER_OTLP_HEADERS="api-key=your-key"
```
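If you set the endpoint this way, the exporter can be constructed without a hardcoded URL; the OpenTelemetry SDKs read `OTEL_EXPORTER_OTLP_ENDPOINT` from the environment automatically:

```javascript
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

// With OTEL_EXPORTER_OTLP_ENDPOINT set, the exporter resolves its URL from
// the environment (appending /v1/traces), so no explicit url option is needed
const exporter = new OTLPTraceExporter();
```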
Forward traces to external observability platforms:
```yaml
tracing:
  enabled: true
  otlp:
    http:
      enabled: true
  forwarding:
    enabled: true
    endpoint: 'http://jaeger:4318' # or Tempo, Honeycomb, etc.
    headers:
      'api-key': '${OBSERVABILITY_API_KEY}'
```
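If you need a local backend to test forwarding against, one common option is Jaeger's all-in-one image (an illustrative command, not promptfoo-specific; check Jaeger's docs for current flags):

```bash
# Run Jaeger locally with OTLP/HTTP ingestion on 4318 and the UI on 16686
docker run --rm \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 4318:4318 \
  jaegertracing/all-in-one:latest
```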
For complete provider implementation details, see the JavaScript Provider documentation. For tracing-specific examples, see the OpenTelemetry tracing example.
Key points:

- Uses `SimpleSpanProcessor` for immediate trace export
- Extracts the `traceparent` passed by Promptfoo so provider spans join the evaluation's trace
For complete provider implementation details, see the Python Provider documentation.
```python
from opentelemetry import trace
from opentelemetry.propagate import extract
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Setup
provider = TracerProvider()
exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


def call_api(prompt, options, context):
    # Extract trace context
    if 'traceparent' in context:
        ctx = extract({"traceparent": context["traceparent"]})
        with tracer.start_as_current_span("provider.call", context=ctx) as span:
            span.set_attribute("prompt.length", len(prompt))
            # Your provider logic here
            result = your_llm_call(prompt)
            return {"output": result}
    # Fallback without tracing
    return {"output": your_llm_call(prompt)}
```
Promptfoo includes a built-in trace viewer that displays all collected telemetry data. Since Promptfoo functions as an OTLP receiver, you can view traces directly without configuring external tools like Jaeger or Grafana Tempo.
The web UI displays traces as a hierarchical timeline showing:
```
[Root Span: provider.call (500ms)]
├─ [Retrieve Documents (100ms)]
├─ [Prepare Context (50ms)]
└─ [LLM Generation (300ms)]
```
Each bar's width represents its duration relative to the total trace time. Hover over any span to see its details, such as the full span name, exact duration, and recorded attributes.
Use descriptive, hierarchical span names:
```javascript
// Good
'rag.retrieve_documents';
'rag.rank_results';
'llm.generate_response';

// Less informative
'step1';
'process';
'call_api';
```
Include context that helps debugging:
```javascript
span.setAttributes({
  'prompt.tokens': tokenCount,
  'documents.count': documents.length,
  'model.name': 'gpt-4',
  'cache.hit': false,
});
```
Always record exceptions and set error status:
```javascript
try {
  // Operation
} catch (error) {
  span.recordException(error);
  span.setStatus({
    code: SpanStatusCode.ERROR,
    message: error.message,
  });
  throw error;
}
```
Add metadata that appears in the UI:
```javascript
span.setAttributes({
  'user.id': userId,
  'feature.flags': JSON.stringify(featureFlags),
  version: packageVersion,
});
```
Reduce overhead in high-volume scenarios:
```javascript
const { TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const provider = new NodeTracerProvider({
  sampler: new TraceIdRatioBasedSampler(0.1), // Sample 10% of traces
});
```
Trace across multiple services:
```javascript
const { context, propagation } = require('@opentelemetry/api');

// Service A: Forward trace context
const headers = {};
propagation.inject(context.active(), headers);
await fetch(serviceB, { headers });

// Service B: Extract and continue trace
const extractedContext = propagation.extract(context.active(), request.headers);
```
If traces aren't appearing:

- Verify `tracing.enabled: true` in your config
- Check that providers export to `http://localhost:4318/v1/traces`
- Log the `traceparent` value to ensure it's being passed

If you see `context.active is not a function`, rename the OpenTelemetry import:
```javascript
// Avoid conflict with promptfoo's context parameter
const { context: otelContext } = require('@opentelemetry/api');

module.exports = {
  async callApi(prompt, promptfooContext) {
    // Use otelContext for OpenTelemetry operations
    // Use promptfooContext for Promptfoo's context (vars, traceparent, etc.)
  },
};
```
For production use, prefer `BatchSpanProcessor` over `SimpleSpanProcessor` so spans are exported in batches rather than one at a time (see the sketch after the debug-log commands below).

Enable debug logs to troubleshoot:
```bash
# Promptfoo debug logs
DEBUG=promptfoo:* promptfoo eval

# OpenTelemetry debug logs
OTEL_LOG_LEVEL=debug promptfoo eval
```
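As noted above, a minimal sketch of swapping `SimpleSpanProcessor` for `BatchSpanProcessor` in the tracer setup (the tuning values here are illustrative):

```javascript
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');

// Queues spans and exports them periodically, reducing per-span overhead
provider.addSpanProcessor(
  new BatchSpanProcessor(exporter, {
    maxQueueSize: 2048, // spans beyond this backlog are dropped
    scheduledDelayMillis: 5000, // export a batch every 5 seconds
  }),
);
```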
Example: tracing each phase of a RAG pipeline as child spans of a root span (assumes `tracer` from the setup above):

```javascript
const { trace, context, SpanStatusCode } = require('@opentelemetry/api');

async function ragPipeline(query) {
  const span = tracer.startSpan('rag.pipeline');
  // Make the pipeline span the parent of the phase spans below
  const parentContext = trace.setSpan(context.active(), span);
  try {
    // Retrieval phase
    const retrieveSpan = tracer.startSpan('rag.retrieve', undefined, parentContext);
    const documents = await vectorSearch(query);
    retrieveSpan.setAttribute('documents.count', documents.length);
    retrieveSpan.end();

    // Reranking phase
    const rerankSpan = tracer.startSpan('rag.rerank', undefined, parentContext);
    const ranked = await rerank(query, documents);
    rerankSpan.setAttribute('documents.reranked', ranked.length);
    rerankSpan.end();

    // Generation phase
    const generateSpan = tracer.startSpan('llm.generate', undefined, parentContext);
    const response = await llm.generate(query, ranked);
    generateSpan.setAttribute('response.tokens', response.tokenCount);
    generateSpan.end();

    span.setStatus({ code: SpanStatusCode.OK });
    return response;
  } catch (error) {
    span.recordException(error);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw error;
  } finally {
    span.end();
  }
}
```
Example: comparing multiple models in parallel, with a child span per model:

```javascript
const { trace, context } = require('@opentelemetry/api');

async function compareModels(prompt) {
  const span = tracer.startSpan('compare.models');
  const parentContext = trace.setSpan(context.active(), span);
  const models = ['gpt-4', 'claude-3', 'llama-3'];
  try {
    const promises = models.map(async (model) => {
      const modelSpan = tracer.startSpan(`model.${model}`, undefined, parentContext);
      try {
        const result = await callModel(model, prompt);
        modelSpan.setAttribute('model.name', model);
        modelSpan.setAttribute('response.latency', result.latency);
        return result;
      } finally {
        modelSpan.end();
      }
    });
    return await Promise.all(promises);
  } finally {
    span.end();
  }
}
```