Langfuse Adapter

The Langfuse adapter pulls conversation data and tool-calling traces from Langfuse deployments and converts them to the standardized EvaluationRow format for evaluation.

Overview

Langfuse is an open-source observability platform for LLM applications. The Langfuse adapter enables you to:
  • Pull conversation histories from production deployments
  • Extract tool-calling traces and function calls
  • Convert complex conversation structures to evaluation format
  • Filter data by tags, users, sessions, and time ranges

Installation

To use the Langfuse adapter, you need to install the Langfuse dependencies:
pip install 'eval-protocol[langfuse]'

Basic Usage

from eval_protocol.adapters import create_langfuse_adapter
from datetime import datetime, timedelta

# Create the adapter
adapter = create_langfuse_adapter(
    public_key="your_public_key",
    secret_key="your_secret_key",
    host="https://cloud.langfuse.com"  # Optional, defaults to cloud.langfuse.com
)

# Get evaluation rows
rows = list(adapter.get_evaluation_rows(
    limit=50,  # Maximum number of rows to return
    tags=["production"],  # Filter by specific tags
    from_timestamp=datetime.now() - timedelta(days=7)  # Last 7 days
))

# Use rows in evaluation via pytest-based tests
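
Before wiring rows into tests, it can help to sanity-check what a batch contains. A minimal, stdlib-only sketch: here rows are modeled as plain dicts with a `messages` list, as a stand-in for the richer, typed EvaluationRow objects the adapter actually returns.

```python
# Summarize a batch of rows: count rows, assistant turns, and tool calls.
# Rows are modeled as plain dicts here; real EvaluationRow objects from
# eval-protocol expose typed fields rather than raw dicts.
def summarize(rows):
    assistant_turns = 0
    tool_calls = 0
    for row in rows:
        for message in row.get("messages", []):
            if message.get("role") == "assistant":
                assistant_turns += 1
                tool_calls += len(message.get("tool_calls", []))
    return {"rows": len(rows), "assistant_turns": assistant_turns, "tool_calls": tool_calls}

sample = [
    {"messages": [
        {"role": "user", "content": "What's the weather?"},
        {"role": "assistant", "content": "", "tool_calls": [{"name": "get_weather"}]},
        {"role": "assistant", "content": "It's sunny."},
    ]},
]
print(summarize(sample))  # {'rows': 1, 'assistant_turns': 2, 'tool_calls': 1}
```

A quick summary like this makes it obvious when a filter (wrong tag, empty time range) returned conversations without the tool calls you expected.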

Advanced Filtering

The Langfuse adapter supports various filtering options to target specific data:
rows = adapter.get_evaluation_rows(
    limit=100,
    tags=["production", "customer-service"],  # Multiple tags (AND condition)
    user_id="specific_user_id",  # Filter by user ID
    session_id="specific_session",  # Filter by session ID
    from_timestamp=datetime(2023, 1, 1),  # From date
    to_timestamp=datetime(2023, 1, 31),  # To date
    include_tool_calls=True  # Include tool calling traces
)
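
When a single query would exceed the limit (or hit rate limits), one pattern is to split the time range into smaller windows and pull each window separately. A stdlib-only sketch; the `time_windows` helper is illustrative and not part of the adapter API:

```python
from datetime import datetime, timedelta

def time_windows(start, end, step):
    """Yield (from_ts, to_ts) pairs covering [start, end) in step-sized chunks."""
    current = start
    while current < end:
        yield current, min(current + step, end)
        current += step

# One week split into 3-day windows: Jan 1-4, Jan 4-7, Jan 7-8
windows = list(time_windows(datetime(2023, 1, 1), datetime(2023, 1, 8), timedelta(days=3)))
```

Each pair can then be passed as from_timestamp/to_timestamp to get_evaluation_rows, keeping individual requests small.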

Configuration Options

Parameter    Type      Description
public_key   string    Langfuse public API key
secret_key   string    Langfuse secret API key
host         string    Langfuse host URL (default: https://cloud.langfuse.com)
project_id   string    Optional project ID to filter traces

Filtering Options

Parameter           Type       Description
limit               int        Maximum number of rows to return
tags                List[str]  Filter by specific tags
user_id             str        Filter by user ID
session_id          str        Filter by session ID
from_timestamp      datetime   Only include traces after this timestamp
to_timestamp        datetime   Only include traces before this timestamp
include_tool_calls  bool       Whether to include tool-calling traces

Example: Evaluating Production Conversations

from eval_protocol.adapters import create_langfuse_adapter
from eval_protocol.rewards.accuracy import accuracy_reward
from datetime import datetime, timedelta

# Create the adapter; the last-24-hours filter is applied in the query below
adapter = create_langfuse_adapter(
    public_key="your_public_key",
    secret_key="your_secret_key"
)

# Get rows with ground truth available
rows = list(adapter.get_evaluation_rows(
    limit=100,
    tags=["has_feedback"],  # Only conversations with user feedback
    from_timestamp=datetime.now() - timedelta(days=1)
))

# Evaluate accuracy in a pytest test using @evaluation_test; for ad-hoc usage, see SDK README's direct runner.
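
The scoring step can be as simple as exact-match accuracy over predictions and ground truths. A hedged, stdlib-only sketch; the actual accuracy_reward in eval-protocol may normalize or score answers differently:

```python
def exact_match_accuracy(predictions, ground_truths):
    """Fraction of predictions that exactly match their ground truth,
    after stripping whitespace and lowercasing both sides."""
    if not predictions:
        return 0.0
    matches = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, ground_truths)
    )
    return matches / len(predictions)

print(exact_match_accuracy(["Paris", "42 "], ["paris", "41"]))  # 0.5
```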

Troubleshooting

Common Issues

  1. Authentication Errors: Verify your API keys are correct and have the necessary permissions
  2. No Data Returned: Check your filtering criteria; you might be using tags or time ranges that don’t match any data
  3. Missing Dependencies: Ensure you’ve installed the Langfuse dependencies with pip install 'eval-protocol[langfuse]'
  4. Rate Limiting: If you’re pulling large amounts of data, you might hit API rate limits
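
For rate limiting, a common mitigation is to retry with exponential backoff. A generic, stdlib-only sketch; in practice you would catch the specific exception your Langfuse client raises rather than bare Exception:

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying with exponentially growing delays on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))
```

For example, wrapping a pull as with_backoff(lambda: list(adapter.get_evaluation_rows(limit=100))) retries transient failures instead of aborting the whole run.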

Debug Mode

Enable debug logging to see detailed API requests and responses:
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("langfuse").setLevel(logging.DEBUG)