Installation
To use the Langfuse adapter, you need to install the Langfuse dependencies:
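A minimal install command (the Langfuse Python SDK is the core dependency; your project may expose it as an optional extra instead):

```bash
pip install langfuse
```

Basic Usage

Below is a sketch of pulling traces, assuming the adapter is importable from the source path listed under Source Code and that credentials are configured as described under Configuration:

```python
from eval_protocol.adapters.langfuse import LangfuseAdapter

# Constructor arguments (if any) are assumed to default to the
# Langfuse client's environment-variable configuration.
adapter = LangfuseAdapter()

# Pull up to 100 trace summaries from the last 24 hours and
# randomly sample 10 of them for full processing.
rows = adapter.get_evaluation_rows(limit=100, sample_size=10, hours_back=24)
print(f"Fetched {len(rows)} evaluation rows")
```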
Configuration
The adapter uses the Langfuse client configuration. Set up your Langfuse credentials using environment variables:
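```bash
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL
```

API Reference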
LangfuseAdapter
The main adapter class for pulling data from Langfuse.

get_evaluation_rows()
Pull traces from Langfuse and convert them to EvaluationRow format; a call sketch follows the parameter list below.
- limit: Max number of trace summaries to collect via pagination
- sample_size: Optional number of traces to randomly sample (if None, process all)
- tags: Filter by specific tags
- user_id: Filter by user ID
- session_id: Filter by session ID
- name: Filter by trace name
- environment: Filter by environment (e.g., production, staging, development)
- version: Filter by trace version
- release: Filter by trace release
- fields: Comma-separated list of fields to include (e.g., 'core,scores,metrics')
- hours_back: Filter traces from this many hours ago
- from_timestamp: Explicit start time (overrides hours_back)
- to_timestamp: Explicit end time (overrides hours_back)
- include_tool_calls: Whether to include tool calling traces
- sleep_between_gets: Sleep time between individual trace.get() calls (2.5s for the 30 req/min limit)
- max_retries: Maximum retries for rate limit errors
- span_name: If provided, extract messages from generations within this named span
- converter: Optional custom converter implementing the TraceConverter protocol
- metadata: Filter by exact metadata match (dict)
- requester_metadata: Filter by exact requester metadata match (dict)
- requester_metadata_contains: Filter by substring in requester metadata values
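A call sketch combining common parameters from the list above (values are illustrative):

```python
rows = adapter.get_evaluation_rows(
    limit=500,               # paginate up to 500 trace summaries
    sample_size=50,          # randomly sample 50 for full processing
    tags=["production"],
    hours_back=48,
    include_tool_calls=True,
    sleep_between_gets=2.5,  # respect the 30 req/min limit
    max_retries=3,
)
```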
get_evaluation_rows_by_ids()
Get specific traces by their IDs and convert them to EvaluationRow format; see the sketch after the parameter list.
- trace_ids: List of trace IDs to fetch
- include_tool_calls: Whether to include tool calling traces
- span_name: If provided, extract messages from generations within this named span
- converter: Optional custom converter implementing the TraceConverter protocol
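A sketch of fetching known traces (the IDs are placeholders):

```python
rows = adapter.get_evaluation_rows_by_ids(
    trace_ids=["trace-id-1", "trace-id-2"],
    include_tool_calls=True,
)
```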
upload_scores()
Upload evaluation scores back to Langfuse traces; see the sketch after the parameter list.
- rows: List of EvaluationRow objects with session_data containing trace IDs
- model_name: Name of the model (used as the score name in Langfuse)
- mean_score: The calculated mean score to push to Langfuse
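A sketch of pushing results back (the name and value are illustrative):

```python
adapter.upload_scores(
    rows=rows,                    # rows whose session_data carries the trace IDs
    model_name="my-eval-model",   # becomes the score name in Langfuse
    mean_score=0.87,              # the calculated mean score to push
)
```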
Factory Function
For convenience, you can use the factory function:
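The factory name below is hypothetical; check the source file linked under Source Code for the actual export:

```python
# Hypothetical factory name, shown for illustration only.
from eval_protocol.adapters.langfuse import create_langfuse_adapter

adapter = create_langfuse_adapter()
```

Source Code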
The complete implementation is available on GitHub: eval_protocol/adapters/langfuse.py

Filtering Examples
Filter by Tags
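A sketch (tag values are illustrative):

```python
rows = adapter.get_evaluation_rows(limit=200, tags=["production", "checkout"])
```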
Filter by User
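A sketch (the user ID is a placeholder):

```python
rows = adapter.get_evaluation_rows(limit=200, user_id="user-123")
```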
Filter by Time Range
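A sketch using explicit timestamps, which override hours_back:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
rows = adapter.get_evaluation_rows(
    limit=200,
    from_timestamp=now - timedelta(days=7),
    to_timestamp=now,
)
```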
Filter by Metadata
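A sketch (keys and values are illustrative; requester_metadata_contains is shown as a dict by analogy with the exact-match filters, so verify the expected shape in the source):

```python
rows = adapter.get_evaluation_rows(
    limit=200,
    metadata={"experiment": "prompt-v2"},            # exact match
    requester_metadata_contains={"team": "search"},  # substring match on values
)
```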
Combined Filters with Rate Limiting
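A sketch surveying a large window while staying under the rate limit:

```python
rows = adapter.get_evaluation_rows(
    limit=1000,              # collect up to 1000 trace summaries
    sample_size=100,         # fully fetch only a random 100
    tags=["production"],
    hours_back=72,
    sleep_between_gets=2.5,  # ~24 trace.get() calls per minute
    max_retries=5,
)
```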
Tool Calling Support
The adapter automatically handles tool calling traces from Langfuse, preserving the tool_calls, tool_call_id, and function_call fields as appropriate.
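The exact message type depends on eval_protocol's models; this sketch shows the OpenAI-style shape those fields imply:

```python
# Illustrative converted exchange (all values are made up):
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }],
    },
    {"role": "tool", "tool_call_id": "call_abc123", "content": '{"temp_c": 18}'},
]
```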
Sampling and Rate Limiting
The adapter includes intelligent sampling and rate limiting to work efficiently within Langfuse's API limits:

Two-Stage Process
1. Collect trace summaries: fast pagination to gather up to limit trace IDs
2. Sample and fetch details: randomly sample sample_size traces for full processing
This two-stage approach, sketched below, lets you:
- Survey large datasets efficiently without hitting rate limits
- Get representative samples from your trace population
- Control API usage while still getting meaningful evaluation data
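```python
# Stage 1 paginates up to 5000 summaries; stage 2 randomly samples
# 200 of them and fetches full details for only those.
rows = adapter.get_evaluation_rows(limit=5000, sample_size=200)
```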
Rate Limit Handling
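A sketch of the relevant knobs, based on the parameters documented above:

```python
rows = adapter.get_evaluation_rows(
    limit=300,
    sleep_between_gets=2.5,  # pause between trace.get() calls (30 req/min limit)
    max_retries=3,           # retries on rate-limit errors before giving up
)
```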
Data Conversion
The adapter converts Langfuse traces to EvaluationRow format with intelligent handling of different input formats (see the sketch after the list):

Supported Trace Formats
- Dict Format
- List Format
- String Format
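A sketch of the three shapes, assumed from the format names (the authoritative handling is in the source file linked above):

```python
# Dict format: a dict carrying a messages list.
dict_io = {"messages": [{"role": "user", "content": "Hi"}]}

# List format: a bare list of chat messages.
list_io = [{"role": "user", "content": "Hi"}]

# String format: a plain prompt or completion string.
string_io = "Hi"
```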
Metadata Preservation
The adapter stores the original Langfuse trace ID in the evaluation row metadata:
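This is what lets upload_scores() route results back to the right trace. A sketch (the field access is illustrative):

```python
row = rows[0]
# session_data carries the originating Langfuse trace ID.
print(row.session_data)
```

Environment and Release Filtering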
Filter traces by deployment environment and release versions:
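A sketch (the release and version strings are placeholders):

```python
rows = adapter.get_evaluation_rows(
    limit=200,
    environment="production",
    release="v2.3.1",
    version="2024-06-01",
)
```

Advanced Features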
Span-Based Message Extraction
Extract messages from specific spans within traces. This is perfect for multi-agent workflows where different subagents use different LLMs: specify the span name to evaluate a particular subagent's LLM performance in isolation.
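A sketch (the span name is a placeholder for whatever your subagent's span is called):

```python
# Evaluate only the "planner" subagent's generations inside each trace.
rows = adapter.get_evaluation_rows(limit=100, span_name="planner")
```

Custom Trace Conversion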
If your traces follow a particular pattern, you can also implement custom trace-to-EvaluationRow conversion logic using the TraceConverter protocol:
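The protocol's exact method signature is defined in the source file; this sketch assumes a single convert method that may return None to skip a trace:

```python
from eval_protocol.models import EvaluationRow  # import path assumed

class MyConverter:
    """Hypothetical TraceConverter implementation."""

    def convert(self, trace) -> EvaluationRow | None:
        # Inspect the raw Langfuse trace and build a row, or skip it.
        ...

rows = adapter.get_evaluation_rows(limit=100, converter=MyConverter())
```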