The data loader module provides a standard way to feed evaluation data into tests. Use it to:
  • Build reusable input sources (adapters, files, generators)
  • Parameterize datasets with clear variant labeling
  • Preprocess inputs consistently (e.g., expand multi-turn data)

Components

DynamicDataLoader

Uses callables that return lists of EvaluationRow. Each callable becomes a labeled variant.
from eval_protocol import DynamicDataLoader
from eval_protocol.models import EvaluationRow

def my_generator() -> list[EvaluationRow]:
    # Fetch or generate rows here (adapters, DB, etc.)
    return []

data_loader = DynamicDataLoader(
    generators=[my_generator],
)

InlineDataLoader

Use when you have rows or raw messages inline.
from eval_protocol import InlineDataLoader
from eval_protocol.models import EvaluationRow, Message

inline_rows = [
    EvaluationRow(messages=[
        Message(role="user", content="Hello"),
        Message(role="assistant", content="Hi there!"),
    ])
]

loader = InlineDataLoader(rows=inline_rows, id="demo", description="Two-turn chat")
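
InlineDataLoader also accepts raw message lists via its messages field (see the API reference below). A minimal sketch, assuming each inner list of messages is wrapped into its own row:

loader = InlineDataLoader(
    messages=[
        [Message(role="user", content="What is 2 + 2?")],
        [Message(role="user", content="Name a prime number.")],
    ],
    id="inline-prompts",
    description="Two single-turn prompts",
)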

Preprocessing

All loaders support an optional preprocess_fn applied before returning rows. For example, expand multi-turn traces into multiple test cases:
from eval_protocol import DynamicDataLoader, multi_turn_assistant_to_ground_truth

DynamicDataLoader(
    generators=[my_generator],
    preprocess_fn=multi_turn_assistant_to_ground_truth,
)
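
Any callable from list[EvaluationRow] to list[EvaluationRow] works (see EvaluationDataLoader in the API reference), so you can supply your own transform. A minimal sketch with a hypothetical filter that drops rows without messages:

def drop_empty_rows(rows: list[EvaluationRow]) -> list[EvaluationRow]:
    # Keep only rows that actually contain messages.
    return [row for row in rows if row.messages]

DynamicDataLoader(
    generators=[my_generator],
    preprocess_fn=drop_empty_rows,
)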

Using with evaluation_test

from eval_protocol import evaluation_test, aha_judge, SingleTurnRolloutProcessor
from eval_protocol.models import EvaluationRow

@evaluation_test(
    data_loaders=data_loader,
    rollout_processor=SingleTurnRolloutProcessor(),
)
async def test_llm_judge(row: EvaluationRow) -> EvaluationRow:
    return await aha_judge(row)
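
The test body is an ordinary async function that receives a row and returns it, so built-in judges like aha_judge are optional. A minimal sketch of a custom check; it assumes scores are attached via an EvaluateResult with score and reason fields (verify the exact shape in eval_protocol.models):

from eval_protocol.models import EvaluateResult

@evaluation_test(
    data_loaders=data_loader,
    rollout_processor=SingleTurnRolloutProcessor(),
)
async def test_reply_present(row: EvaluationRow) -> EvaluationRow:
    # Hypothetical check: the last message is a non-empty assistant reply.
    last = row.messages[-1] if row.messages else None
    ok = last is not None and last.role == "assistant" and bool(last.content)
    row.evaluation_result = EvaluateResult(
        score=1.0 if ok else 0.0,
        reason="assistant reply present" if ok else "no assistant reply",
    )
    return row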

Metadata and Variants

Each loader emits one or more variants. For each variant, Eval Protocol stores metadata on every row under row.input_metadata.dataset_info:
  • data_loader_type: loader class (e.g., DynamicDataLoader)
  • data_loader_variant_id: callable name or inline id
  • data_loader_variant_description: docstring/description
  • data_loader_num_rows: original count before preprocessing
  • data_loader_num_rows_after_preprocessing: final count
This enables clear tracking of which inputs produced which results in the UI.
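
Since the metadata rides along on each row, you can read it back when debugging or grouping results. A small sketch, assuming dataset_info is dict-like:

def describe_row(row: EvaluationRow) -> str:
    info = row.input_metadata.dataset_info
    return (
        f"{info['data_loader_type']} / {info['data_loader_variant_id']}: "
        f"{info['data_loader_num_rows_after_preprocessing']} rows"
    )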

Example with an Adapter

from eval_protocol import evaluation_test, aha_judge, DynamicDataLoader, SingleTurnRolloutProcessor
from eval_protocol.adapters.langfuse import create_langfuse_adapter
from eval_protocol.models import EvaluationRow

def langfuse_data_generator():
    adapter = create_langfuse_adapter()
    return adapter.get_evaluation_rows(limit=50, sample_size=10)

@evaluation_test(
    data_loaders=DynamicDataLoader(generators=[langfuse_data_generator]),
    rollout_processor=SingleTurnRolloutProcessor(),
)
async def test_llm_judge(row: EvaluationRow) -> EvaluationRow:
    return await aha_judge(row)

API Reference

DynamicDataLoader

class DynamicDataLoader(EvaluationDataLoader):
    generators: Sequence[Callable[[], list[EvaluationRow]]]

InlineDataLoader

class InlineDataLoader(EvaluationDataLoader):
    rows: list[EvaluationRow] | None
    messages: Sequence[list[Message]] | None
    id: str
    description: str | None

EvaluationDataLoader

class EvaluationDataLoader(ABC):
    preprocess_fn: Callable[[list[EvaluationRow]], list[EvaluationRow]] | None
    def variants(self) -> Sequence[DataLoaderVariant]: ...
    def load(self) -> list[DataLoaderResult]: ...

Source Code

See the Python source for full details: eval_protocol/data_loader/models.py