Overview of built-in rollout processors, their configs, and when to use each
EvaluationRow
s and yield the same rows back after performing the rollout (e.g., calling a model once, running a tool-using agent loop, or interacting with an MCP gym). They all share the same signature:
eval_protocol/pytest/types.py
as RolloutProcessorConfig
and includes the most common knobs for evaluation runs.
model
.DatasetLogger
to capture mid-rollout logs.--ep-reasoning-effort
or --ep-input-param
.
eval_protocol/pytest/default_no_op_rollout_process.py
@evaluation_test
:
completion
per row and appends the assistant message (and any tool_calls) to row.messages
.completion_params
including extra_body.reasoning_effort
if provided.eval_protocol/pytest/default_single_turn_rollout_process.py
messages
and available tools.mcp_config_path
to enumerate available tools via MCPMultiClient
.max_concurrent_rollouts
for dataset-level parallelism; tool calls within a single row are also executed in parallel.eval_protocol/pytest/default_agent_rollout_processor.py
eval_protocol.rollout(...)
.server_script_path
to launch the MCP server. Binds localhost:9700
by default.eval_protocol/pytest/default_mcp_gym_rollout_processor.py
eval_protocol/pytest/plugin.py
adds flags to make evaluations CI-friendly:
--ep-max-rows=N|all
: limit dataset rows processed.--ep-print-summary
: print a concise summary line at end of each run.--ep-summary-json=PATH
: write a JSON artifact for CI.--ep-input-param key=value
or --ep-input-param @params.json
: ad-hoc overrides of completion_params
.--ep-reasoning-effort low|medium|high
: sets extra_body.reasoning_effort
via LiteLLM.