If you already have an agent, you can reuse its codebase by integrating with eval-protocol’s RemoteRolloutProcessor. RemoteRolloutProcessor delegates rollout execution to a remote HTTP service that you control, so you can wrap your existing agent in an HTTP service rather than reimplementing its rollout logic.

Setup

Sign up for a free account at Langfuse and get your API keys. Set the following environment variables:
LANGFUSE_PUBLIC_KEY="pk-lf-..."
LANGFUSE_SECRET_KEY="sk-lf-..."
LANGFUSE_HOST="https://cloud.langfuse.com" # US region: https://us.cloud.langfuse.com
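
A missing key usually only shows up later as a failed trace upload, so it can help to check the variables when the service starts. The helper below is a minimal sketch; `missing_langfuse_env` is a hypothetical name and not part of eval-protocol or Langfuse.

```python
import os
from typing import List, Mapping

# The three variables listed in the setup step above.
LANGFUSE_ENV_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST")


def missing_langfuse_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of the Langfuse variables that are unset or empty."""
    return [name for name in LANGFUSE_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_langfuse_env()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```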

API Contract

We expect the remote service to implement the following API contract:

Request Body

  • model: The model to use for the rollout (e.g., “openai/gpt-4o”)
  • messages: Array of conversation messages
  • tools: Optional array of available tools for the model
  • model_base_url: Optional base URL for the remote server to make LLM calls. This is useful for configuring different endpoints during development/training.
  • metadata: Object containing rollout execution metadata
request_body.json
{
    "model": "accounts/fireworks/models/gpt-oss-120b",
    "messages": [ { "role": "user", "content": "..." } ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the weather for a city",
          "parameters": { "type": "object", "properties": { "city": { "type": "string" } } }
        }
      }
    ],
    "model_base_url": "https://api.fireworks.ai/inference/v1",
    "metadata": {
      "invocation_id": "ivk_abcd",
      "experiment_id": "exp_efgh",
      "rollout_id": "rll_ijkl",
      "run_id": "run_123",
      "row_id": "row_123"
    }
}
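
On the server side, the request body above can be parsed into a typed object before it is handed to your agent. The following is a sketch using only the standard library. `RolloutRequest` and `parse_rollout_request` are hypothetical names, and treating every field not marked optional as required is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class RolloutRequest:
    """Typed view of the request body fields listed above."""
    model: str
    messages: List[Dict[str, Any]]
    metadata: Dict[str, str]
    tools: Optional[List[Dict[str, Any]]] = None
    model_base_url: Optional[str] = None


def parse_rollout_request(raw: str) -> RolloutRequest:
    """Parse a JSON request body, rejecting it if an assumed-required field is absent."""
    body = json.loads(raw)
    # Assumption: fields not described as optional are required.
    missing = [key for key in ("model", "messages", "metadata") if key not in body]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return RolloutRequest(
        model=body["model"],
        messages=body["messages"],
        metadata=body["metadata"],
        tools=body.get("tools"),
        model_base_url=body.get("model_base_url"),
    )
```

The agent can then pass `model` and, when present, `model_base_url` to whichever LLM client it already uses.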

Response Body

status_response.json
{
  "terminated": true,
  "info": {
    "reason": "completed",
    "ended_at": "2025-09-24T12:34:56Z",
    "num_turns": 2
  }
}
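
A small helper can build this payload once a rollout finishes. This is a sketch of the terminated case only, since the example above does not show what an in-progress response looks like. `completed_status` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def completed_status(num_turns: int, ended_at: datetime, reason: str = "completed") -> Dict[str, Any]:
    """Build a status payload shaped like status_response.json for a finished rollout."""
    if ended_at.tzinfo is None:
        raise ValueError("ended_at must be timezone-aware")
    # Normalize to UTC and use the same "Z"-suffixed format as the example.
    ended_utc = ended_at.astimezone(timezone.utc)
    return {
        "terminated": True,
        "info": {
            "reason": reason,
            "ended_at": ended_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "num_turns": num_turns,
        },
    }
```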

Metadata Piggyback

To push trajectories back to eval-protocol, you must include the following metadata with your traces so that they can be correlated with the EvaluationRows:
  • invocation_id
  • experiment_id
  • rollout_id
  • run_id
  • row_id
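
One way to avoid dropping a field is to pull exactly these five keys out of the request's `metadata` object and fail loudly if any is absent. The sketch below does only that; `correlation_metadata` is a hypothetical helper, and how the resulting dictionary is attached to a trace depends on the tracing SDK you use, so that step is left out.

```python
from typing import Dict, Mapping

# The five identifiers listed above.
REQUIRED_METADATA_KEYS = (
    "invocation_id",
    "experiment_id",
    "rollout_id",
    "run_id",
    "row_id",
)


def correlation_metadata(metadata: Mapping[str, str]) -> Dict[str, str]:
    """Return only the correlation identifiers, raising if any is missing."""
    missing = [key for key in REQUIRED_METADATA_KEYS if key not in metadata]
    if missing:
        raise ValueError(f"metadata is missing: {missing}")
    return {key: metadata[key] for key in REQUIRED_METADATA_KEYS}
```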

Example Server Implementations

See the following example server implementations for reference:

Example @evaluation_test Usage

See the example code for using RemoteRolloutProcessor in the @evaluation_test decorator here.

Usage

  1. Run one of the server examples
  2. Run the example @evaluation_test usage