list_tools, call_tool, list_resources/read_resource.
/control/* endpoints with mcp-session-id header.
https://your-server.example/mcp for MCP, and https://your-server.example/control/... for control):
POST /control/reset_session
mcp-session-id: <session_id>
{ "seed": <int|null> }
GET /control/initial_state
mcp-session-id: <session_id>
GET /control/reward
mcp-session-id: <session_id>
{ "reward": <float> }
for the most recent step.
GET /control/status
mcp-session-id: <session_id>
{ "terminated": <bool>, "truncated": <bool> } to indicate episode end.
EP derives session_id by hashing dataset row values and the model ID via gen_session_id(...) and passes it in MCP clientInfo and as the control-plane header. Heads up: the hash does not include the run ID, so session IDs repeat across runs and the MCP server must be restarted between runs. The current implementation of MCPGymRolloutProcessor() does this automatically.
/control/* endpoints in your production server. Note: the EP client does not depend on SimulationServerBase; it is provided as a reference pattern only.
clientInfo
with session_id, seed, config, and model_id.
EP calls list_tools (data plane) and caches the returned tools.
EP calls GET /control/initial_state (control plane); if that times out or fails, it falls back to list_resources/read_resource (data plane) heuristics.
user_prompt_template.
base_url to avoid thundering herds.
EP calls call_tool (data plane) and parses the observation from tool content.
GET /control/reward → scalar reward
GET /control/status → terminated/truncated
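The per-step flow above (act on the data plane, then read reward and status from the control plane) can be sketched roughly as follows. This is only an illustration, not EP's implementation: `run_episode`, `StepRecord`, and the injected callables are hypothetical names, and the transport (MCP and HTTP) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class StepRecord:
    """One step of a rollout (hypothetical container, not an EP type)."""
    observation: Any
    reward: float
    terminated: bool
    truncated: bool


def run_episode(
    policy: Callable[[Any], Dict[str, Any]],
    call_tool: Callable[[Dict[str, Any]], Any],
    get_reward: Callable[[], float],
    get_status: Callable[[], Tuple[bool, bool]],
    initial_observation: Any,
    max_steps: int = 10,
) -> List[StepRecord]:
    """Act via the data plane, then read reward/status via the control plane."""
    records: List[StepRecord] = []
    observation = initial_observation
    for _ in range(max_steps):
        action = policy(observation)           # model chooses a tool call
        observation = call_tool(action)        # data plane: call_tool
        reward = get_reward()                  # control plane: GET /control/reward
        terminated, truncated = get_status()   # control plane: GET /control/status
        records.append(StepRecord(observation, reward, terminated, truncated))
        if terminated or truncated:
            break
    return records
```

Passing the three I/O operations in as callables keeps the loop testable without a running server; a real client would bind them to an MCP session and the /control/* endpoints.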
terminated (environment signaled end) or truncated (cutoff).
_no_tool_call or _playback_terminate (e.g., model finished or playback hit the end).
termination_reason = user_stop.
TerminationReason values: stop, length, tool_calls, plus environment-driven control_plane_signal, max_steps, user_stop, error.
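To make the value list concrete, here is a small local stand-in for the enum together with one possible way to pick an environment-driven reason. The enum values come from the list above; the class is not imported from EP, and the precedence in `resolve_termination` (error first, then control-plane signal, then step budget) is an assumption made for illustration only.

```python
from enum import Enum
from typing import Optional


class TerminationReason(str, Enum):
    """Local stand-in mirroring the values listed above (not EP's class)."""
    STOP = "stop"
    LENGTH = "length"
    TOOL_CALLS = "tool_calls"
    CONTROL_PLANE_SIGNAL = "control_plane_signal"
    MAX_STEPS = "max_steps"
    USER_STOP = "user_stop"
    ERROR = "error"


def resolve_termination(
    terminated: bool,
    truncated: bool,
    step: int,
    max_steps: int,
    had_error: bool = False,
) -> Optional[TerminationReason]:
    """Return an environment-driven reason, or None if the episode continues.

    The ordering below is an assumed precedence, not documented behavior.
    """
    if had_error:
        return TerminationReason.ERROR
    if terminated or truncated:
        return TerminationReason.CONTROL_PLANE_SIGNAL
    if step >= max_steps:
        return TerminationReason.MAX_STEPS
    return None
```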
If /control/initial_state fails or times out, EP falls back to read_resource (and ultimately a default observation) so rollouts can proceed.
/control/reward and /control/status use short timeouts; absent data yields defaults (0.0 reward, not-terminated) and the step continues.
On close, EP calls POST /control/reset_session and then closes the MCP transport.
/control/* endpoints.
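The default-on-failure behavior described above for /control/reward and /control/status might look roughly like this on the client side. It is a sketch using only the standard library, not EP's code: `http_get_json`, `poll_control_plane`, and the 2-second timeout are assumptions (the text only says the timeouts are short).

```python
import http.client
import json
import urllib.request
from typing import Any, Callable, Dict, Optional, Tuple


def http_get_json(url: str, session_id: str, timeout: float = 2.0) -> Optional[Dict[str, Any]]:
    """GET a control endpoint; return parsed JSON, or None on any failure."""
    request = urllib.request.Request(url, headers={"mcp-session-id": session_id})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            if response.status != 200:
                return None
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Covers timeouts, connection errors, HTTP errors, and malformed JSON.
        return None


def poll_control_plane(
    base_url: str,
    session_id: str,
    fetch: Callable[[str, str], Optional[Dict[str, Any]]] = http_get_json,
) -> Tuple[float, bool, bool]:
    """Return (reward, terminated, truncated), using defaults when data is absent."""
    reward_doc = fetch(f"{base_url}/control/reward", session_id) or {}
    status_doc = fetch(f"{base_url}/control/status", session_id) or {}
    raw_reward = reward_doc.get("reward")
    reward = float(raw_reward) if isinstance(raw_reward, (int, float)) else 0.0
    terminated = bool(status_doc.get("terminated", False))
    truncated = bool(status_doc.get("truncated", False))
    return reward, terminated, truncated
```

Because a failed fetch collapses to `None`, a slow or missing control plane degrades to `(0.0, False, False)` and the step can continue instead of aborting the rollout.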
Set EP_PLAYBACK_FILE to enable deterministic record/playback. During playback, the policy is stepped to match prior turns, and _playback_terminate ends the episode at the recorded boundary. Control-plane step summaries and an optional OpenAI-format log are emitted for terminated trajectories.
Require mcp-session-id on every control request; return Content-Type: application/json.
Make POST /control/reset_session safe to call multiple times; ignore duplicate resets.
GET /control/initial_state returns the initial observation JSON for this session, derived from seed and config (from MCP clientInfo).
GET /control/reward returns { "reward": <float> } for the most recent applied action.
GET /control/status returns { "terminated": <bool>, "truncated": <bool> } for the episode state.
session_id.
Return 4xx for client mistakes (missing/invalid mcp-session-id), 5xx for server errors.
reward=0.0 and terminated=false on non-200s.
session_id and avoid global mutable state.
mcp-session-id routing.
Limit session_id lengths to prevent abuse.
Use clientInfo extras to create stable, session-aware environments. Example:
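A minimal sketch of this idea, assuming the clientInfo extras arrive as a plain dict with session_id, seed, and config keys. `SessionStore`, `SessionEnv`, and the `grid_size` config key are hypothetical names for illustration; how the extras are actually read depends on your MCP server framework and is not shown.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SessionEnv:
    """Per-session environment state (hypothetical shape)."""
    session_id: str
    seed: Optional[int]
    config: Dict[str, Any]
    initial_state: Dict[str, Any]


class SessionStore:
    """Holds one environment per session_id instead of global mutable state."""

    def __init__(self) -> None:
        self._sessions: Dict[str, SessionEnv] = {}

    def get_or_create(self, extras: Dict[str, Any]) -> SessionEnv:
        session_id = extras["session_id"]
        if session_id not in self._sessions:
            seed = extras.get("seed")
            config = dict(extras.get("config") or {})
            rng = random.Random(seed)  # same seed gives the same initial state
            size = int(config.get("grid_size", 4))  # assumed config key
            initial_state = {"position": rng.randrange(size * size), "grid_size": size}
            self._sessions[session_id] = SessionEnv(session_id, seed, config, initial_state)
        return self._sessions[session_id]
```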
Use session_id as the key for per-session state. Seed and config should shape the initial state.
/control/*.
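Putting the checklist together, a framework-agnostic control-plane handler could look roughly like the sketch below. It is not a reference implementation: `ControlPlane`, the 128-character limit, the `{"ok": True}` reset response, and the placeholder observation are all assumptions, and wiring `handle` into an actual HTTP server (including case-insensitive header lookup and the JSON content type) is left to the framework you use.

```python
from typing import Any, Dict, Optional, Tuple

MAX_SESSION_ID_LEN = 128  # assumed limit; choose one that fits your deployment


class ControlPlane:
    """Routes /control/* requests to per-session state keyed by session_id."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, Any]] = {}

    @staticmethod
    def _fresh(seed: Optional[int] = None) -> Dict[str, Any]:
        # Placeholder state; a real server would build it from seed and config.
        return {"observation": {"seed": seed}, "reward": 0.0,
                "terminated": False, "truncated": False}

    def handle(
        self,
        method: str,
        path: str,
        headers: Dict[str, str],
        body: Optional[Dict[str, Any]] = None,
    ) -> Tuple[int, Dict[str, Any]]:
        """Return (status_code, json_body) for one control request."""
        session_id = headers.get("mcp-session-id", "")
        if not session_id or len(session_id) > MAX_SESSION_ID_LEN:
            return 400, {"error": "missing or invalid mcp-session-id"}
        state = self._sessions.setdefault(session_id, self._fresh())
        route = (method, path)
        if route == ("POST", "/control/reset_session"):
            # Idempotent: repeated resets just rebuild the same fresh state.
            self._sessions[session_id] = self._fresh((body or {}).get("seed"))
            return 200, {"ok": True}
        if route == ("GET", "/control/initial_state"):
            return 200, state["observation"]
        if route == ("GET", "/control/reward"):
            return 200, {"reward": state["reward"]}
        if route == ("GET", "/control/status"):
            return 200, {"terminated": state["terminated"],
                         "truncated": state["truncated"]}
        return 404, {"error": "unknown control endpoint"}
```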