What It Does
- Generates realistic user turns based on scenario instructions and global guidelines.
- Interleaves with the agent’s tool-using turns to create full conversations.
- Signals when to stop (e.g., task complete, transfer, or out-of-scope) via a special termination token.
- Builds a system prompt from global guidelines + your scenario instructions.
- Optionally uses tool schemas to steer requests.
- Provides a
is_stop(...)
check that EP maps totermination_reason = "user_stop"
.
Enabling Simulation
Providedataset_info.user_simulation
in your EvaluationRow
(or dataset) to turn on the simulator for that row.
enabled
: boolean flag; if true, EP uses the simulator for the conversation.system_prompt
: scenario instructions appended to global guidelines.llm
: backing model for the user simulation (default:gpt-4.1
).llm_args
: sampling args for the simulator (default:{ "temperature": 0.0 }
).
Conversation Flow
Whenuser_simulation.enabled
is true:
- EP seeds the conversation with the simulator’s first user message.
- The agent policy receives tool schemas and responds with tool calls or a final answer.
- After each agent turn, the simulator may produce the next user message.
- If the simulator emits a stop intent, EP ends the episode with
termination_reason = user_stop
.
- Without simulation: each tool call increments the step counter.
- With simulation: EP increments the step counter after a full agent↔user turn, and records a consolidated control-plane step (reward, termination, tool calls).
Minimal End-to-End
Tips
- Keep scenario instructions specific and outcome-oriented to guide the simulator.
- Set
temperature
low for reproducible behavior (or use record/playback). - Use rewards and control-plane summaries to assess task success rather than only length of the dialogue.
Troubleshooting
- Simulator does nothing: ensure
user_simulation.enabled
istrue
and you have at least a system message. - Episode never ends: check that your environment’s rewards/termination are wired, or set a sensible
steps
limit. - Unexpected termination: the simulator may have emitted a stop intent; inspect
termination_reason
and conversation history.
GitHub References
- User simulation integration in rollouts (ExecutionManager):
- Backing user simulator (𝜏²-bench):
- Convenience facade and types: