- Klavis MCP Sandbox - Fully managed isolated environments for model training and evaluation at scale
- Klavis MCP Server - Direct MCP server connections using your own accounts
Which Option Should You Choose?
Use Klavis MCP Sandbox
Use Klavis MCP Sandbox if you only have input data and ground truth for your RL work. Klavis Sandbox handles all the tooling infrastructure for you:- Hosted MCP Servers - hundreds of pre-built servers ready to use
- Authentication - OAuth and session management handled automatically
- Isolated Concurrency Environments - run 64+ models in parallel without interference
- Tooling State Management - automatic initialization, reset, and cleanup
- Scaling - dedicated QPS per instance with automatic account pooling
Use Klavis MCP Server
Use Klavis MCP Server if you already have your own tooling infrastructure (authentication, isolated environments, state management, scaling) but only need Klavis hosted MCP servers to perform tool calls for your RL or model training work. This option allows you to connect directly to 100+ external applications through Klavis MCP while maintaining full control over your evaluation and training pipeline.Use with Klavis MCP Sandbox
Klavis MCP Sandbox provides fully managed, isolated sandbox environments designed for training and evaluating models at scale. Each sandbox has dedicated accounts, automatic state initialization, and cleanup - allowing you to focus on model interaction without managing sandbox environments.Key Features
- Isolated Environments: Each sandbox gets dedicated, authenticated sessions with automatic token management
- Account Pooling: Dynamic pool of test accounts supporting 64+ concurrent models
- State Management: Built-in
initialize,dump, andresetAPIs for environment lifecycle - Supported Services: Gmail, Jira, Salesforce, Slack, Linear, Google Calendar, and 100+ more
Setup
Set up your API keys:Step 1: Define Your Input Data and Ground Truth
Create a JSONL dataset file with your test cases. Each row should include:initialize_data: Initial state to seed the sandboxmessages: The task instruction for your modelground_truth: Expected final state after the model completes the task
Step 2: Implement Your RolloutProcessor
Use theKlavisSandboxRolloutProcessor to handle sandbox lifecycle management. The processor will:
- Create an isolated sandbox instance
- Initialize the sandbox with your input data
- Run your model with MCP tools from the sandbox
- Dump the final state after model interaction
- Clean up and return sandbox to pool
KlavisSandboxRolloutProcessor:
Step 3: Evaluate by Comparing State with Ground Truth
Create your evaluation test that compares the final sandbox state with your ground truth:row.execution_metadata.extra["sandbox_data"]. Use an LLM judge to semantically compare it with your ground truth.
See the complete test implementation for the full example.
Use with Klavis MCP Server
Setting Up Klavis MCP Server
Login to your Klavis AI account, then find the applications you want to connect with Eval Protocol and enable MCP for those applications. Follow the auth flow to authorize Klavis MCP to access those applications on your behalf. You can follow the Klavis quickstart guide here to set up your MCP. In the Klavis dashboard, click Add to Other Clients, and generate the access token. Save the access token in.env file as KLAVIS_API_KEY.
The Klavis MCP is defined as follows in Eval Protocol configuration:
Using Klavis MCP Server in Eval Protocol
We’ve set up an example in Eval Protocol to use Klavis MCP Server. You can also use it to connect to more applications and add more use cases. Here is the example test file. In this example, we connect to Gmail, Notion and Outlook Calendar using Klavis MCP, and have a few example test cases. To run this example workflow, you need to set up the test cases in those applications.Gmail
No particular setup. You should have at least 5 emails in your Gmail inbox.Notion
Copy this Notion page template (credit to MCPMark) to your Notion workspace. And when you authorize Klavis MCP to access Notion, make sure to give access to this page.Outlook Calendar
You should set up the following calendar events in your Outlook calendar. It’s recommended to create a new outlook account with a clean calendar for testing.- Create 3 events today. It’s better one starting at 12 am today, and one ending at 12 am tomorrow.
- Create an event that covers the whole workding hours except the first and last hour of your next working day. Outlook calendar default working hour is Monday to Friday, 8 am to 5 pm. In this case, you should create an event from 9 am to 4 pm on your next working day.
- Create total 8 events on this week’s working days. It should include the above events if they are on working days.
- Follow step 1, create 2 events on next week’s Thursday.
- Follow step 3, create total 5 events on next week’s working days.
- Follow step 1, create 4 events on Oct 15 2025.
- Follow step 3, create total 9 events from Oct 13 to Oct 17, 2025.
Resources
Klavis Sandbox Intro
Learn about tooling infrastructure for LLM training, RL and evaluation.
Klavis MCP Servers
Browse all high quality MCP servers written and evaluated by Klavis AI.
Example Notebook
Create sandboxes, seed data, run an agent, then dump and clean up.
Klavis Sandbox API
Manage isolated sandbox environments for training/eval: pooling, init, export, teardown.

