
# Fine Tuning an SVGAgent with Eval Protocol

> Train and improve an SVG generation agent using reinforcement fine tuning with Eval Protocol

<Frame>
  <iframe src="https://www.loom.com/embed/24ba433601de45ba8b63d9fb34c31fd5" width="100%" height="420" frameBorder="0" allow="autoplay; fullscreen" allowFullScreen />
</Frame>

## Introduction

This repo demonstrates how to build an SVG generation agent with reinforcement fine tuning. It has three parts:

* **Eval Protocol** - Orchestrates the rollout execution and evaluation framework
* **Vercel TypeScript Server** - Remote server that handles SVG code generation rollouts
* **Fireworks RFT** - Reinforcement fine tuning trainer

A big thank you to [SVGBench](https://github.com/johnbean393/SVGBench) for the dataset. SVGBench is a comprehensive benchmark that evaluates language models on their ability to generate SVG code that meets specific visual requirements. Each prompt includes detailed criteria (like "draw a red circle in the top-left corner") that the generated SVG must fulfill.

**The Evaluation Process**: The model generates SVG code from text prompts, we render the SVGs to images, and then use GPT-4.1 as a visual judge to count how many requirements were fulfilled. This gives us concrete scores to measure improvement and lets you see dramatic before/after visual comparisons as your model gets better through training.
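
The scoring step described above can be sketched as follows. This is a minimal illustration, not the evaluator's actual code: the `JudgeResult` class and `requirement_score` function are hypothetical names, and normalizing the count to a fraction between 0 and 1 is an assumption about how the judge's count might be turned into a score.

```python
from dataclasses import dataclass


@dataclass
class JudgeResult:
    """Hypothetical container for the visual judge's verdict on one SVG."""

    fulfilled: int  # requirements the judge marked as met
    total: int  # requirements listed in the prompt


def requirement_score(result: JudgeResult) -> float:
    """Return the fraction of requirements fulfilled, in [0.0, 1.0].

    Assumption: the score is the fulfilled count divided by the total.
    """
    if result.total <= 0:
        raise ValueError("a prompt must list at least one requirement")
    if not 0 <= result.fulfilled <= result.total:
        raise ValueError("fulfilled must be between 0 and total")
    return result.fulfilled / result.total


if __name__ == "__main__":
    # A prompt with 4 requirements, 3 of which the judge found in the image.
    print(requirement_score(JudgeResult(fulfilled=3, total=4)))
```

A normalized score like this makes rollouts comparable across prompts that list different numbers of requirements.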

<Frame>
  <img alt="SVG Agent Training Overview" src="https://mintcdn.com/fireworksai-staging/FSCDtiPwmFPY7vdQ/assets/overview.png?fit=max&auto=format&n=FSCDtiPwmFPY7vdQ&q=85&s=c97157a3484c1cf9417b80c5b507e8cc" width="1207" height="466" data-path="assets/overview.png" />
</Frame>

## Quick Start

### Installation

1. **Create a Fireworks account**: [https://app.fireworks.ai/account/home](https://app.fireworks.ai/account/home)

2. **Clone the quickstart repo**: [https://github.com/eval-protocol/quickstart](https://github.com/eval-protocol/quickstart)

```bash theme={null}
git clone git@github.com:eval-protocol/quickstart.git
cd quickstart
```

3. **Install Eval Protocol**:

```bash theme={null}
pip install "eval-protocol[svgbench]"
```

4. **Environment Setup**:

The `env.example` file is located in the `evaluator/` directory. Make a copy of it in the same directory, name it `.env`, and fill in your API keys:

```bash theme={null}
cp evaluator/env.example evaluator/.env
```

Then edit `evaluator/.env` with your API keys:

```
FIREWORKS_API_KEY=your-fireworks-key-here
OPENAI_API_KEY=your-openai-key-here
```

The `eval-protocol create rft` command described below automatically reads this file and uploads the keys to Fireworks as secrets.
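
Before launching anything, it can help to confirm that both keys are actually filled in. The following is a small standalone sketch, not part of Eval Protocol: `parse_env` and `missing_keys` are hypothetical helpers, and treating values that start with `your-` as unfilled placeholders is an assumption based on the example file above.

```python
from pathlib import Path

REQUIRED_KEYS = ("FIREWORKS_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    values = parse_env(text)
    return [
        key
        for key in required
        if not values.get(key) or values[key].startswith("your-")
    ]


if __name__ == "__main__":
    env_path = Path("evaluator/.env")
    if not env_path.exists():
        print(f"{env_path} not found; copy evaluator/env.example first")
    else:
        print("missing keys:", missing_keys(env_path.read_text()) or "none")
```

Run it from the repository root so the relative path `evaluator/.env` resolves.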

## Running Locally

**Terminal 1** - Start the local UI server to view results:

```bash theme={null}
ep logs
```

**Terminal 2** - Test locally:

```bash theme={null}
cd evaluator
ep local-test
```

This command discovers and runs your `@evaluation_test` with pytest. In this case, it builds an image and runs the test in Docker, because a `Dockerfile` is present.

The test automatically uses our Vercel remote server:

```python
rollout_processor=RemoteRolloutProcessor(
    remote_base_url="https://vercel-svg-server-ts.vercel.app",
)
```

If you want to use a local development Vercel server instead, see [Local Development Server](#local-development-server).

**Note:**

* If your evaluation setup has custom dependencies, for example Chromium, you will need to containerize it with a `Dockerfile`
  * Then, when you run `ep local-test`, we will build an image and run pytest inside Docker
* If not, `ep local-test` will just run pytest on your host machine
  * You can also ignore the `Dockerfile` and run on the host Python env using `ep local-test --ignore-docker`
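
The rules in the note above amount to a small decision: use Docker when a `Dockerfile` is present unless it is explicitly ignored, otherwise run on the host. The sketch below only mirrors that decision for illustration; `choose_runner` is a hypothetical function, not the CLI's implementation.

```python
import tempfile
from pathlib import Path


def choose_runner(project_dir: str, ignore_docker: bool = False) -> str:
    """Return "docker" or "host" following the rules in the note above."""
    has_dockerfile = (Path(project_dir) / "Dockerfile").is_file()
    if has_dockerfile and not ignore_docker:
        return "docker"
    return "host"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        print(choose_runner(project))  # no Dockerfile yet
        (Path(project) / "Dockerfile").write_text("FROM python:3.11-slim\n")
        print(choose_runner(project))  # Dockerfile present
        print(choose_runner(project, ignore_docker=True))  # Dockerfile ignored
```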

### Expected Test Output

Navigate to [http://localhost:8000](http://localhost:8000) to see the Eval Protocol UI.

```
INFO:eval_protocol.pytest.remote_rollout_processor:Found status log for rollout democratic-way-12: Rollout democratic-way-12 completed
INFO:eval_protocol.pytest.remote_rollout_processor:Found Fireworks log for rollout democratic-way-12 with status code 100.0
INFO:eval_protocol.adapters.fireworks_tracing:Successfully converted 1 traces to evaluation rows | 3/8 [00:19<00:22, 4.52s/rollout]
...
Runs (Parallel): 100%|████████████████████████████████████████████| 1/1 [00:31<00:00, 31.07s/run]
PASSED
```

<Frame>
  <img alt="Eval Protocol Logs Interface" src="https://mintcdn.com/fireworksai-staging/FSCDtiPwmFPY7vdQ/assets/ep_logs.png?fit=max&auto=format&n=FSCDtiPwmFPY7vdQ&q=85&s=5e947b0f42ff1de5a57984f9ced9e3f5" width="1273" height="716" data-path="assets/ep_logs.png" />
</Frame>

To understand how Remote Rollout Processing works and how it communicates with the remote server, see [How Remote Rollout Processing Works](#how-remote-rollout-processing-works).

## Single Command to Train

To kick off training, run:

```bash theme={null}
eval-protocol create rft \
  --base-model accounts/fireworks/models/qwen3-0p6b \
  --chunk-size 10
```

This command:

1. **🔐 Uploads Secrets** - Automatically reads your `.env` file and uploads API keys as Fireworks secrets
2. **📦 Uploads Evaluator** - Packages and uploads your evaluation code
3. **⏳ Waits for Build** - Polls evaluator status every 10 seconds until ACTIVE (timeout: 10 minutes)
4. **📊 Creates Dataset** - Automatically uploads your `svgbench_dataset.jsonl`
5. **🚀 Launches RFT Job** - Starts reinforcement fine-tuning with your evaluator
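
The wait in step 3 is a poll-with-timeout loop. The sketch below illustrates that pattern using the interval and timeout stated above (10 seconds, 10 minutes). It is not the CLI's implementation: `wait_until_active` and the `get_status` callback are hypothetical, and how the status is actually fetched is left to the caller.

```python
import time
from typing import Callable


def wait_until_active(
    get_status: Callable[[], str],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it returns "ACTIVE" or the timeout elapses.

    Returns True if ACTIVE was observed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "ACTIVE":
            return True
        # Stop once another full interval would run past the deadline.
        if clock() + interval_s > deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status sequence; sleeping is disabled so the demo is instant.
    statuses = iter(["BUILDING", "BUILDING", "ACTIVE"])
    print(wait_until_active(lambda: next(statuses), sleep=lambda s: None))
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.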

### Configuration & Troubleshooting

**Training Parameters**: We use Eval Protocol's default values for training parameters (batch size, epochs, learning rate, LoRA rank, accelerator count, etc.). For a complete list of available RFT flags you can customize, see [Fireworks RFT Command Documentation](https://docs.fireworks.ai/tools-sdks/firectl/commands/create-reinforcement-fine-tuning-job).

**Changing Evaluators**: If you've made changes to your evaluator code and want to upload a new version:

```bash theme={null}
eval-protocol create rft \
  --base-model accounts/fireworks/models/qwen3-0p6b \
  --chunk-size 10 \
  --force
```

**Evaluator Upload Timing Out**: If your evaluator takes longer than 10 minutes to build, you'll see:

```
⏰ Timeout after 10.0m - evaluator is not yet ACTIVE

❌ Evaluator is not ready within the timeout period.
📊 Please check the evaluator status at: https://app.fireworks.ai/dashboard/evaluators/test-svgagent-test-svg-generation-evaluation
   Wait for it to become ACTIVE, then run 'eval-protocol create rft' again.
```

In this case, monitor the evaluator build at the link shown, and run the command again once its status is ACTIVE.

### Monitor Training Progress

After successful job creation, you'll see:

```
✅ Created Reinforcement Fine-tuning Job
   name: accounts/pyroworks/reinforcementFineTuningJobs/sdnld4yn

📊 Dashboard Links:
   Evaluator: https://app.fireworks.ai/dashboard/evaluators/test-svgagent-test-svg-generation-evaluation
   Dataset:   https://app.fireworks.ai/dashboard/datasets/svgbench-dataset
   RFT Job:   https://app.fireworks.ai/dashboard/fine-tuning/reinforcement/sdnld4yn
```

Click on the **RFT Job** link to view real-time training progress, epoch counts, and rollout data.

### Training Results

After successful training, you should see performance improvements reflected in the training metrics:

<Frame>
  <img alt="SVG Agent Training Progress" src="https://mintcdn.com/fireworksai-staging/FSCDtiPwmFPY7vdQ/assets/graph.png?fit=max&auto=format&n=FSCDtiPwmFPY7vdQ&q=85&s=b238aceb9d1d025982bd420d799dd96a" width="1145" height="727" data-path="assets/graph.png" />
</Frame>

### SVG Quality Improvement

You can inspect individual rollouts to see the improvement in SVG generation quality. Below is a comparison between the first epoch and the eighth (final) epoch:

**Before (1st Epoch):**

<Frame>
  <img alt="SVG Generation - Before Training" src="https://mintcdn.com/fireworksai-staging/FSCDtiPwmFPY7vdQ/assets/before.png?fit=max&auto=format&n=FSCDtiPwmFPY7vdQ&q=85&s=e2c13a0d78fc621344313e2a05c92b3a" width="1606" height="1136" data-path="assets/before.png" />
</Frame>

**After (8th Epoch):**

<Frame>
  <img alt="SVG Generation - After Training" src="https://mintcdn.com/fireworksai-staging/FSCDtiPwmFPY7vdQ/assets/after.png?fit=max&auto=format&n=FSCDtiPwmFPY7vdQ&q=85&s=2f2fe84ac471517182cc9a48ec4c3b80" width="2030" height="1134" data-path="assets/after.png" />
</Frame>

The reinforcement fine tuning process significantly improves the model's ability to generate accurate, detailed SVG graphics that better match the input descriptions.

## Debugging Tips

While training is running, several tools help you debug and monitor your rollouts:

### Rollout Overview

Click any **Epoch** or **Step** in the training dashboard, then click the **table icon** to the right, to open a table of all rollouts. It gives a high-level overview of which rollouts failed and why.

<Frame>
  <img alt="Rollout Overview Table" src="https://mintcdn.com/fireworksai-staging/FSCDtiPwmFPY7vdQ/assets/rollouts.png?fit=max&auto=format&n=FSCDtiPwmFPY7vdQ&q=85&s=b5efe347404e0cc31e47d7fef7117d55" width="981" height="824" data-path="assets/rollouts.png" />
</Frame>

### Individual Rollout Details

If you click on a specific row in the rollout table, you can see exactly what the prompt was and how the model responded. You can also copy the generated SVG code and render it yourself to see what the model produced. This is how we got the before and after comparison above.

<Frame>
  <img alt="Individual Rollout Details" src="https://mintcdn.com/fireworksai-staging/FSCDtiPwmFPY7vdQ/assets/rollout_details.png?fit=max&auto=format&n=FSCDtiPwmFPY7vdQ&q=85&s=8494000363ff625a51297c1bc02ab879" width="1497" height="958" data-path="assets/rollout_details.png" />
</Frame>
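
To render a copied response yourself, one option is to pull the `<svg>` element out of the text and save it to a file you can open in a browser. This is a minimal sketch under the assumption that the response contains a single well-formed `<svg>...</svg>` element; `extract_svg` and `save_svg` are hypothetical helpers, not part of Eval Protocol.

```python
import re
from pathlib import Path
from typing import Optional

# Non-greedy match from the first <svg ...> tag to the first closing </svg>.
SVG_PATTERN = re.compile(r"<svg\b.*?</svg>", re.DOTALL | re.IGNORECASE)


def extract_svg(response_text: str) -> Optional[str]:
    """Return the first <svg>...</svg> element in the text, or None."""
    match = SVG_PATTERN.search(response_text)
    return match.group(0) if match else None


def save_svg(response_text: str, out_path: str = "rollout.svg") -> Optional[Path]:
    """Write the extracted SVG to out_path; return the path, or None if absent."""
    svg = extract_svg(response_text)
    if svg is None:
        return None
    path = Path(out_path)
    path.write_text(svg, encoding="utf-8")
    return path


if __name__ == "__main__":
    pasted = (
        "Here is the drawing:\n"
        '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">\n'
        '  <circle cx="20" cy="20" r="15" fill="red"/>\n'
        "</svg>\n"
        "Let me know if you want changes."
    )
    print(extract_svg(pasted))
```

Call `save_svg(pasted)` to write `rollout.svg`, then open that file in any browser.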

### Live Log Streaming

Clicking on **View Logs** takes you to a page where logs are streamed live. Here, you can see exactly which errors your rollouts are hitting, which helps you debug and fix them.

<Frame>
  <img alt="Live Log Streaming" src="https://mintcdn.com/fireworksai-staging/FSCDtiPwmFPY7vdQ/assets/logs.png?fit=max&auto=format&n=FSCDtiPwmFPY7vdQ&q=85&s=4293252e7da6b72a89db7b419d48c62a" width="1399" height="958" data-path="assets/logs.png" />
</Frame>

## Contact Us / Learn More

* [Discord Server](https://discord.gg/mMqQxvFD9A). Come talk to us in the #eval-protocol channel!
* [Eval Protocol Documentation](https://evalprotocol.io/introduction)
* [Remote Rollout Processor Tutorial](https://evalprotocol.io/tutorial/remote-rollout-processor)
* [SVGBench Dataset](https://github.com/johnbean393/SVGBench) - The original benchmark this project is based on
* [Fireworks AI Platform](https://fireworks.ai)

## Appendix

### How Remote Rollout Processing Works

Eval Protocol enables **reinforcement learning that meets you where you are**. Instead of forcing you to rewrite your agent in a specific framework, you can implement a lightweight remote server wherever your codebase and infrastructure already live.

Your remote server is only responsible for:

* **Executing rollouts** - Run your agent logic (in this case, SVG generation from text prompts)
* **Logging to tracing** - Send structured logs to `tracing.fireworks.ai` for evaluation (see the tutorial linked below for more information)

In this example, we showcase a **Vercel TypeScript server** that executes single-turn SVG code generation.

<Note>**📖 Learn More**: For a complete deep-dive into Remote Rollout Processing, see the [Remote Rollout Processor Tutorial](https://evalprotocol.io/tutorial/remote-rollout-processor).</Note>

### Local Development Server

```bash theme={null}
cd vercel_svg_server_ts
vercel dev
```

Then swap out the `remote_base_url` to point to the local server you just started:

```python
rollout_processor=RemoteRolloutProcessor(
    remote_base_url="http://localhost:3000",
)
```

And in a third terminal, run the evaluation:

```bash theme={null}
ep local-test
```

<Note>See [Vercel CLI documentation](https://vercel.com/docs/cli/dev) for more information on local development.</Note>
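
If you switch between the hosted and local servers often, one way to avoid editing the test each time is to read the base URL from an environment variable. This is only a sketch: the variable name `SVG_REMOTE_BASE_URL` and the helper `resolve_remote_base_url` are hypothetical and not something Eval Protocol defines. The two URLs are the ones used in this guide.

```python
import os
from typing import Mapping

HOSTED_URL = "https://vercel-svg-server-ts.vercel.app"


def resolve_remote_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the server URL from SVG_REMOTE_BASE_URL (hypothetical), else the hosted one."""
    url = env.get("SVG_REMOTE_BASE_URL", "").strip() or HOSTED_URL
    # Drop any trailing slash so callers can append paths consistently.
    return url.rstrip("/")


if __name__ == "__main__":
    print(resolve_remote_base_url({}))
    print(resolve_remote_base_url({"SVG_REMOTE_BASE_URL": "http://localhost:3000/"}))
```

The returned value can then be passed as `remote_base_url` when constructing `RemoteRolloutProcessor`, in place of the hard-coded string.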
