This example demonstrates how to use the `json_schema_reward` function to assess whether models generate JSON content that matches expected schemas, with options for both structural validation and LLM-based judgment.

You can find the complete code for this example at `test_pytest_json_schema.py`.
## Understanding JSON Schema Evaluation
JSON schema evaluation assesses a model's ability to:

- Generate valid JSON: Produce syntactically correct JSON content
- Match expected structure: Create JSON objects that conform to specified schemas
- Handle complex nested structures: Work with objects, arrays, and mixed data types
- Extract JSON from responses: Parse JSON content from markdown code blocks or plain text
- Validate type consistency: Ensure data types match schema specifications
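The first two abilities above — extracting JSON from a response and validating its syntax — can be sketched with the standard library alone. This is an illustrative helper, not the EP implementation; the function name `extract_json` is our own:

```python
import json
import re
from typing import Any, Optional

def extract_json(response: str) -> Optional[Any]:
    """Pull JSON out of a model response, preferring fenced code blocks."""
    # Try a ```json ... ``` (or bare ``` ... ```) markdown fence first;
    # fall back to treating the whole response as JSON.
    fence = re.search(r"```(?:json)?\s*\n(.*?)```", response, re.DOTALL)
    candidate = fence.group(1) if fence else response
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Malformed or missing JSON: signal failure rather than raising.
        return None

reply = 'Here you go:\n```json\n{"name": "Ada", "age": 36}\n```'
print(extract_json(reply))  # → {'name': 'Ada', 'age': 36}
```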
## Understanding the Dataset Structure
The JSON schema dataset contains diverse test cases that evaluate different aspects of JSON generation, from simple object creation to complex nested structures with various data types.

### Dataset Format
Each entry in the dataset contains:

- `messages`: Conversation history with user requests and assistant responses
- `ground_truth`: Optional expected response (not used in schema validation)
- `evaluation_result`: Pre-computed evaluation scores for validation
- `input_metadata`: Additional context including expected schema and test case descriptions
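Concretely, an entry in this format might look like the following sketch. All values are hypothetical, including the `{property: type-name}` schema representation — check the actual dataset file for the exact shape:

```python
# A hypothetical dataset entry in the format described above.
entry = {
    "messages": [
        {"role": "user", "content": "Return a JSON object with a user's name and age."},
        {"role": "assistant", "content": '{"name": "Ada", "age": 36}'},
    ],
    "ground_truth": None,                  # not used in schema validation
    "evaluation_result": {"score": 1.0},   # pre-computed score for validation
    "input_metadata": {
        "expected_schema": {"name": "str", "age": "int"},
        "description": "Simple flat object with two typed properties",
    },
}
```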
### Example Dataset Entries
**Perfect Schema Match:** a dataset entry whose assistant response contains JSON that exactly conforms to the expected schema in `input_metadata`.

## Step 1: Import Required Dependencies
First, we import the necessary modules from the EP framework:

- `json`: Python's JSON module for JSON parsing and validation
- `typing`: Python's typing module for type hints (`Any`, `Dict`, `List`)
- `EvaluationRow`: Data structure containing conversation messages and ground truth
- `default_single_turn_rollout_processor`: Default processor for single-turn conversations
- `evaluation_test`: Decorator for configuring evaluation tests
- `json_schema_reward`: Function to evaluate JSON content against expected schemas
## Step 2: Create the Dataset Adapter
We need to convert the JSON schema dataset format to the EP's expected format. The adapter:

- Extracts conversation messages: Takes the user prompt from the dataset
- Preserves metadata: Maintains the expected schema and test case information
- Handles ground truth: Passes through any ground truth data (though not used in schema validation)
- Creates evaluation rows: Converts dataset entries to the EP’s standard format
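The steps above can be sketched as follows. Since the exact `EvaluationRow` type varies by EP version, this uses a simplified stand-in dataclass; the adapter name `json_schema_dataset_adapter` is our own:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Simplified stand-in for EP's EvaluationRow; the real class has more fields.
@dataclass
class EvaluationRow:
    messages: List[Dict[str, str]]
    ground_truth: Any = None
    input_metadata: Dict[str, Any] = field(default_factory=dict)

def json_schema_dataset_adapter(rows: List[Dict[str, Any]]) -> List[EvaluationRow]:
    """Convert raw dataset entries into EvaluationRow objects."""
    adapted = []
    for row in rows:
        adapted.append(
            EvaluationRow(
                # Keep only the user prompt; the model produces the reply.
                messages=[m for m in row["messages"] if m["role"] == "user"],
                # Pass ground truth through, even though schema validation ignores it.
                ground_truth=row.get("ground_truth"),
                # Preserve the expected schema and test-case description.
                input_metadata=row.get("input_metadata", {}),
            )
        )
    return adapted
```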
## Step 3: Configure the Evaluation Test
We use the `@evaluation_test` decorator to configure our JSON schema evaluation:

- `input_dataset`: Path to the JSON schema dataset file
- `model`: Target model to evaluate (a Fireworks Kimi model in this example)
- `mode`: Set to "pointwise" for individual sample evaluation
- `rollout_processor`: Uses the default single-turn processor for conversation handling
- `dataset_adapter`: References our custom adapter function
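Put together, the configuration might look like the sketch below. The dataset path and model identifier are hypothetical placeholders, and the decorator's parameter names follow the list above — verify them against your installed eval-protocol version:

```python
# Sketch only: dataset path and model string are hypothetical placeholders,
# and parameter names should be checked against the installed EP version.
@evaluation_test(
    input_dataset=["path/to/json_schema_dataset.jsonl"],  # hypothetical path
    model=["<fireworks-kimi-model-id>"],                  # hypothetical model id
    mode="pointwise",
    rollout_processor=default_single_turn_rollout_processor,
    dataset_adapter=json_schema_dataset_adapter,
)
def test_json_schema_evaluation(row: EvaluationRow) -> EvaluationRow:
    ...  # evaluation logic goes here (see Step 4)
```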
## Step 4: Implement the Evaluation Logic
The core evaluation logic extracts the expected schema and applies the JSON schema reward function:

- Extracts expected schema: Gets the target JSON structure from metadata
- Applies schema validation: Uses `json_schema_reward` to compare generated JSON against the expected schema
- Stores results: Saves the evaluation score and metrics in the row
- Returns processed row: Provides the evaluated row for further analysis
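As a self-contained illustration of this logic — not the actual `json_schema_reward` implementation, which returns richer result objects — the following sketch parses the model's JSON and scores property-name and type agreement against a flat `{property: type-name}` schema (an assumed representation):

```python
import json
from typing import Any, Dict

def evaluate_row(response: str, expected_schema: Dict[str, str]) -> Dict[str, Any]:
    """Simplified stand-in for the evaluation step: parse the model's JSON
    and compare its property names and types against the expected schema."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        # Malformed JSON gets a zero score with an explanatory reason.
        return {"score": 0.0, "reason": "invalid JSON"}
    type_names = {str: "str", int: "int", float: "float", bool: "bool",
                  list: "list", dict: "dict"}
    # Infer a flat schema from the generated object.
    actual = {k: type_names.get(type(v), "unknown") for k, v in data.items()}
    # Count properties whose name AND type match, over the union of keys.
    matches = sum(1 for k, t in expected_schema.items() if actual.get(k) == t)
    total = len(expected_schema | actual)
    return {"score": matches / total if total else 0.0, "properties": actual}
```

Missing or extra properties shrink the score because the denominator is the union of expected and generated keys, mirroring the partial-match behavior described below.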
## Understanding the JSON Schema Reward Function
The `json_schema_reward` function provides comprehensive JSON validation capabilities:
### Core Features
Schema Extraction and Normalization:

- Extracts JSON content from assistant responses (supports markdown code blocks)
- Normalizes schemas for consistent comparison
- Handles both object and string schema representations

Similarity Scoring:

- Uses Jaccard similarity to compare schema structures
- Evaluates property matches, type consistency, and nested object alignment
- Provides detailed scoring with property-level analysis

Validation and Error Handling:

- Validates JSON syntax before schema comparison
- Handles malformed JSON with appropriate error scoring
- Provides clear error messages for debugging
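One plausible way to realize the Jaccard comparison described above is to flatten each object into a set of `path:type` strings and take the ratio of shared to total entries. This is a sketch of the idea, not EP's normalization, and the helper names are our own:

```python
def schema_signature(obj: dict, prefix: str = "") -> set:
    """Flatten a JSON object into a set of 'path:type' strings."""
    sig = set()
    for key, value in obj.items():
        path = f"{prefix}{key}"
        sig.add(f"{path}:{type(value).__name__}")
        if isinstance(value, dict):
            # Recurse so nested objects contribute dotted paths.
            sig |= schema_signature(value, prefix=f"{path}.")
    return sig

def jaccard(a: dict, b: dict) -> float:
    """Jaccard similarity (|A ∩ B| / |A ∪ B|) of two schema signatures."""
    sa, sb = schema_signature(a), schema_signature(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0
```

With this scheme, identical structures score 1.0 even when values differ, while a missing or mistyped nested property lowers the ratio proportionally.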
## Test Cases and Evaluation Scenarios
The JSON schema evaluation covers various scenarios:

### ✅ Perfect Matches
| Scenario | Description |
|---|---|
| Exact schema compliance | JSON that perfectly matches expected structure |
| Type consistency | All data types match schema specifications |
| Nested object handling | Complex nested structures with proper validation |
### ⚠️ Partial Matches
| Scenario | Description |
|---|---|
| Missing properties | JSON with some expected fields omitted |
| Extra properties | JSON with additional fields not in schema |
| Type mismatches | Correct structure but wrong data types |
### ❌ Error Cases
| Scenario | Description |
|---|---|
| Invalid JSON syntax | Malformed JSON that cannot be parsed |
| Missing JSON content | Responses without extractable JSON |
| Empty structures | Edge cases with empty objects or arrays |
### 🔄 Complex Scenarios
| Scenario | Description |
|---|---|
| Array validation | JSON arrays with consistent item structures |
| Mixed data types | Objects with various primitive and complex types |
| Nested arrays | Multi-level nested structures with arrays of objects |
## Expected Output
The evaluation produces detailed results for each row, including the overall score and property-level metrics. For a perfect match, every expected property is reported as matched and the score is maximal; partial matches receive fractional similarity scores.

## Conclusion
This JSON schema evaluation demonstrates how to assess AI models' structured data generation capabilities using schema validation and similarity scoring. The evaluation ensures models can generate valid JSON content that conforms to expected schemas, handle complex nested structures, and maintain type consistency.

This evaluation is particularly valuable for:

- API integration testing: Validating JSON responses from AI models that interact with external APIs
- Data pipeline validation: Ensuring structured data generation meets schema requirements
- Model capability assessment: Evaluating language models’ ability to produce machine-readable outputs

