HuggingFace Adapter

The HuggingFace adapter allows you to load datasets from the HuggingFace Hub and transform them into the standardized EvaluationRow format for evaluation.

Overview

HuggingFace Datasets is a library providing access to thousands of datasets for machine learning. The HuggingFace adapter enables you to:
  • Load any dataset from the HuggingFace Hub
  • Transform dataset rows to the evaluation format
  • Apply custom transformations for specific dataset structures
  • Filter and limit dataset rows

Installation

To use the HuggingFace adapter, install the optional HuggingFace dependencies:
pip install 'eval-protocol[huggingface]'
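
A quick way to confirm the installation is to check that the datasets library imports (a sanity check, not a required step):

python -c "import datasets; print(datasets.__version__)"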

Basic Usage

from eval_protocol.adapters import create_huggingface_adapter

# Define a transformation function
def transform_fn(row):
    return {
        'messages': [
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': row['question']}
        ],
        'ground_truth': row['answer'],
        'metadata': {'category': row.get('category')}
    }

# Create the adapter
adapter = create_huggingface_adapter(
    dataset_id="squad",  # HuggingFace dataset ID
    transform_fn=transform_fn  # Your transformation function
)

# Get evaluation rows
rows = list(adapter.get_evaluation_rows(
    split="validation",  # Dataset split to use
    limit=100  # Maximum number of rows
))

# Use rows in evaluation
# See pytest-based evaluation in docs
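
Before wiring the rows into an evaluation, a quick sanity check can catch transformation mistakes early; printing the repr avoids assuming anything about the EvaluationRow fields:

# Illustrative sanity check on the rows returned above
print(f"Loaded {len(rows)} rows")
print(rows[0])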

Pre-built Adapters

Eval Protocol includes pre-built adapters for common datasets:
from eval_protocol.adapters import create_gsm8k_adapter, create_math_adapter

# GSM8K math word problems
gsm8k_adapter = create_gsm8k_adapter()
gsm8k_rows = list(gsm8k_adapter.get_evaluation_rows(split="test", limit=10))

# General math problems
math_adapter = create_math_adapter()
math_rows = list(math_adapter.get_evaluation_rows(split="test", limit=10))

Configuration Options

Parameter      Type       Description
dataset_id     string     HuggingFace dataset identifier
transform_fn   callable   Function to transform dataset rows
config_name    string     Optional dataset configuration name
revision       string     Optional dataset revision/commit hash
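
For example, config_name and revision can be combined to pin a specific dataset configuration and snapshot. The values below are illustrative; revision accepts a branch name, tag, or commit hash:

adapter = create_huggingface_adapter(
    dataset_id="gsm8k",
    config_name="main",       # dataset configuration name
    revision="main",          # branch, tag, or commit hash to pin the data
    transform_fn=transform_fn  # reuses the transform function defined earlier
)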

Creating Custom Transformations

The transformation function is the key component of the HuggingFace adapter. It should take a dataset row (dictionary) and return a dictionary with the following structure:
def custom_transform(row):
    return {
        'messages': [  # List of message dictionaries
            {'role': 'system', 'content': 'System prompt here'},
            {'role': 'user', 'content': row['input']},
            # Add more messages for multi-turn conversations
        ],
        'ground_truth': row['output'],  # Expected answer/output
        'metadata': {  # Optional metadata
            'source': 'dataset_name',
            'difficulty': row.get('difficulty'),
            # Any other metadata fields
        },
        'tools': []  # Optional tool definitions for tool calling scenarios
    }
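
Real datasets often nest the fields you need, so it helps to inspect a raw row before writing the transform. The sketch below uses the public SQuAD schema (context, question, and a nested answers dictionary); adjust the field names for your dataset:

from datasets import load_dataset

# Peek at a raw row first to see which fields the transform will receive.
raw = load_dataset("squad", split="validation")
print(raw[0].keys())  # dict_keys(['id', 'title', 'context', 'question', 'answers'])

def squad_transform(row):
    return {
        'messages': [
            {'role': 'system', 'content': 'Answer the question using the given context.'},
            {'role': 'user', 'content': f"Context: {row['context']}\n\nQuestion: {row['question']}"}
        ],
        # SQuAD stores answers as {'text': [...], 'answer_start': [...]}
        'ground_truth': row['answers']['text'][0] if row['answers']['text'] else '',
        'metadata': {'source': 'squad', 'title': row.get('title')}
    }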

Example: Custom GSM8K Adapter

from eval_protocol.adapters import create_huggingface_adapter
from eval_protocol import evaluate_rows
from eval_protocol.rewards.accuracy import accuracy_reward

# Custom transformation for GSM8K
def custom_gsm8k_transform(row):
    return {
        'messages': [
            {
                'role': 'system', 
                'content': 'You are a math expert. Solve the following problem step by step.'
            },
            {'role': 'user', 'content': row['question']}
        ],
        'ground_truth': row['answer'],
        'metadata': {
            'source': 'gsm8k',
            'difficulty': 'challenging'
        }
    }

# Create custom adapter
adapter = create_huggingface_adapter(
    dataset_id="gsm8k",
    config_name="main",
    transform_fn=custom_gsm8k_transform
)

# Get evaluation rows
rows = list(adapter.get_evaluation_rows(split="test", limit=20))

# Evaluate accuracy
results = evaluate_rows(rows, accuracy_reward)

# Calculate average score
avg_score = sum(r.score for r in results) / len(results) if results else 0
print(f"Average accuracy score: {avg_score:.2f}")

Loading Local Datasets

You can also use the adapter with local datasets:
from eval_protocol.adapters import HuggingFaceAdapter

# Create adapter from local dataset
adapter = HuggingFaceAdapter.from_local(
    path="/path/to/local/dataset",
    transform_fn=your_transform_function
)

# Get evaluation rows
rows = list(adapter.get_evaluation_rows())
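
If the local data lives in a standard file format such as JSON Lines or CSV, the datasets library can also load it directly, which is handy for checking that the fields your transform expects are present. The path below is a hypothetical placeholder:

from datasets import load_dataset

# Load a local JSONL file directly with the datasets library.
local_ds = load_dataset("json", data_files="/path/to/data.jsonl", split="train")
print(local_ds[0])  # confirm the fields your transform function expects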

Troubleshooting

Common Issues

  1. Dataset Not Found: Verify the dataset ID and configuration name
  2. Missing Fields: Ensure your transformation function handles the actual structure of the dataset
  3. Missing Dependencies: Ensure you’ve installed the HuggingFace dependencies with pip install 'eval-protocol[huggingface]'
  4. Memory Issues: For large datasets, use streaming and limit the number of rows
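
For the streaming case in item 4, the datasets library supports streaming mode; this sketch uses the datasets API directly (not the adapter) to take a bounded slice without downloading the full dataset:

from itertools import islice
from datasets import load_dataset

# Stream rows instead of downloading the whole dataset, then take the first 10.
streamed = load_dataset("gsm8k", "main", split="test", streaming=True)
for raw_row in islice(streamed, 10):
    print(raw_row['question'][:80])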

Debug Mode

Enable debug logging to see detailed dataset loading information:
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("datasets").setLevel(logging.DEBUG)