CTA, Table Join, and Table Reformat tasks with lightweight scoring ports
This example wires three LiveBench Data Analysis tasks (CTA, Table Join, and Table Reformat) into Eval Protocol, using minimal scoring ports adapted from the original benchmark.
Suites live in the Python SDK under eval_protocol/benchmarks/suites/livebench_data_analysis.py and are exported as runnable benchmarks.
Uses the Hugging Face datasets library to pull livebench/data_analysis at import time.
Scoring is intentionally lightweight and aims for compatibility with LiveBench behavior (e.g., tolerant parsing, suffix matches, and defensive fallbacks), not an official reproduction.
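To make "tolerant parsing, suffix matches, and defensive fallbacks" concrete, here is a minimal sketch of what such scorers can look like. It is not the code in livebench_data_analysis.py: the function names (score_cta, score_table_join, _parse_mapping) are hypothetical, and treating Table Join as an F1 score over (left column, right column) pairs is an assumption made for illustration. Table Reformat is not covered.

```python
import ast
import json
import re


def _normalize(text: str) -> str:
    # Lowercase and keep only letters and digits, so "Date-Time." and "datetime" compare equal.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def score_cta(ground_truth: str, response: str) -> int:
    """Hypothetical CTA scorer: 1 if the normalized answer matches the label, else 0."""
    truth = _normalize(ground_truth)
    answer = _normalize(response)
    if not truth or not answer:
        return 0  # defensive fallback: empty input never scores
    # Exact match, or suffix match to tolerate preambles such as "The answer is ...".
    return int(answer == truth or answer.endswith(truth))


def _parse_mapping(response: str) -> dict:
    # Tolerant parsing: take the last flat {...} span, try JSON first, then a Python literal.
    candidates = re.findall(r"\{[^{}]*\}", response)
    if not candidates:
        return {}
    blob = candidates[-1]
    for parser in (json.loads, ast.literal_eval):
        try:
            parsed = parser(blob)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(parsed, dict):
            return parsed
    return {}  # defensive fallback: unparseable output scores as an empty mapping


def score_table_join(ground_truth: dict, response: str) -> float:
    """Hypothetical Table Join scorer: F1 over (left column, right column) pairs."""
    truth_pairs = {(str(k), str(v)) for k, v in ground_truth.items()}
    pred_pairs = {(str(k), str(v)) for k, v in _parse_mapping(response).items()}
    tp = len(truth_pairs & pred_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(truth_pairs)
    return 2 * precision * recall / (precision + recall)
```

For example, score_cta("DateTime", "The answer is: datetime.") returns 1 because of the suffix match. The trade-off is that suffix matching can also accept a wrong but longer label (a ground truth of "time" matches an answer of "datetime"), which is the kind of leniency a compatibility-oriented port accepts in exchange for not penalizing verbose answers.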