A comprehensive benchmark for evaluating LLM-based agents on real-world advertising and marketing analytics tasks, with tiered difficulty and trajectory-based evaluation.
Last updated: February 2026