We ran 120 evaluations across 30 real-world video generation tasks, comparing four evaluation models. The Janus Judge achieved the highest accuracy at 47.4%, outperforming GPT-5.2, Claude Sonnet 4.5, and Gemini 2.5 Flash.
Learn how a high-growth video generation company reduced manual review time by 90% using Janus's automated evaluation platform.