Latest Release
We ran 120 evaluations across 30 real-world video generation tasks, testing 4 different evaluation models. The Janus Judge achieved 47.4% accuracy, outperforming GPT-5.2, Claude Sonnet 4.5, and Gemini 2.5 Flash.
We're developing curated benchmarks for high-stakes domains like healthcare, finance, and customer support, as well as complex enterprise workflows where reliability matters.
If you're interested in early access, reach out at team@withjanus.com.