LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild Paper • 2510.14240 • Published Oct 16, 2025 • 11
SPARKLE Collection • models and datasets used in the SparkleRL & SPARKLE project • 7 items • Updated 12 days ago • 1
Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows Paper • 2506.03332 • Published Jun 3, 2025 • 2
COSMOS: Predictable and Cost-Effective Adaptation of LLMs Paper • 2505.01449 • Published Apr 30, 2025 • 3
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training Paper • 2505.00358 • Published May 1, 2025 • 26