DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published about 1 month ago • 149
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30, 2025 • 108
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases Paper • 2510.20270 • Published Oct 23, 2025 • 6
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling Paper • 2509.23909 • Published Sep 28, 2025 • 32
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 228