Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Paper • 2601.08808 • Published Jan 13
Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts Paper • 2510.23027 • Published Oct 27, 2025
GAD-Models Collection • Model checkpoints of Black-Box On-Policy Distillation of Large Language Models • 5 items • Updated Nov 17, 2025
Black-Box On-Policy Distillation of Large Language Models Paper • 2511.10643 • Published Nov 13, 2025
The Era of Agentic Organization: Learning to Organize with Language Models Paper • 2510.26658 • Published Oct 30, 2025
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning Paper • 2510.19338 • Published Oct 22, 2025
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Paper • 2510.19808 • Published Oct 22, 2025
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper • 2510.19779 • Published Oct 22, 2025
QueST: Incentivizing LLMs to Generate Difficult Problems Paper • 2510.17715 • Published Oct 20, 2025
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Paper • 2510.06499 • Published Oct 7, 2025
DocReward: A Document Reward Model for Structuring and Stylizing Paper • 2510.11391 • Published Oct 13, 2025