arxiv:2512.13874

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Published on Dec 15 · Submitted by Jitesh Jain on Dec 18 · allenai (Ai2)

Abstract

The paper proposes SAGE, a multi-turn reasoning system for video that mimics human behavior, using synthetic data and reinforcement learning to improve performance on long videos.

AI-generated summary

As humans, we are natural any-horizon reasoners, i.e., we can decide whether to iteratively skim long videos or watch short ones in full when necessary for a given task. With this in mind, one would expect video reasoning models to reason flexibly across different durations. However, SOTA models are still trained to predict answers in a single turn while processing a large number of frames, akin to watching an entire long video, requiring significant resources. This raises the question: Is it possible to develop performant any-horizon video reasoning systems? Inspired by human behavior, we first propose SAGE, an agent system that performs multi-turn reasoning on long videos while handling simpler problems in a single turn. Secondly, we introduce an easy synthetic data generation pipeline using Gemini-2.5-Flash to train the orchestrator, SAGE-MM, which lies at the core of SAGE. We further propose an effective RL post-training recipe essential for instilling any-horizon reasoning ability in SAGE-MM. Thirdly, we curate SAGE-Bench with an average duration of greater than 700 seconds for evaluating video reasoning ability in real-world entertainment use cases. Lastly, we empirically validate the effectiveness of our system, data, and RL recipe, observing notable improvements of up to 6.1% on open-ended video reasoning tasks, as well as an impressive 8.2% improvement on videos longer than 10 minutes.
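To make the any-horizon idea concrete, here is a minimal sketch of a control loop in which an orchestrator either answers immediately or requests another look at part of the video. It is an illustration only, not the SAGE implementation: `Step`, `Trace`, `run_any_horizon`, `toy_policy`, and `toy_inspect` are hypothetical names, and the 60-second threshold and 300-second window size are arbitrary assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    action: str      # "answer" or "inspect" (hypothetical action names)
    payload: object  # answer text, or a (start_s, end_s) window to inspect


@dataclass
class Trace:
    observations: List[str] = field(default_factory=list)
    turns: int = 0


def run_any_horizon(
    question: str,
    duration_s: float,
    policy: Callable[[str, float, List[str]], Step],
    inspect: Callable[[float, float], str],
    max_turns: int = 5,
) -> Tuple[str, Trace]:
    """Loop until the policy answers or the turn budget runs out."""
    trace = Trace()
    for _ in range(max_turns):
        trace.turns += 1
        step = policy(question, duration_s, trace.observations)
        if step.action == "answer":
            return str(step.payload), trace
        start_s, end_s = step.payload
        trace.observations.append(inspect(start_s, end_s))
    return "unknown", trace  # budget exhausted without an answer


def toy_policy(question: str, duration_s: float, observations: List[str]) -> Step:
    # Assumption: short clips are handled in a single turn.
    if duration_s <= 60:
        return Step("answer", "answered in one turn")
    # Answer as soon as some observation looks relevant.
    for obs in observations:
        if "goal" in obs:
            return Step("answer", obs)
    # Otherwise skim the next 300-second window.
    start = 300.0 * len(observations)
    return Step("inspect", (start, min(start + 300.0, duration_s)))


def toy_inspect(start_s: float, end_s: float) -> str:
    # Stand-in for a tool call; pretends the key event happens at t = 400 s.
    return "goal scored" if start_s <= 400.0 < end_s else "nothing relevant"
```

With these stubs, a 30-second clip is answered in one turn, while a 900-second video takes three turns: two window inspections followed by an answer. In a real system the policy would be a trained model and `inspect` would call actual video tools.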

Community


Explainer thread: https://x.com/allen_ai/status/2001351082916630586
Project page: https://lnkd.in/eff-DjHx
Code: github.com/allenai/SAGE
Models & data: https://lnkd.in/eT9iVVRk
Paper: arxiv.org/abs/2512.13874

arXiv lens breakdown of this paper: https://arxivlens.com/PaperView/Details/sage-training-smart-any-horizon-agents-for-long-video-reasoning-with-reinforcement-learning-3452-525d4526

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
