TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions Paper • 2602.08711 • Published 3 days ago • 20
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11, 2025 • 48
MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation Paper • 2508.19320 • Published Aug 26, 2025 • 29