Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published 10 days ago • 20
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch • May 21 • 245
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing Paper • 2310.05922 • Published Oct 9, 2023 • 4
Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks Paper • 2312.16218 • Published Dec 24, 2023 • 8
SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation Paper • 2412.13462 • Published Dec 18, 2024
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 101
stabilityai/stable-video-diffusion-img2vid-xt-1-1 Image-to-Video • Updated Jul 10, 2024 • 17.9k • 964