DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Paper • 2510.15264 • Published Oct 17, 2025 • 2
Few-Step Distillation for Text-to-Image Generation: A Practical Guide Paper • 2512.13006 • Published 21 days ago • 7
VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer Paper • 2512.11891 • Published 26 days ago • 8
ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning Paper • 2512.09924 • Published 25 days ago • 3
Unicorn: Text-Only Data Synthesis for Vision Language Model Training Paper • 2503.22655 • Published Mar 28, 2025 • 39
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Paper • 2505.03912 • Published May 6, 2025 • 9
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning Paper • 2505.12448 • Published May 18, 2025 • 10
VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL Paper • 2505.15791 • Published May 21, 2025 • 6
Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors Paper • 2508.08896 • Published Aug 12, 2025 • 10
QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning Paper • 2412.15576 • Published Dec 20, 2024
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Paper • 2509.09372 • Published Sep 11, 2025 • 243
Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation Paper • 2508.19958 • Published Aug 27, 2025
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting Paper • 2510.10637 • Published Oct 12, 2025 • 12
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models Paper • 2512.09928 • Published 25 days ago • 11