PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation Paper • 2512.04025 • Published 29 days ago • 2
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Paper • 2510.19400 • Published Oct 22, 2025
DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Paper • 2510.15264 • Published Oct 17, 2025 • 1
TransDiff: Diffusion-Based Method for Manipulating Transparent Objects Using a Single RGB-D Image Paper • 2503.12779 • Published Mar 17, 2025
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction Paper • 2509.19297 • Published Sep 23, 2025 • 24
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models Paper • 2406.09098 • Published Jun 13, 2024 • 1
WonderTurbo: Generating Interactive 3D World in 0.72 Seconds Paper • 2504.02261 • Published Apr 3, 2025
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Paper • 2506.04207 • Published Jun 4, 2025 • 48
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting Paper • 2506.05327 • Published Jun 5, 2025 • 11
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS Paper • 2505.23734 • Published May 29, 2025 • 4