VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published May 29 • 38
Teaching LMMs for Image Quality Scoring and Interpreting Paper • 2503.09197 • Published Mar 12 • 1
Generative Frame Sampler for Long Video Understanding Paper • 2503.09146 • Published Mar 12 • 1
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare Paper • 2405.19298 • Published May 29, 2024
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 20
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 111
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22, 2024 • 20
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels Paper • 2312.17090 • Published Dec 28, 2023 • 4
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 28
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision Paper • 2309.14181 • Published Sep 25, 2023 • 2
FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling Paper • 2207.02595 • Published Jul 6, 2022