Depth Anything 3: Recovering the Visual Space from Any Views (arXiv:2511.10647, published Nov 13, 2025)
Visual Representation Alignment for Multimodal Large Language Models (arXiv:2509.07979, published Sep 9, 2025)
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning (arXiv:2509.01644, published Sep 1, 2025)
MolmoAct: Action Reasoning Models that can Reason in Space (arXiv:2508.07917, published Aug 11, 2025)
Enhanced Arabic Text Retrieval with Attentive Relevance Scoring (arXiv:2507.23404, published Jul 31, 2025)
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers (arXiv:2507.10787, published Jul 14, 2025)
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models (arXiv:2506.19851, published Jun 24, 2025)
BLIP3-o: A Family of Fully Open Unified Multimodal Models: Architecture, Training and Dataset (arXiv:2505.09568, published May 14, 2025)
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges (arXiv:2505.04769, published May 7, 2025)
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models (arXiv:2505.04921, published May 8, 2025)
Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities (arXiv:2505.01043, published May 2, 2025)