NeuLab @ LTI/CMU

university

https://www.cs.cmu.edu/~neulab/

neulab

Recent Activity

yuexiang96

authored 4 papers 3 days ago

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published Oct 28 • 27

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29 • 45

Simulating Environments with Reasoning Models for Agent Training

Paper • 2511.01824 • Published Nov 3 • 1

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published 11 days ago • 32

lintang

published a model 8 days ago

neulab/qwen3-8b-cso-alpha

Updated 8 days ago

yueqis

updated a dataset 17 days ago

neulab/agent-data-collection

Preview • Updated 17 days ago • 4.64k • 104

seungone

authored a paper 18 days ago

RefineBench: Evaluating Refinement Capability of Language Models via Checklists

Paper • 2511.22173 • Published 22 days ago • 12

yueqis

updated a dataset 20 days ago

neulab/VisualPuzzles

Viewer • Updated 20 days ago • 1.17k • 258 • 11

akariasai

authored a paper 24 days ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published 25 days ago • 58

yueqis

in neulab/agent-data-collection 24 days ago

Data in chat template agnostic format

#4 opened about 1 month ago by

license please

#2 opened about 2 months ago by

Nyandwi

authored a paper 4 months ago

Grounding Multilingual Multimodal LLMs With Cultural Knowledge

Paper • 2508.07414 • Published Aug 10 • 1

ProKil

authored 2 papers 4 months ago

Sotopia-RL: Reward Design for Social Intelligence

Paper • 2508.03905 • Published Aug 5 • 23

SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions

Paper • 2506.23046 • Published Jun 29 • 1

yuexiang96

authored 6 papers 6 months ago

Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published Feb 17 • 40

Evaluating Vision-Language Models as Evaluators in Path Planning

Paper • 2411.18711 • Published Nov 27, 2024

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

Paper • 2503.10582 • Published Mar 13 • 24

Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

Paper • 2503.19877 • Published Mar 25 • 1

VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge

Paper • 2504.10342 • Published Apr 14 • 10

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Paper • 2504.12329 • Published Apr 12