arxiv:2502.04270
Yaqi Duan
duanyq
AI & ML interests
None yet
Recent Activity
upvoted a paper about 1 month ago
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
upvoted a paper 11 months ago
PILAF: Optimal Human Preference Sampling for Reward Modeling
authored a paper 11 months ago
PILAF: Optimal Human Preference Sampling for Reward Modeling
Organizations
None yet