stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step150 • Text Generation • 8B • Updated Jun 13 • 9
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step100 • Text Generation • 8B • Updated Jun 13 • 9
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step50 • Text Generation • 8B • Updated Jun 13 • 9
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step150 • Text Generation • 8B • Updated Jun 13 • 6
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step100 • Text Generation • 8B • Updated Jun 13 • 8
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step50 • Text Generation • 8B • Updated Jun 13 • 8
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step150 • Text Generation • 8B • Updated Jun 13 • 6
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step100 • Text Generation • 8B • Updated Jun 13 • 9
stellalisy/rethink_rlvr_reproduce-random-qwen2.5_math_7b-lr5e-7-kl0.00-step150 • Text Generation • 8B • Updated Jun 13 • 9
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step50 • Text Generation • 8B • Updated Jun 13 • 9
stellalisy/rethink_rlvr_reproduce-random-qwen2.5_math_7b-lr5e-7-kl0.00-step100 • Text Generation • 8B • Updated Jun 13 • 8
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step150 • Text Generation • 8B • Updated Jun 13 • 55
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step100 • Text Generation • 8B • Updated Jun 13 • 12
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step50 • Text Generation • 8B • Updated Jun 13 • 8
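
The repository names in this listing appear to follow one pattern: a reward label, a base model name, and `lr`, `kl`, and `step` suffixes. The sketch below splits a repository ID into those parts so checkpoints can be grouped or sorted. Reading the labels as reward signal, learning rate, KL coefficient, and training step is an assumption inferred from the names alone, and `parse_repo_id` and its field names are hypothetical helpers, not part of any library.

```python
import re

# Assumed naming pattern, inferred only from the repository IDs in the listing.
REPO_PATTERN = re.compile(
    r"^(?P<owner>[^/]+)/rethink_rlvr_reproduce-"
    r"(?P<reward>[a-z_]+)-"           # e.g. ground_truth, random, format
    r"(?P<base_model>.+)-"            # e.g. qwen2.5_math_7b
    r"lr(?P<lr>[0-9.]+e-?[0-9]+)-"    # e.g. 5e-7 (assumed: learning rate)
    r"kl(?P<kl>[0-9.]+)-"             # e.g. 0.00 (assumed: KL coefficient)
    r"step(?P<step>[0-9]+)$"          # e.g. 150 (assumed: training step)
)


def parse_repo_id(repo_id):
    """Split a repository ID into its labeled parts (hypothetical helper)."""
    match = REPO_PATTERN.match(repo_id)
    if match is None:
        raise ValueError(f"unrecognized repository ID: {repo_id!r}")
    fields = match.groupdict()
    return {
        "owner": fields["owner"],
        "reward": fields["reward"],
        "base_model": fields["base_model"],
        "lr": float(fields["lr"]),
        "kl": float(fields["kl"]),
        "step": int(fields["step"]),
    }
```

For example, `parse_repo_id("stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step150")` yields a dictionary whose `reward` is `"ground_truth"` and whose `step` is `150`; sorting the parsed entries by `(reward, step)` groups the checkpoints of each run together.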