Upload folder using huggingface_hub
README.md
CHANGED
@@ -37,14 +37,14 @@ Baichuan-M2 adopts three core technical innovations: first, through the **Large Verifier Sys
 
 ### HealthBench Metrics
 
-| Model Name |
+| Model Name | HealthBench | HealthBench-Hard | HealthBench-Consensus |
 |----------|-------------|------------------|-----------------------|
 | Baichuan-M2 | 60.1 | 34.7 | 91.5 |
 | gpt-oss-120b | 57.6 | 30 | 90 |
 | Qwen3-235B-A22B-Thinking-2507 | 55.2 | 25.9 | 90.6 |
-|
-|
-|
+| Deepseek-R1-0528 | 53.6 | 22.6 | 91.5 |
+| GLM-4.5 | 47.8 | 18.7 | 85.3 |
+| Kimi-K2 | 43 | 10.7 | 90.9 |
 | gpt-oss-20b | 42.5 | 10.8 | 82.6 |
 
 ### General Metrics
@@ -53,11 +53,11 @@ Baichuan-M2 adopts three core technical innovations: first, through the **Large Verifier Sys
 |--------|-----------------|-----------|
 | AIME24 | 83.4 | 81.4 |
 | AIME25 | 72.9 | 72.9 |
-|
+| Arena-Hard-v2.0 | 45.8 | 44.5 |
 | CFBench | 77.6 | 75.7 |
 | WritingBench | 8.56 | 7.90 |
 
-*Note: AIME
+*Note: AIME uses a max_tokens of 64k, the other benchmarks use 32k, and the temperature is 0.6 throughout.*
 
 
 ## 🛠️ Technical Features
@@ -69,7 +69,7 @@ Baichuan-M2 adopts three core technical innovations: first, through the **Large Verifier Sys
 
 ### Medical Domain Adaptation
 - **Mid-Training**: injects medical knowledge while preserving general capabilities
--
+- **Reinforcement Learning**: multi-stage RL strategy optimization
 - **General-Specialized Balance**: medical, general, and mathematical data mixed at a 2:2:1 ratio
 
 ## 🔧 Quick Start
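
The evaluation note in the General Metrics section (64k max_tokens for AIME, 32k for the other benchmarks, temperature 0.6) could be encoded roughly as follows. This is only a sketch: the helper name `generation_settings`, the dictionary keys, and the reading of "k" as 1024 tokens are assumptions made for illustration, not part of the README or of any evaluation harness.

```python
# Hypothetical helper illustrating the decoding settings stated in the README note.
# Assumption: "64k" / "32k" mean 64 * 1024 and 32 * 1024 tokens.
K = 1024
TEMPERATURE = 0.6           # shared by all benchmarks, per the note
MAX_TOKENS_AIME = 64 * K    # AIME24 / AIME25
MAX_TOKENS_DEFAULT = 32 * K # every other benchmark


def generation_settings(benchmark: str) -> dict:
    """Return the decoding settings for a benchmark name."""
    is_aime = benchmark.upper().startswith("AIME")
    return {
        "temperature": TEMPERATURE,
        "max_tokens": MAX_TOKENS_AIME if is_aime else MAX_TOKENS_DEFAULT,
    }


if __name__ == "__main__":
    for name in ["AIME24", "AIME25", "Arena-Hard-v2.0", "CFBench", "WritingBench"]:
        print(name, generation_settings(name))
```

Keeping the rule in one function makes it easy to see that only the AIME runs get the larger generation budget, while the temperature stays the same everywhere.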