WizardMath: Math-Enhanced Fine-Tuning Based on Evol-Instruct

FreeGuideOnline, latest update 2026-06-22

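1. Seed math instructions (e.g., problems from the MATH training set)
2. Upward / downward evolution (rewritten by a strong model)
3. Candidate instructions + answers
4. Automatic filtering
   - Math format validation
   - Answer consistency check (voting over multiple samples)
   - Length and difficulty filtering
5. High-quality fine-tuning data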


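This pipeline can be run for multiple rounds. Each round uses the high-quality data produced by the previous round as new seeds, so the data distribution moves closer to the target difficulty range.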
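The automatic filtering step can be sketched roughly as follows. This is a minimal illustration, not the WizardMath authors' implementation: `majority_answer`, `filter_candidates`, the `sample_answers` callback, the 0.6 agreement threshold, and the 2000-character length limit are all hypothetical names and values chosen for the example, and the math format validation step is left out.

```python
from collections import Counter
from typing import Callable, Optional


def majority_answer(answers: list[str], min_agreement: float = 0.6) -> Optional[str]:
    """Return the most common answer if enough samples agree, else None."""
    if not answers:
        return None
    normalized = [a.strip() for a in answers]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer if count / len(normalized) >= min_agreement else None


def filter_candidates(
    candidates: list[str],
    sample_answers: Callable[[str], str],  # hypothetical: one model call per sample
    n_samples: int = 5,
    max_chars: int = 2000,  # assumed length limit, not from the source
) -> list[dict]:
    """Keep candidates that pass the length filter and the voting check."""
    kept = []
    for instruction in candidates:
        # Length filter: drop empty or overly long instructions.
        if not instruction.strip() or len(instruction) > max_chars:
            continue
        # Consistency check: sample several answers and require a majority.
        answers = [sample_answers(instruction) for _ in range(n_samples)]
        voted = majority_answer(answers)
        if voted is not None:
            kept.append({"instruction": instruction, "answer": voted})
    return kept
```

In a multi-round setup, the `instruction` fields of the kept items would serve as the seeds for the next evolution round.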

## Fine-Tuning Implementation Details

WizardMath follows the standard supervised fine-tuning (SFT) paradigm. The key parameters and techniques in practice are:

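- **Base model selection**: models with strong code and logic capabilities, such as Code Llama or Mistral-7B, are recommended.
- **Instruction template**: use one format throughout: `"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{math problem}\n\n### Response:"`.
- **Training framework**: most open-source implementations build on `LLaMA-Factory`, `Firefly`, or native Transformers + DeepSpeed.
- **Key hyperparameters**:
  - Learning rate 2e-5 with a cosine annealing schedule
  - Global batch size 128
  - Maximum sequence length 2048 or 4096
  - 3 training epochs
- **Loss on the answer only**: set the labels of the prompt tokens to `-100` so that the model learns only to generate the solution steps.
- **Mixed precision**: use bf16 or fp16 to reduce GPU memory usage.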
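The prompt template and the answer-only loss can be illustrated with a small sketch. It works on plain lists of token ids so it runs without a tokenizer; `build_example` and `PROMPT_TEMPLATE` are hypothetical names for this example, and a real pipeline would get the ids from the model's tokenizer and convert them to tensors.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

# Instruction template from the list above; {instruction} is the math problem.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_example(prompt_ids: list[int], response_ids: list[int], max_len: int = 2048) -> dict:
    """Concatenate prompt and response ids, masking the prompt part in the labels."""
    input_ids = (prompt_ids + response_ids)[:max_len]
    # Prompt positions get IGNORE_INDEX, so only response tokens contribute to the loss.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}


example = build_example(prompt_ids=[11, 12, 13], response_ids=[21, 22])
print(example["labels"])  # [-100, -100, -100, 21, 22]
```

Truncating both lists to `max_len` after concatenation keeps `input_ids` and `labels` aligned when a long solution exceeds the maximum sequence length.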

## Results and Benchmark Evaluation

Typical accuracy (%) on math reasoning benchmarks, taking 7B models as an example:

| Model | GSM8K | MATH |
|------|-------|------|
| Llama-2-7B (base) | 14.6 | 2.5 |
| WizardMath-7B (v1.0) | 54.9 | 10.7 |
| WizardMath-7B (improved version) | 83.2 | 33.0 |
| Closed-source model (e.g., early GPT-3.5) | 57.1 | Not reported in the source |

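The improved version is pushed further by post-training techniques such as reinforcement learning (RLEIF). Overall, data evolution combined with fine-tuning lets a small model improve several-fold: GSM8K accuracy rises from 14.6 for the base model to 54.9 in v1.0 and to 83.2 in the improved version.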
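The size of the improvement can be checked with simple arithmetic on the table values. The snippet below is only a convenience sketch; `scores` and `fold_gain` are names introduced here, and the numbers are copied from the table.

```python
# Scores copied from the table above (accuracy, %).
scores = {
    "Llama-2-7B": {"GSM8K": 14.6, "MATH": 2.5},
    "WizardMath-7B (v1.0)": {"GSM8K": 54.9, "MATH": 10.7},
    "WizardMath-7B (improved)": {"GSM8K": 83.2, "MATH": 33.0},
}


def fold_gain(model: str, baseline: str = "Llama-2-7B") -> dict:
    """Ratio of a model's score to the baseline score, per benchmark."""
    return {
        bench: round(scores[model][bench] / scores[baseline][bench], 2)
        for bench in scores[model]
    }


for name in list(scores)[1:]:
    print(name, fold_gain(name))
```

On these figures, v1.0 is roughly 3.8 times the base model on GSM8K and about 4.3 times on MATH.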

## Hands-On: Using the Open-Source WizardMath Models

### Option 1: Load the model directly for inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "WizardLM/WizardMath-7B-V1.1"
# device_map="auto" places the weights on the available devices.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Alpaca-style prompt, matching the template used for fine-tuning.
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Given the quadratic function f(x)=x^2-4x+3, find the maximum and minimum values of f(x) on the interval [0,3].

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled with do_sample=True.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Option 2: Fine-tune it yourself with LLaMA-Factory

Prepare the data: use the WizardMath dataset provided by the project, or run Evol-Instruct yourself to generate one.

Configure the training YAML:

```yaml
### model
model_name_or_path: mistralai/Mistral-7B-v0.1

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: wizard_math_evol
template: alpaca
cutoff_len: 4096
overwrite_cache: false
preprocessing_num_workers: 16

### output
output_dir: saves/mistral-7b/wizard-math
logging_steps: 10
save_steps: 500
plot_loss: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 2.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true
```

Launch training:

```bash
llamafactory-cli train configs/math_sft.yaml
```
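The config sets a per-device batch size of 1 with 8 gradient accumulation steps, while the hyperparameter list gives a global batch size of 128. In data-parallel training the global batch size is the product of per-device batch size, accumulation steps, and number of devices, so the device count those settings imply can be worked out. The source does not state the GPU count; `required_gpus` is a hypothetical helper written for this check.

```python
def required_gpus(global_batch: int, per_device_batch: int, grad_accum: int) -> int:
    """Number of data-parallel devices needed to reach the target global batch size."""
    per_device_total = per_device_batch * grad_accum
    if global_batch % per_device_total != 0:
        raise ValueError("global batch size is not divisible by per-device batch * accumulation")
    return global_batch // per_device_total


# Values taken from the YAML config and the hyperparameter list.
print(required_gpus(global_batch=128, per_device_batch=1, grad_accum=8))  # 16
```

With fewer devices, raising `gradient_accumulation_steps` accordingly (for example to 16 on 8 GPUs) keeps the global batch size at 128.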