ChatGLM: Zhipu AI's Independently Pre-trained Dialogue Model

FreeGuideOnline, updated 2026-06-22

```bash
pip install torch transformers accelerate sentencepiece protobuf
```


#### Loading the model and generating a conversation
```python
from transformers import AutoTokenizer, AutoModel

# Model name; swap in another ChatGLM checkpoint here if needed
model_name = "THUDM/chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Load the weights in FP16 on the GPU, then switch to inference mode
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).half().cuda()
model = model.eval()

# Multi-turn conversation example
history = []
while True:
    query = input("User: ")
    if query == "exit":
        break
    # chat() returns the reply together with the updated history
    response, history = model.chat(tokenizer, query, history=history)
    print("ChatGLM:", response)
```
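
In the loop above, `history` grows with every turn, so a long session can eventually exceed the model's context window. The sketch below shows one simple way to cap it. `trim_history` is a hypothetical helper, not part of ChatGLM, and it assumes that `history` is a list of dictionaries with a `role` key, which is how ChatGLM3's `chat()` maintains it.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def trim_history(history: List[Message], max_messages: int = 8) -> List[Message]:
    """Keep every system message plus the most recent `max_messages` other messages."""
    if max_messages < 0:
        raise ValueError("max_messages must be non-negative")
    system = [m for m in history if m.get("role") == "system"]
    others = [m for m in history if m.get("role") != "system"]
    # A slice of [-0:] would return the whole list, so zero is handled explicitly.
    recent = others[-max_messages:] if max_messages > 0 else []
    return system + recent


if __name__ == "__main__":
    demo: List[Message] = [{"role": "system", "content": "You are a helpful assistant."}]
    for i in range(6):
        demo.append({"role": "user", "content": f"question {i}"})
        demo.append({"role": "assistant", "content": f"answer {i}"})
    for message in trim_history(demo, max_messages=4):
        print(message["role"], "|", message["content"])
```

Calling `history = trim_history(history)` before each `model.chat(...)` call would bound the prompt length. An even `max_messages` keeps user and assistant turns paired.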

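#### Quantized inference (lower GPU memory usage)

ChatGLM3-6B supports INT4 quantization; only the model-loading call needs to change:

```python
# Quantize the weights to INT4 before moving the model to the GPU
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).quantize(4).cuda()
```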
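
To see why INT4 lowers memory use, the weight footprint can be estimated from the parameter count and the number of bits per weight. The sketch below is a back-of-the-envelope calculation, not a measurement. `weight_memory_gib` is a hypothetical helper, and the figure of 6e9 parameters is an assumption that takes the "6B" in the model name at face value.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    NOMINAL_PARAMS = 6e9  # assumption: "6B" from the model name, not an exact count
    for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        print(f"{label}: ~{weight_memory_gib(NOMINAL_PARAMS, bits):.1f} GiB for weights")
```

Actual GPU usage is higher than these figures, because activations, the KV cache, and quantization metadata also take memory.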

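### 5. Advanced features: tool calling and the code interpreter

The ChatGLM3 and GLM-4 series have tool calling (Function Call) built in, which lets the model decide on its own to call external APIs or execute code and thereby complete more complex tasks.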

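#### 5.1 Defining a tool function

```python
def get_weather(city: str) -> str:
    # Simulated weather lookup; a real tool would call a weather API here
    return f"Current weather in {city}: sunny, 26°C"

# Schema that tells the model the tool's name, purpose, and parameters
tools = [
    {
        "name": "get_weather",
        "description": "Look up the real-time weather for a given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]
```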

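#### 5.2 Calling the tool through the model

```python
query = "What is the weather like in Beijing today?"
# ChatGLM3 receives the tool list through a system message at the start of history
system_info = {
    "role": "system",
    "content": "Answer the following questions as best as you can. You have access to the following tools:",
    "tools": tools,
}
response, history = model.chat(
    tokenizer,
    query,
    history=[system_info]
)
print(response)
```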

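The model returns a structured response similar to `{"name": "get_weather", "parameters": {"city": "Beijing"}}`. You can execute that function locally and append the result to `history` to continue the conversation.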
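
The local execution step described above can be sketched as a small dispatcher. `TOOL_REGISTRY` and `dispatch_tool_call` are hypothetical names introduced for illustration, and the sketch assumes the response arrives either as a dictionary or as a JSON string with `name` and `parameters` fields.

```python
import json
from typing import Any, Callable, Dict, Union


def get_weather(city: str) -> str:
    # Simulated weather lookup, standing in for a real API call
    return f"Current weather in {city}: sunny, 26°C"


# Hypothetical registry mapping tool names to local Python callables
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def dispatch_tool_call(response: Union[str, Dict[str, Any]]) -> str:
    """Run the tool named in a structured model response and return its result."""
    call = json.loads(response) if isinstance(response, str) else response
    name = call.get("name")
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name!r}")
    parameters = call.get("parameters") or {}
    return TOOL_REGISTRY[name](**parameters)


if __name__ == "__main__":
    result = dispatch_tool_call({"name": "get_weather", "parameters": {"city": "Beijing"}})
    print(result)
    # With a loaded ChatGLM3 model, the result could then be fed back, for example:
    # response, history = model.chat(tokenizer, result, history=history, role="observation")
```

An ordinary text reply is not a tool call, so in practice you would check whether `response` is a dictionary before dispatching.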

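### 6. Fine-tuning your ChatGLM

When general-purpose dialogue ability falls short of a specific domain's needs, you can apply supervised fine-tuning (SFT) to ChatGLM. The officially supported LoRA parameter-efficient fine-tuning approach is recommended.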

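#### 6.1 Preparing the data

The data should be in JSON format, with each sample containing the conversation history and the target answer. For example:

```json
{
  "conversations": [
    {"role": "user", "content": "What should people with diabetes keep in mind about their diet?"},
    {"role": "assistant", "content": "People with diabetes should control their total calorie intake and favor foods with a low glycemic index..."}
  ]
}
```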
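
Malformed samples are easier to catch before training than during it, so a small validation pass can help. The sketch below checks the layout shown above and writes the samples to disk. `validate_sample`, `write_jsonl`, and the set of allowed roles are assumptions made for illustration, as is the one-object-per-line (JSON Lines) layout, which `load_dataset("json", ...)` can read.

```python
import json
from typing import Any, Dict, List

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumed role set


def validate_sample(sample: Dict[str, Any]) -> None:
    """Raise ValueError if a sample does not match the expected layout."""
    conversations = sample.get("conversations")
    if not isinstance(conversations, list) or not conversations:
        raise ValueError("'conversations' must be a non-empty list")
    for i, message in enumerate(conversations):
        if not isinstance(message, dict):
            raise ValueError(f"message {i}: expected an object")
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unexpected role {message.get('role')!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"message {i}: 'content' must be a non-empty string")
    if conversations[-1]["role"] != "assistant":
        raise ValueError("the last message should be the assistant's target answer")


def write_jsonl(samples: List[Dict[str, Any]], path: str) -> int:
    """Validate all samples, then write one JSON object per line; return the count."""
    for sample in samples:
        validate_sample(sample)
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")
    return len(samples)
```

Passing `ensure_ascii=False` keeps non-ASCII text readable in the output file instead of escaping it.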

#### 6.2 Fine-tuning with PEFT

After installing the peft and datasets libraries, you can start from the following script:

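```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset

model_name = "THUDM/chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).half().cuda()

# Configure LoRA: only low-rank adapters on the attention projection are trained
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"]
)
model = get_peft_model(model, lora_config)

# Load the dataset and format it
dataset = load_dataset("json", data_files="your_data.json")
MAX_LENGTH = 512  # assumed sequence length; adjust it to your data

def preprocess(example):
    # Concatenate the conversation and add ChatGLM's special tokens.
    # This is a sketch modeled on the official fine-tuning demo;
    # check it against the tokenizer version you are using.
    input_ids = [tokenizer.get_command("[gMASK]"), tokenizer.get_command("sop")]
    labels = [-100, -100]
    for message in example["conversations"]:
        ids = tokenizer.build_single_message(message["role"], "", message["content"])
        input_ids += ids
        # Compute the loss on assistant turns only; -100 is ignored by the loss
        labels += ids if message["role"] == "assistant" else [-100] * len(ids)
    input_ids.append(tokenizer.eos_token_id)
    labels.append(tokenizer.eos_token_id)
    # Truncate, then right-pad to a fixed length so samples can be batched
    input_ids, labels = input_ids[:MAX_LENGTH], labels[:MAX_LENGTH]
    pad = MAX_LENGTH - len(input_ids)
    return {
        "input_ids": input_ids + [tokenizer.pad_token_id] * pad,
        "labels": labels + [-100] * pad,
    }

dataset = dataset.map(preprocess, remove_columns=dataset["train"].column_names)

# Training settings
training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    save_steps=500,
    logging_steps=100,
    learning_rate=2e-5,
    fp16=True
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer
)
trainer.train()
```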

Once fine-tuning finishes, you only need to load the LoRA weights on top of the base model to use your customized model.
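
Loading the LoRA weights might look like the sketch below. It assumes the `Trainer` wrote its adapter checkpoints as `checkpoint-<step>` directories under `./output`. `latest_checkpoint` and `load_finetuned` are hypothetical helpers, while `PeftModel.from_pretrained` is the PEFT call that attaches a saved adapter to a base model.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the highest step, if any."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


def load_finetuned(output_dir: str = "./output", base_name: str = "THUDM/chatglm3-6b"):
    """Load the base model and attach the most recent LoRA adapter (needs a GPU)."""
    # Imported lazily so latest_checkpoint() works without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModel, AutoTokenizer

    adapter_dir = latest_checkpoint(output_dir)
    if adapter_dir is None:
        raise FileNotFoundError(f"no checkpoint-<step> directory under {output_dir}")
    tokenizer = AutoTokenizer.from_pretrained(base_name, trust_remote_code=True)
    base = AutoModel.from_pretrained(base_name, trust_remote_code=True).half().cuda()
    model = PeftModel.from_pretrained(base, str(adapter_dir)).eval()
    return tokenizer, model
```

After `tokenizer, model = load_finetuned()`, the model can be used with `model.chat(...)` in the same way as the base model.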

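### 7. Deployment and serving

To expose the model as an API service, the FastChat or vLLM frameworks are recommended. Taking vLLM as an example, GLM-4-9B can be deployed as follows:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model THUDM/glm-4-9b-chat \
    --trust-remote-code \
    --dtype auto
```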

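Once the server is running, you can call it through the OpenAI-compatible interface:

```python
from openai import OpenAI

# Point the client at the local vLLM server; a placeholder key is enough
# when the server was started without --api-key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="THUDM/glm-4-9b-chat",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```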
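
For clients that cannot use the `openai` package, the same call is an ordinary HTTP POST. The sketch below builds that request with the standard library only. `build_chat_request` is a hypothetical helper, and it assumes the server exposes the usual `/chat/completions` path under the `/v1` base URL used above.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    base_url: str, model: str, messages: List[Dict[str, str]]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request(
        "http://localhost:8000/v1",
        "THUDM/glm-4-9b-chat",
        [{"role": "user", "content": "Hello"}],
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it once the server is running:
    # with urllib.request.urlopen(request, timeout=60) as reply:
    #     print(json.load(reply)["choices"][0]["message"]["content"])
```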