The Qwen Model Series: A Guide to the Tongyi Qianwen Large Model Family
FreeGuideOnline
2026-06-22
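```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"

# Load the tokenizer and model. torch_dtype="auto" uses the dtype stored in the
# checkpoint config (typically bfloat16 for this model), which saves GPU memory
# compared with float32.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Build the conversation; the chat template renders it in ChatML format
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please introduce the Qwen model series in three languages."},
]

# Apply the chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Generate a reply
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Slice off the prompt tokens so only the newly generated text is decoded
response = tokenizer.decode(
    outputs[0][len(inputs.input_ids[0]):],
    skip_special_tokens=True,
)
print(response)
```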
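To make the ChatML step more concrete, here is a minimal sketch of roughly what the chat template produces for a simple conversation. `render_chatml` is a hypothetical helper written for illustration, not part of `transformers`; the real Qwen template handles more cases (for example tool calls and a default system prompt), so treat this as an approximation.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render {"role", "content"} dicts as ChatML text (illustrative helper only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_chatml(example_messages))
```

In practice, `tokenizer.apply_chat_template` is the safer choice, since it uses the template shipped with the model rather than a hand-written approximation.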
If GPU memory is insufficient, try loading the model with 4-bit quantization:
```python
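import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization through bitsandbytes (the bitsandbytes package must be installed)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16 after dequantization
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
)

# model_name is the same identifier defined in the previous snippet
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",
)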