Multilingual NLP: Cross-Lingual Representations and Zero-Shot Transfer

FreeGuideOnline, latest update 2026-06-23

```bash
pip install transformers torch datasets
```


### 2. Load the Model and Data

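We use XLM-RoBERTa as the multilingual encoder. The general recipe is to fine-tune the base model on an English sentiment classification dataset (such as SST-2) and then predict directly on Chinese text. To keep things short, this walkthrough skips the fine-tuning step and loads an already fine-tuned checkpoint, `cardiffnlp/twitter-xlm-roberta-base-sentiment`. According to its model card, this checkpoint was fine-tuned on tweet sentiment data in eight languages (Arabic, English, French, German, Hindi, Italian, Portuguese, and Spanish) rather than English alone, so only languages outside that set, such as Chinese, are zero-shot in the strict sense. It predicts three classes: 0 (negative), 1 (neutral), and 2 (positive).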
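
To make the three-class output concrete, here is a minimal sketch of how a label and a score can be derived from three raw logits: apply softmax, then take the most probable class. The `ID2LABEL` mapping follows the class ids described above; the helper names `softmax` and `decode` and the logit values are assumptions made up for illustration, not real model output.

```python
import math

# Class ids as described in the text; the label names are assumed to
# match what the checkpoint reports.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, score) for the most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration only, not real model output.
label, score = decode([-1.2, 0.3, 3.4])
print(label, round(score, 4))
```

The `pipeline` call that follows handles this step internally, so the sketch is only meant to show where the reported label and score come from.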

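```python
from transformers import pipeline

# "sentiment-analysis" is an alias for the text-classification pipeline.
classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
    tokenizer="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)
```

### 3. Run Zero-Shot Prediction

Pass in a few sentences in different languages. In this example, the model gives a plausible sentiment prediction for each one:

```python
texts = [
    "I absolutely loved the movie, it was fantastic!",  # English
    "Der Film war schrecklich, ich habe ihn nicht gemocht.",  # German
    "这部电影太棒了，我超级推荐！",  # Chinese
    "Me encantó la película, fue increíble.",  # Spanish
]

for text in texts:
    # The pipeline returns a list with one dict per input text.
    result = classifier(text)
    print(f"Text: {text}")
    print(f"Label: {result[0]['label']}, Score: {result[0]['score']:.4f}\n")
```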

Example output (exact scores will vary):

```
Text: I absolutely loved the movie...
Label: positive, Score: 0.9992

Text: Der Film war schrecklich...
Label: negative, Score: 0.9871

Text: 这部电影太棒了...
Label: positive, Score: 0.9965

Text: Me encantó la película...
Label: positive, Score: 0.9987
```
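
Four sentences are a sanity check, not an evaluation. One way to judge transfer quality more systematically is to collect a handful of hand-labeled sentences per language and compare accuracy across languages. The sketch below assumes a hypothetical helper, `accuracy_by_language`, that takes `(language, expected_label, predicted_label)` tuples. The records shown reuse the predicted labels from the example output above, with expected labels assigned by hand.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute accuracy per language.

    records: iterable of (language, expected_label, predicted_label).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for lang, expected, predicted in records:
        totals[lang] += 1
        # Compare case-insensitively, in case label casing differs
        # between model versions.
        if expected.lower() == predicted.lower():
            hits[lang] += 1
    return {lang: hits[lang] / totals[lang] for lang in totals}


# Hypothetical records: predictions copied from the example output,
# expected labels assigned by hand.
records = [
    ("en", "positive", "positive"),
    ("de", "negative", "negative"),
    ("zh", "positive", "positive"),
    ("es", "positive", "positive"),
]

report = accuracy_by_language(records)
print(report)
```

In practice, the third element of each tuple would come from `classifier(text)[0]["label"]`. A visible accuracy gap between the languages seen during fine-tuning and a language such as Chinese would give a rough measure of how much is lost in zero-shot transfer.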