Hate Speech Detection: Identifying Offensive and Abusive Content

FreeGuideOnline, updated 2026-06-23

```python
from datasets import load_dataset

dataset = load_dataset("hate_speech_offensive", split="train")
```

The `class` column holds the label: 0 = hate speech, 1 = offensive language, 2 = neither (normal).
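The labels are stored as integers, so it helps to keep an explicit id-to-name mapping and to check the class shares before training anything. The sketch below assumes only the three ids listed above. The names `ID2LABEL`, `LABEL2ID` and `class_distribution` are illustrative and not part of the `datasets` API. On the real data you would call `class_distribution(list(dataset["class"]))`.

```python
from collections import Counter

# Illustrative id <-> name mapping for the three classes listed above
ID2LABEL = {0: "hate", 1: "offensive", 2: "neither"}
LABEL2ID = {name: i for i, name in ID2LABEL.items()}

def class_distribution(labels):
    """Return each class name's share of the given integer labels."""
    counts = Counter(labels)
    total = len(labels)
    return {ID2LABEL[i]: counts.get(i, 0) / total for i in ID2LABEL}

# Toy labels (hypothetical), not the real dataset
print(class_distribution([0, 1, 1, 1, 2]))
```

The same dicts can be passed as `id2label=ID2LABEL, label2id=LABEL2ID` to `AutoModelForSequenceClassification.from_pretrained`, so the fine-tuned model's config carries readable label names.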


### Baseline: TF-IDF + Logistic Regression
Start with a quick, interpretable baseline to gauge how hard the task is.

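```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import classification_report

texts = list(dataset["tweet"])
labels = list(dataset["class"])

# Stratified hold-out split keeps the (imbalanced) class ratios in both parts
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=10000),
    LogisticRegression(max_iter=1000)
)

# 5-fold cross-validation on the training split; macro-F1 weights all classes equally
scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring="f1_macro")
print(f"CV macro-F1: {scores.mean():.3f} ± {scores.std():.3f}")

# Refit on the full training split, then evaluate once on the held-out test set
pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test), target_names=["hate", "offensive", "neither"]))
```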

### Fine-Tuning a Pretrained Model

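DistilBERT, a distilled version of BERT, is smaller and faster than BERT while keeping most of its accuracy. This makes it a good balance between speed and performance.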

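```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def tokenize(batch):
    # Truncate only; DataCollatorWithPadding pads each training batch dynamically
    return tokenizer(batch["tweet"], truncation=True)

dataset = dataset.map(tokenize, batched=True)
dataset = dataset.rename_column("class", "labels")
splits = dataset.train_test_split(test_size=0.2, seed=42)

# Illustrative hyperparameters (not tuned); adjust for your hardware and data
args = TrainingArguments(output_dir="distilbert-hate-speech", num_train_epochs=2,
                         per_device_train_batch_size=32, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=splits["train"], eval_dataset=splits["test"],
                  data_collator=DataCollatorWithPadding(tokenizer))
trainer.train()
print(trainer.evaluate())  # eval_loss on the held-out split
```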
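By default, `Trainer.evaluate()` reports only the loss. To compare the fine-tuned model with the baseline, you can supply a `compute_metrics` function, which receives the evaluation logits and labels. This is a minimal sketch that assumes you want accuracy plus the same macro-F1 used for the baseline. The function body and the toy logits are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    """Convert logits to class predictions and score them with accuracy and macro-F1."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "macro_f1": f1_score(labels, preds, average="macro"),
    }

# Toy check with hypothetical logits for 4 examples and 3 classes
logits = np.array([[2.0, 0.1, 0.1], [0.1, 3.0, 0.2], [0.3, 0.2, 1.5], [0.1, 2.0, 0.4]])
labels = np.array([0, 1, 2, 2])
metrics = compute_metrics((logits, labels))
print(metrics)
```

Pass it as `Trainer(..., compute_metrics=compute_metrics)`. `trainer.evaluate()` then reports `eval_accuracy` and `eval_macro_f1` alongside `eval_loss`.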