Hate Speech Detection: Identifying Offensive and Abusive Content
FreeGuideOnline
2026-06-23
```python
from datasets import load_dataset

dataset = load_dataset("hate_speech_offensive", split="train")
```
Classes: 0 = hate speech, 1 = offensive language, 2 = normal (neither).
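Before modeling, it can help to check how the three class ids are distributed, since a skewed split changes which metrics are meaningful. The following is a minimal sketch: `ID2LABEL`, `class_distribution`, and the toy labels are hypothetical names and values introduced here for illustration, not part of the original guide.

```python
from collections import Counter

# Mapping of the class ids listed above to readable names (names are illustrative)
ID2LABEL = {0: "hate", 1: "offensive", 2: "normal"}

def class_distribution(labels):
    """Return {label_name: fraction} for a non-empty sequence of integer class ids."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {ID2LABEL[i]: counts.get(i, 0) / total for i in sorted(ID2LABEL)}

# Toy labels for illustration only; in practice one would pass dataset["class"]
toy_labels = [1, 1, 1, 0, 2, 1, 2, 1]
distribution = class_distribution(toy_labels)
print(distribution)
```

If one class dominates, per-class precision and recall are more informative than overall accuracy.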
### Baseline: TF-IDF + Logistic Regression
Build a fast, interpretable baseline first to gauge how hard the task is.
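```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict

texts = list(dataset["tweet"])
labels = list(dataset["class"])

# Unigram and bigram TF-IDF features feeding a linear classifier
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=10000),
    LogisticRegression(max_iter=1000),
)

# Evaluate with cross-validation (5 folds is an assumed setting, not from the original)
predictions = cross_val_predict(pipeline, texts, labels, cv=5)
print(classification_report(labels, predictions))
```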
### Fine-Tuning a Pretrained Model
DistilBERT offers a reasonable trade-off between speed and accuracy.
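```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def tokenize(batch):
    # Truncate only; the data collator pads each training batch dynamically
    return tokenizer(batch["tweet"], truncation=True)

dataset = dataset.map(tokenize, batched=True)
dataset = dataset.rename_column("class", "labels")

# Set training arguments and train (output_dir and epoch count are assumed values)
args = TrainingArguments(output_dir="distilbert-hate-speech", num_train_epochs=3)
trainer = Trainer(model=model, args=args, train_dataset=dataset,
                  data_collator=DataCollatorWithPadding(tokenizer))
trainer.train()
```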
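To compare the fine-tuned model with the baseline on equal terms, it helps to report a per-class-aware metric rather than accuracy alone. The sketch below shows one way to write a `compute_metrics` function that could be passed to `Trainer`. It assumes the evaluation input unpacks into `(logits, labels)`; the function body, the `num_labels` default, and the toy arrays are illustrative assumptions, not something the original guide specifies.

```python
import numpy as np

def compute_metrics(eval_pred, num_labels=3):
    """Return accuracy and macro-averaged F1 from a (logits, labels) pair."""
    logits, labels = eval_pred
    preds = np.argmax(np.asarray(logits), axis=-1)
    labels = np.asarray(labels)

    f1_scores = []
    for c in range(num_labels):
        tp = np.sum((preds == c) & (labels == c))  # true positives for class c
        fp = np.sum((preds == c) & (labels != c))  # false positives
        fn = np.sum((preds != c) & (labels == c))  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)

    return {
        "accuracy": float(np.mean(preds == labels)),
        "macro_f1": float(np.mean(f1_scores)),
    }

# Toy logits and labels for illustration only
toy_logits = np.array([[2.0, 0.1, 0.1],
                       [0.1, 2.0, 0.1],
                       [0.1, 2.0, 0.1],
                       [0.1, 0.1, 2.0]])
toy_labels = np.array([0, 1, 2, 2])
metrics = compute_metrics((toy_logits, toy_labels))
print(metrics)
```

Macro-averaging weights each class equally, so weak performance on a rare class is visible in the score instead of being masked by the majority class.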