Uncertainty Sampling: Sending the Samples the Model Is Most Confused About for Labeling
FreeGuideOnline
2026-06-27
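```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumes an initial labeled set X_initial, y_initial and an unlabeled pool X_pool already exist
model = LogisticRegression()
model.fit(X_initial, y_initial)

# Get the predicted class probabilities for the unlabeled pool
probas = model.predict_proba(X_pool)

# Compute the entropy: sum over the class axis for each sample
# (the small constant 1e-10 guards against log(0))
entropy = -np.sum(probas * np.log(probas + 1e-10), axis=1)

# Pick the indices of the 10 samples with the highest entropy
query_idx = np.argsort(entropy)[-10:]
samples_to_label = X_pool[query_idx]
```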
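The snippet above assumes `X_initial`, `y_initial`, and `X_pool` already exist. As a rough, self-contained sketch of one full query round, the version below fills them in with synthetic data from `make_classification`. The helper name `entropy_query`, the synthetic dataset, the use of `y_pool` as a stand-in for a human annotator, and `max_iter=1000` are all assumptions made for illustration, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def entropy_query(model, X_pool, k=10):
    """Return indices of the k pool samples with the highest predictive entropy.

    Hypothetical helper: wraps the entropy computation shown above.
    """
    probas = model.predict_proba(X_pool)
    entropy = -np.sum(probas * np.log(probas + 1e-10), axis=1)
    k = min(k, len(X_pool))
    # argsort is ascending, so the last k entries are the most uncertain
    return np.argsort(entropy)[-k:]


# Synthetic stand-in data (an assumption, for illustration only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_initial, y_initial = X[:20], y[:20]
X_pool, y_pool = X[20:], y[20:]  # y_pool plays the role of the annotator

model = LogisticRegression(max_iter=1000)
model.fit(X_initial, y_initial)

query_idx = entropy_query(model, X_pool, k=10)

# Move the queried samples from the pool into the labeled set, then retrain
X_labeled = np.vstack([X_initial, X_pool[query_idx]])
y_labeled = np.concatenate([y_initial, y_pool[query_idx]])
X_pool_rest = np.delete(X_pool, query_idx, axis=0)
model.fit(X_labeled, y_labeled)
```

In practice this round would typically be repeated until the labeling budget runs out, with real annotations taking the place of the `y_pool` lookup.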