Uncertainty sampling: selecting the samples the model is most confused about for labeling

FreeGuideOnline, latest update 2026-06-27

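    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Assumes an initial labeled set X_initial, y_initial and an unlabeled pool X_pool already exist
    model = LogisticRegression()
    model.fit(X_initial, y_initial)

    # Predicted class probabilities for the unlabeled pool
    probas = model.predict_proba(X_pool)

    # Entropy: for each sample, sum over the class axis (1e-10 guards against log(0))
    entropy = -np.sum(probas * np.log(probas + 1e-10), axis=1)

    # Indices of the 10 samples with the highest entropy (argsort is ascending, so take the last 10)
    query_idx = np.argsort(entropy)[-10:]
    samples_to_label = X_pool[query_idx]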
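The snippet above takes `X_initial`, `y_initial` and `X_pool` as given. As a minimal end-to-end sketch, the same entropy-based selection can be run on synthetic data. Everything not in the original is an assumption made for illustration: the `make_classification` dataset, the 50-sample initial split, the hypothetical helper `entropy_query`, `max_iter=1000`, and the use of `np.clip` in place of the added `1e-10`. The score computed per sample is the predictive entropy, H(x) = -Σ_c p(c|x) log p(c|x).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def entropy_query(model, X_pool, n_queries=10):
    """Hypothetical helper: indices of the n_queries highest-entropy pool samples."""
    probas = model.predict_proba(X_pool)
    # Clip probabilities away from 0 so that log() stays finite
    entropy = -np.sum(probas * np.log(np.clip(probas, 1e-10, 1.0)), axis=1)
    # argsort is ascending: take the last n_queries and reverse to get most uncertain first
    query_idx = np.argsort(entropy)[-n_queries:][::-1]
    return query_idx, entropy


# Synthetic stand-in for real data (assumption, not from the original text)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_initial, y_initial = X[:50], y[:50]  # small labeled seed set
X_pool = X[50:]                        # treated as the unlabeled pool

model = LogisticRegression(max_iter=1000)
model.fit(X_initial, y_initial)

query_idx, entropy = entropy_query(model, X_pool, n_queries=10)
samples_to_label = X_pool[query_idx]
```

In a full active learning loop, the samples in `samples_to_label` would be sent to an annotator, moved from the pool into the labeled set, and the model refit before the next query round.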