Density-Weighted Active Learning: A Strategy Combining Representativeness and Uncertainty
FreeGuideOnline
2026-06-27
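```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # one possible choice for `model`
from sklearn.neighbors import NearestNeighbors


def entropy(prob):
    """Row-wise Shannon entropy; the small constant avoids log(0)."""
    return -np.sum(prob * np.log(prob + 1e-10), axis=1)


# Assumes a fitted model `model`, an unlabeled pool `X_pool`,
# and the number of samples to pick per round `B`.
proba = model.predict_proba(X_pool)
uncertainty = entropy(proba)

# Estimate density from the unlabeled pool.
k = 10
nn = NearestNeighbors(n_neighbors=k, algorithm='auto').fit(X_pool)
# Calling kneighbors() without an argument leaves each point out of its own
# neighbor list; passing X_pool would add a zero self-distance to every row.
distances, _ = nn.kneighbors()

# Density is the inverse of the mean neighbor distance;
# the small constant prevents division by zero.
density = 1.0 / (np.mean(distances, axis=1) + 1e-5)

# beta controls how strongly density weights the uncertainty score.
beta = 1.0
score = uncertainty * (density ** beta)

# Indices of the B highest-scoring samples.
selected = np.argsort(score)[-B:]
```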
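To see how the density term changes which samples get picked, here is a minimal self-contained sketch on a hand-made toy pool. It is an illustration under stated assumptions, not the article's exact code: the helper names `knn_density` and `density_weighted_select` are hypothetical, the class probabilities are made up rather than produced by a model, and the neighbor search is a brute-force NumPy replacement for `NearestNeighbors` that only suits small pools.

```python
import numpy as np


def entropy(prob):
    """Row-wise Shannon entropy of a class-probability matrix."""
    return -np.sum(prob * np.log(prob + 1e-10), axis=1)


def knn_density(X, k):
    """Inverse mean distance to the k nearest neighbors, excluding the point itself.

    Hypothetical helper: brute-force pairwise distances, O(n^2) memory.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.sort(dist, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-5)


def density_weighted_select(proba, X_pool, B, k=10, beta=1.0):
    """Return indices of the B samples with the highest uncertainty * density**beta."""
    score = entropy(proba) * knn_density(X_pool, k) ** beta
    return np.argsort(score)[-B:][::-1]  # highest score first


# Toy pool (made-up data): three points in a tight cluster and one far outlier.
X_pool = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
# Made-up class probabilities; samples 0 and 3 are maximally uncertain.
proba = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.5, 0.5]])

selected = density_weighted_select(proba, X_pool, B=2, k=2)
print(selected)  # picks indices 0 and 2; the outlier (index 3) is skipped
```

Sample 3 is as uncertain as sample 0, but it sits in a sparse region, so its low density pushes its score below the clustered samples. Setting `beta=0.0` removes the density weighting and the selection falls back to pure uncertainty sampling, which picks samples 0 and 3 on this pool.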