Anomaly Explanation: Attribution Analysis for Detected Outliers

FreeGuideOnline, last updated 2026-06-24

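```python
import numpy as np
import pandas as pd

# Simulate historical data: the average per-minute sales of each
# region/product combination during normal periods.
np.random.seed(42)
regions = ['East China', 'South China', 'North China']
products = ['A', 'B', 'C']
data = []
for r in regions:
    for p in products:
        base = np.random.normal(100, 10)  # normal level: mean 100, std 10
        data.append([r, p, base])
df_hist = pd.DataFrame(data, columns=['region', 'product', 'avg_sales'])

# Actual sales at the current moment (the anomalous point).
current = pd.DataFrame([
    ['East China', 'A', 30],
    ['East China', 'B', 95],
    ['East China', 'C', 100],
    ['South China', 'A', 105],
    ['South China', 'B', 110],
    ['South China', 'C', 98],
    ['North China', 'A', 102],
    ['North China', 'B', 99],
    ['North China', 'C', 101],
], columns=['region', 'product', 'actual_sales'])

# Merge and compute the surprise of each combination:
# the relative deviation of the actual value from the predicted one.
merged = current.merge(df_hist, on=['region', 'product'])
merged['predicted'] = merged['avg_sales']  # the historical mean serves as the prediction
merged['surprise'] = (merged['actual_sales'] - merged['predicted']) / merged['predicted']

# Change in overall total sales.
total_actual = merged['actual_sales'].sum()
total_pred = merged['predicted'].sum()
total_drop = total_actual - total_pred  # negative means sales dropped
print(f"Overall sales change: {total_drop:.0f}")

# Contribution of each combination to the total change:
# its absolute change as a share of the overall change.
merged['drop_contribution'] = merged['actual_sales'] - merged['predicted']
merged['contribution_pct'] = merged['drop_contribution'] / total_drop * 100

# Rank the root-cause candidates.
result = merged[['region', 'product', 'actual_sales', 'predicted', 'surprise', 'contribution_pct']]
# Most negative relative deviation first; contribution_pct gives each
# combination's share of the overall change.
result = result.sort_values('surprise')
print(result)
```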
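The per-combination changes can also be rolled up along a single dimension, which helps show whether a drop is concentrated in one region or in one product. The sketch below is a minimal illustration under stated assumptions: the four rows, the flat baseline of 100, and the helper name `rollup` are hypothetical and are not taken from the simulated data above.

```python
import pandas as pd

# Hypothetical observed vs. predicted values (illustrative only).
df = pd.DataFrame(
    [
        ["East", "A", 30.0, 100.0],
        ["East", "B", 95.0, 100.0],
        ["South", "A", 105.0, 100.0],
        ["South", "B", 110.0, 100.0],
    ],
    columns=["region", "product", "actual_sales", "predicted"],
)

# Absolute change per combination and the overall change.
df["delta"] = df["actual_sales"] - df["predicted"]
total_delta = df["delta"].sum()


def rollup(frame: pd.DataFrame, dim: str, total: float) -> pd.DataFrame:
    """Sum the per-combination deltas along one dimension and express
    each group's delta as a percentage of the overall change."""
    if total == 0:
        raise ValueError("overall change is zero; shares are undefined")
    out = frame.groupby(dim, as_index=False)["delta"].sum()
    out["contribution_pct"] = out["delta"] / total * 100
    # Largest share of the overall change first.
    return out.sort_values("contribution_pct", ascending=False, ignore_index=True)


by_region = rollup(df, "region", total_delta)
by_product = rollup(df, "product", total_delta)
print(by_region)
print(by_product)
```

Within each rollup the shares sum to 100%. A group with a negative share moved against the overall change and partly offset it.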


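The output takes the following form and points to the combination responsible for the drop. The figures shown are illustrative: the exact values depend on the simulated baseline drawn by `np.random.normal`.

```
Overall sales change: -71
       region product  actual_sales  predicted  surprise  contribution_pct
0  East China       A            30      101.8   -0.7053             101.1
1  East China       B            95      101.2   -0.0613               8.7
...
```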
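As a quick check on how to read these columns, the first displayed row (product A, actual 30, predicted 101.8) can be recomputed by hand. This is a small sketch that uses only the numbers printed above. Note that the overall change of -71 is a rounded value, so the share is approximate.

```python
# Values taken from the first displayed row; the total is the rounded printed figure.
actual, predicted, total_change = 30, 101.8, -71

delta = actual - predicted                      # absolute change of this combination
surprise = delta / predicted                    # relative deviation from the prediction
contribution_pct = delta / total_change * 100   # share of the overall change

print(round(surprise, 4), round(contribution_pct, 1))  # -0.7053 101.1
```

A share above 100% is not an error. The shares of all combinations sum to 100%, so a value above 100% means that other combinations moved in the opposite direction and partly offset this one.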