Annotation Platform Integration: Connecting Labeling Tools to the Human-in-the-Loop Cycle

FreeGuideOnline · Latest · 2026-06-29

1. Start Label Studio with Docker:
```bash
docker run -it -p 8080:8080 -v $(pwd)/mydata:/label-studio/data heartexlabs/label-studio:latest
```

Open `http://localhost:8080`, create an account, and log in.
2. Install the Python dependencies:
```bash
pip install label-studio-sdk scikit-learn requests
```

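Creating the Annotation Project and Connecting the Model

We will simulate a text classification task. You need to:

1. In the Label Studio UI, create a "Text Classification" project and copy its project ID (shown on the Settings page).
2. Generate an API token (Account > Access Token).

Use the SDK to initialize the project and set the labeling interface. Note that `start_project` creates a new project; to reuse the project you created in the UI, you would load it by its ID instead (in the legacy SDK client this is `client.get_project(...)`; check this against your SDK version):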

```python
from label_studio_sdk import Client

LABEL_STUDIO_URL = 'http://localhost:8080'
API_KEY = 'your-token'  # paste the access token generated above

client = Client(url=LABEL_STUDIO_URL, api_key=API_KEY)
# Create a project whose labeling interface is a single-choice sentiment classifier
project = client.start_project(
    title='Human-in-the-loop text classification',
    label_config='''
    <View>
      <Text name="text" value="$text"/>
      <Choices name="sentiment" toName="text" choice="single">
        <Choice value="positive"/>
        <Choice value="negative"/>
      </Choices>
    </View>
    '''
)
```
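Predictions and annotations refer to the control tags in this config by name (`from_name`, `to_name`), so it can help to read those names back from the config rather than retype them. The following is a minimal sketch using only Python's standard library; `describe_choices` is a hypothetical helper name, not part of the Label Studio SDK.

```python
import xml.etree.ElementTree as ET

LABEL_CONFIG = '''
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="positive"/>
    <Choice value="negative"/>
  </Choices>
</View>
'''

def describe_choices(label_config):
    """Return from_name, to_name and allowed labels of the first <Choices> tag."""
    root = ET.fromstring(label_config.strip())
    choices = root.find('.//Choices')
    if choices is None:
        raise ValueError('label config has no <Choices> tag')
    return {
        'from_name': choices.get('name'),
        'to_name': choices.get('toName'),
        'labels': [c.get('value') for c in choices.findall('Choice')],
    }

info = describe_choices(LABEL_CONFIG)
print(info)
```

A model that emits a label outside `info['labels']`, or uses a different `from_name`, would produce pre-annotations that do not match the labeling interface, so a check like this is a cheap guard before wiring up the backend.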

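Automatic Pre-annotation with the Label Studio ML Backend

The ML backend is a standalone service that receives annotation tasks and returns predictions. The official repository provides an example framework, so you only need to override the `predict` method.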
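To make that exchange concrete: the backend receives a list of task dicts and returns one prediction dict per task. The sketch below builds such a prediction for the `sentiment` and `text` tags used in this guide. `make_prediction` is a hypothetical helper, and the field layout follows Label Studio's documented prediction JSON format as best understood here, so verify it against your version.

```python
def make_prediction(label, score, model_version='v1'):
    """Build one Label Studio prediction for a single-choice classification."""
    if not 0.0 <= score <= 1.0:
        raise ValueError('score must be a probability between 0 and 1')
    return {
        'model_version': model_version,
        'score': float(score),  # overall confidence of this prediction
        'result': [{
            'from_name': 'sentiment',  # name of the <Choices> tag
            'to_name': 'text',         # name of the <Text> tag it labels
            'type': 'choices',
            'value': {'choices': [label]},
        }],
    }

# A task as the backend would receive it (only the fields used here)
tasks = [{'id': 1, 'data': {'text': 'Great product, works as advertised.'}}]
predictions = [make_prediction('positive', 0.93) for _ in tasks]
print(predictions[0]['result'][0]['value'])
```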

Clone the ML backend template and install its dependencies:

```bash
git clone https://github.com/heartexlabs/label-studio-ml-backend
cd label-studio-ml-backend/label_studio_ml/examples/simple_text_classifier
pip install -r requirements.txt
```

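Write `model.py`, replacing the default logic with your own model's interface (the code here is illustrative, with `your_model` standing in for your model):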

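```python
import joblib
from label_studio_ml.model import LabelStudioMLBase

# Assumption: 'model.joblib' (hypothetical path) holds a fitted scikit-learn
# text pipeline whose classes are 'positive' and 'negative'.
your_model = joblib.load('model.joblib')

class MyModel(LabelStudioMLBase):
    def predict(self, tasks, **kwargs):
        predictions = []
        for task in tasks:
            text = task['data']['text']
            # Call your model; predict_proba expects a list of inputs
            proba = your_model.predict_proba([text])[0]
            best = proba.argmax()
            predictions.append({
                'model_version': 'v1',
                'score': float(proba[best]),  # confidence of the chosen label
                'result': [{
                    'from_name': 'sentiment',
                    'to_name': 'text',
                    'type': 'choices',
                    # Use the predicted class instead of a hard-coded label
                    'value': {'choices': [str(your_model.classes_[best])]},
                }],
            })
        return predictions
```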

Start the ML backend and register it with Label Studio:
```
label-studio-ml init my_backend --script model.py
label-studio-ml start my_backend
```
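In the Label Studio project settings, add an ML backend pointing to http://localhost:9090. From then on, every imported task automatically shows the model's pre-annotation, and annotators only need to confirm or correct it.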

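Triggering Model Retraining via Webhook

Once enough annotations have accumulated, we want to train the model automatically and replace the version served by the ML backend. The steps are:

1. On the project's Settings > Webhooks page, add a webhook: set the URL to http://<your-service-address>/annotation_created and check the "Annotation created" event.

2. Write a microservice that receives the webhook (a Flask example):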

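```python
from flask import Flask, request
import subprocess

app = Flask(__name__)
ANNOTATION_THRESHOLD = 50  # start training after 50 new samples accumulate
new_annotations = 0  # in-memory counter; a real service would query the API
                     # for the current number of reviewed annotations

def export_annotations_and_train():
    """Placeholder: pull all completed annotations and train a new model."""

@app.route('/annotation_created', methods=['POST'])
def handle_annotation():
    global new_annotations
    data = request.get_json()
    project_id = data['task']['project']  # ID of the project the task belongs to
    new_annotations += 1
    app.logger.info('project %s: %d new annotations', project_id, new_annotations)
    # Check whether the threshold has been reached
    if new_annotations >= ANNOTATION_THRESHOLD:
        new_annotations = 0
        export_annotations_and_train()
        # Restart the ML backend so it loads the new model (or hot-swap it);
        # in practice, stop the previously running backend process first
        subprocess.Popen(['label-studio-ml', 'start', 'my_backend'])
        return 'Training triggered'
    return 'OK'
```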
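The export-and-train step in the service above needs labeled pairs, and the threshold check needs a count of new labels. Both could look roughly like the following sketch. It assumes tasks in Label Studio's JSON export layout (a list of tasks, each with `data` and `annotations`); `extract_training_pairs` and `should_retrain` are hypothetical helper names, and the sample data is made up for illustration.

```python
def extract_training_pairs(exported_tasks, from_name='sentiment'):
    """Collect (text, label) pairs from tasks in Label Studio's JSON export layout."""
    pairs = []
    for task in exported_tasks:
        for annotation in task.get('annotations', []):
            if annotation.get('was_cancelled'):
                continue  # skipped tasks carry no usable label
            labels = [
                item['value']['choices'][0]
                for item in annotation.get('result', [])
                if item.get('from_name') == from_name and item['value'].get('choices')
            ]
            if labels:
                pairs.append((task['data']['text'], labels[0]))
                break  # use the first valid annotation per task
    return pairs

def should_retrain(labeled_count, count_at_last_training, threshold=50):
    """True once at least `threshold` new labeled tasks have accumulated."""
    return labeled_count - count_at_last_training >= threshold

# Illustrative export: one labeled task, one skipped task, one unlabeled task
exported = [
    {'data': {'text': 'Love it'}, 'annotations': [
        {'was_cancelled': False, 'result': [
            {'from_name': 'sentiment', 'to_name': 'text', 'type': 'choices',
             'value': {'choices': ['positive']}}]}]},
    {'data': {'text': 'Skipped one'}, 'annotations': [
        {'was_cancelled': True, 'result': []}]},
    {'data': {'text': 'Not labeled yet'}, 'annotations': []},
]
pairs = extract_training_pairs(exported)
print(pairs)
print(should_retrain(len(pairs), 0, threshold=1))
```

The resulting pairs can be split into texts and labels and passed to any classifier, for example a scikit-learn text pipeline, before the new model is saved for the ML backend to load.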