Video Search: Cross-Modal Video Moment Retrieval and Localization
FreeGuideOnline
2026-06-24
```python
text_features = clip.encode_text("a person opening a fridge")
video_features = precomputed_clip_features  # shape: (T, D)
scores = cosine_similarity(text_features, video_features)
pred_start, pred_end = find_best_segment(scores)
```
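The snippet above leaves `cosine_similarity` and `find_best_segment` undefined, so the following is a minimal, self-contained sketch of one way they could work. Everything here is an assumption for illustration: the small hand-made arrays stand in for real CLIP text and frame embeddings, and `find_best_segment` is a hypothetical helper that picks the contiguous span maximizing the sum of scores above a threshold (Kadane's maximum-subarray algorithm), which may differ from the method the original had in mind.

```python
import numpy as np

def cosine_similarity(query, frames):
    """Cosine similarity between one query vector (D,) and T frame vectors (T, D)."""
    query = query / np.linalg.norm(query)
    frames = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    return frames @ query  # shape: (T,)

def find_best_segment(scores, threshold=None):
    """Hypothetical helper: return inclusive (start, end) frame indices of the
    contiguous span with the largest sum of (score - threshold)."""
    scores = np.asarray(scores, dtype=float)
    if threshold is None:
        threshold = scores.mean()  # assumption: the mean score separates match from background
    gains = scores - threshold
    best_sum, best_start, best_end = -np.inf, 0, 0
    run_sum, run_start = 0.0, 0
    for t, gain in enumerate(gains):
        if run_sum <= 0:
            # A non-positive running sum cannot help; restart the span at t.
            run_sum, run_start = gain, t
        else:
            run_sum += gain
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, t
    return best_start, best_end

# Synthetic stand-ins for CLIP embeddings (D = 4, T = 6).
# Frames 2..4 point roughly in the query direction; the rest are orthogonal.
text_features = np.array([1.0, 0.0, 0.0, 0.0])
video_features = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.1, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 1.0],
])

scores = cosine_similarity(text_features, video_features)
pred_start, pred_end = find_best_segment(scores)
print(pred_start, pred_end)  # → 2 4
```

The returned indices are frame positions, so converting them to timestamps requires dividing by the sampling rate used when the features were extracted. The mean-score threshold is only a convenient default; a fixed or learned threshold may suit real similarity curves better.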