smg-grpc-servicer

SMG gRPC servicer implementations for LLM inference engines (vLLM, MLX, TokenSpeed, SGLang)

Apache-2.0 8 个版本 Python >=3.10
安装
pip install smg-grpc-servicer
poetry add smg-grpc-servicer
pipenv install smg-grpc-servicer
conda install smg-grpc-servicer
描述

smg-grpc-servicer

gRPC servicer implementations for LLM inference engines. Supports vLLM, MLX, TokenSpeed, and SGLang.

Installation

For vLLM:

pip install smg-grpc-servicer[vllm]

For MLX:

pip install smg-grpc-servicer[mlx]

For TokenSpeed, install the TokenSpeed runtime first, then install the servicer bridge:

pip install smg-grpc-servicer

For SGLang:

pip install smg-grpc-servicer[sglang]

Usage

vLLM

vllm serve meta-llama/Llama-2-7b-hf --grpc

MLX

python -m smg_grpc_servicer.mlx --model meta-llama/Llama-2-7b-hf --host 0.0.0.0 --port 50051

TokenSpeed

python -m smg_grpc_servicer.tokenspeed --model meta-llama/Llama-2-7b-hf --host 0.0.0.0 --port 50051

SGLang

sglang serve --model-path meta-llama/Llama-2-7b-hf --grpc-mode

Architecture

smg-grpc-servicer[vllm]    ──optional dep──>  vllm       (lazy import)
smg-grpc-servicer[mlx]     ──optional dep──>  mlx-lm     (lazy import)
smg-grpc-servicer          ──external runtime──>  tokenspeed (lazy import)
smg-grpc-servicer[sglang]  ──optional dep──>  sglang     (lazy import)
smg-grpc-servicer          ──depends on────>  smg-grpc-proto  (hard dependency)
vllm                       ──optional──────>  smg-grpc-servicer (via vllm serve --grpc)
sglang                     ──optional──────>  smg-grpc-servicer (via --grpc-mode)

Backend dependencies are isolated via extras or runtime installs to avoid conflicts between vLLM, MLX, TokenSpeed, and SGLang.

Development

See DEVELOPMENT.md for local development setup, CI, and release workflows.