smg-grpc-servicer
gRPC servicer implementations for LLM inference engines. Supports vLLM, MLX, TokenSpeed, and SGLang.
Installation
For vLLM:
pip install "smg-grpc-servicer[vllm]"
For MLX:
pip install "smg-grpc-servicer[mlx]"
For TokenSpeed, install the TokenSpeed runtime first, then install the servicer bridge:
pip install smg-grpc-servicer
For SGLang:
pip install "smg-grpc-servicer[sglang]"
Usage
vLLM
vllm serve meta-llama/Llama-2-7b-hf --grpc
MLX
python -m smg_grpc_servicer.mlx --model meta-llama/Llama-2-7b-hf --host 0.0.0.0 --port 50051
TokenSpeed
python -m smg_grpc_servicer.tokenspeed --model meta-llama/Llama-2-7b-hf --host 0.0.0.0 --port 50051
SGLang
sglang serve --model-path meta-llama/Llama-2-7b-hf --grpc-mode
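Once a server is running, a quick reachability check can confirm that something is listening before a gRPC client is pointed at it. The sketch below uses only the Python standard library and assumes the host and port from the MLX and TokenSpeed commands above (`50051`); `port_is_open` is a hypothetical helper, not part of this package. A successful TCP connection only shows that the port accepts connections, not that the gRPC service is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it does not speak gRPC.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumed address: the server was started with --port 50051 on this machine.
    print("listening" if port_is_open("127.0.0.1", 50051) else "not reachable")
```

For the vLLM and SGLang entry points, substitute whichever port those commands were configured to use.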
Architecture
smg-grpc-servicer[vllm]    ──optional dep──> vllm (lazy import)
smg-grpc-servicer[mlx]     ──optional dep──> mlx-lm (lazy import)
smg-grpc-servicer          ──external runtime──> tokenspeed (lazy import)
smg-grpc-servicer[sglang]  ──optional dep──> sglang (lazy import)
smg-grpc-servicer          ──depends on────> smg-grpc-proto (hard dependency)
vllm                       ──optional──────> smg-grpc-servicer (via vllm serve --grpc)
sglang                     ──optional──────> smg-grpc-servicer (via --grpc-mode)
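Each backend is pulled in only through its own extra (or, for TokenSpeed, a separately installed runtime) and is imported lazily. This keeps the dependencies of vLLM, MLX, TokenSpeed, and SGLang isolated from one another and avoids conflicts between them.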
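The lazy-import pattern named in the diagram can be sketched as follows. This is an illustration of the general technique, not the package's actual code: `BACKEND_MODULES`, `available_backends`, and `load_backend` are hypothetical names, and the module names in the mapping (in particular `mlx_lm` and `tokenspeed`) are assumptions about what each backend exposes for import.

```python
import importlib
import importlib.util

# Hypothetical mapping from backend name to its top-level import name.
# These module names are assumptions made for illustration.
BACKEND_MODULES = {
    "vllm": "vllm",
    "mlx": "mlx_lm",
    "tokenspeed": "tokenspeed",
    "sglang": "sglang",
}


def available_backends(modules=BACKEND_MODULES):
    """List backends whose module can be located, without importing it."""
    return sorted(
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is not None
    )


def load_backend(name, modules=BACKEND_MODULES):
    """Import a backend module only at the moment it is requested."""
    if name not in modules:
        raise ValueError(f"unknown backend: {name!r}")
    try:
        return importlib.import_module(modules[name])
    except ImportError as exc:
        # Turn a bare ImportError into a hint about the missing install.
        raise RuntimeError(
            f"backend {name!r} is not installed; install its extra or runtime"
        ) from exc


if __name__ == "__main__":
    print("installed backends:", available_backends())
```

Because nothing is imported at module load time, a machine with only one backend installed never touches the others, which is what lets a single package serve all four engines.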
Development
See DEVELOPMENT.md for local development setup, CI, and release workflows.