compressed-tensors

Library for utilization of compressed safetensors of neural network models

Apache 2.0 170 versions
The vLLM Project <vllm-questions@lists.berkeley.edu>
Installation
pip install compressed-tensors
poetry add compressed-tensors
pipenv install compressed-tensors
conda install compressed-tensors
Description

compressed-tensors

The compressed-tensors library extends the safetensors format, providing a versatile and efficient way to store and manage compressed tensor data. This library supports various quantization and sparsity schemes, making it a unified format for handling different model optimizations like GPTQ, AWQ, SmoothQuant, INT8, FP8, SparseGPT, and more.
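
For orientation, here is a minimal sketch of the plain safetensors container that the library builds on: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. It uses only the Python standard library. The tensor names and the metadata entry are hypothetical, chosen for illustration, and the sketch does not claim to reproduce how compressed-tensors itself lays out its tensors.

```python
import json
import struct


def build_safetensors_bytes(tensors, metadata=None):
    """Serialize {name: (dtype, shape, raw_bytes)} into the safetensors layout."""
    header = {}
    offset = 0
    payload = b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [offset, offset + len(raw)],
        }
        offset += len(raw)
        payload += raw
    if metadata:
        header["__metadata__"] = metadata
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian unsigned length, then the JSON header, then raw data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def read_safetensors_header(blob):
    """Return the JSON header of a safetensors byte string as a dict."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + header_len].decode("utf-8"))


# Hypothetical tensors: one int32 tensor of packed weights and one float16 scale.
blob = build_safetensors_bytes(
    {
        "layer.weight_packed": ("I32", [1, 2], struct.pack("<2i", 7, -3)),
        "layer.weight_scale": ("F16", [1, 1], b"\x00\x3c"),
    },
    metadata={"format": "example"},
)
header = read_safetensors_header(blob)
for name in sorted(header):
    print(name, header[name])
```

Because the header is ordinary JSON, tensor names, dtypes, and shapes in a checkpoint can be inspected without loading any tensor data.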

Why compressed-tensors?

As model compression becomes more important for efficient deployment of LLMs, the landscape of quantization and compression techniques has grown fragmented. Each method often comes with its own storage format and loading procedure, which makes it hard to work with several techniques or to switch between them. compressed-tensors addresses this by providing a single, extensible format that can represent a wide variety of compression schemes.

  • Unified Checkpoint Format: Supports various compression schemes in a single, consistent format.
  • Wide Compatibility: Works with popular quantization methods like GPTQ, SmoothQuant, and FP8. See llm-compressor.
  • Flexible Quantization Support:
    • Weight-only quantization (e.g., W4A16, W8A16, WnA16)
    • Activation quantization (e.g., W8A8)
    • KV cache quantization
    • Non-uniform schemes (different layers can be quantized in different ways!)
  • Sparsity Support: Handles both unstructured and semi-structured (e.g., 2:4) sparsity patterns.
  • Open-Source Integration: Designed to work seamlessly with Hugging Face models and PyTorch.
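
To make the scheme names in the list above concrete: W4A16 means weights are stored in 4 bits while activations stay in 16 bits. The following is a minimal pure-Python sketch of symmetric, group-wise weight quantization under that reading. The function names, the group size, and the sample weights are assumptions made for illustration; this is not the library's implementation.

```python
def quantize_symmetric(weights, num_bits=4, group_size=4):
    """Quantize a flat list of floats to signed integers, one scale per group."""
    qmax = 2 ** (num_bits - 1) - 1   # 7 for 4 bits
    qmin = -(2 ** (num_bits - 1))    # -8 for 4 bits
    q_values, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # The largest magnitude in the group maps to qmax.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            q_values.append(max(qmin, min(qmax, q)))
    return q_values, scales


def dequantize(q_values, scales, group_size=4):
    """Map integers back to floats using the scale of each value's group."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


weights = [0.7, -0.26, 0.1, 0.0, 2.8, -1.2, 0.5, -2.8]
q_values, scales = quantize_symmetric(weights)
restored = dequantize(q_values, scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_values)   # [7, -3, 1, 0, 7, -3, 1, -7]
print(max_error)  # bounded by half of the largest group scale
```

Using one scale per group rather than one per tensor keeps the rounding error of each value bounded by half of its own group's scale, at the cost of storing more scales.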

A single format lets developers and researchers experiment with composing different quantization methods, simplifies model deployment pipelines, and reduces the overhead of supporting multiple compression formats in inference engines.
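
The 2:4 semi-structured sparsity pattern named in the feature list requires that every contiguous block of four weights holds at most two nonzero values. Here is a minimal sketch that uses magnitude-based pruning as the selection rule. That rule and the helper names are assumptions for illustration; methods such as SparseGPT choose which weights to drop differently.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every block of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        block = weights[start:start + 4]
        # Indices of the two largest-magnitude entries, which are kept.
        keep = sorted(range(4), key=lambda i: abs(block[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(block))
    return pruned


def is_2_of_4(weights):
    """Check that every block of four has at most two nonzero values."""
    return all(
        sum(1 for w in weights[s:s + 4] if w != 0) <= 2
        for s in range(0, len(weights), 4)
    )


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
sparse_row = prune_2_of_4(row)
print(sparse_row)             # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(is_2_of_4(sparse_row))  # True
```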

Installation

From PyPI

Stable release:

pip install compressed-tensors

Nightly release:

pip install --pre compressed-tensors

From Source

git clone https://github.com/vllm-project/compressed-tensors
cd compressed-tensors
pip install -e .
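
After any of the installation routes above, it can be useful to confirm which release ended up in the environment, since stable and nightly builds share the same package name. A small sketch using only the standard library's importlib.metadata follows; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package="compressed-tensors"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "compressed-tensors is not installed")
```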

Getting started

Saving a Compressed Model with PTQ

We can use compressed-tensors to run basic post-training quantization (PTQ) and save the quantized model to disk in compressed form:

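import torch
from datasets import load_dataset
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, DefaultDataCollator

# Import paths below are assumed from the package layout and may differ between releases.
from compressed_tensors.compressors import ModelCompressor
from compressed_tensors.quantization import (
    QuantizationConfig,
    QuantizationStatus,
    apply_quantization_config,
    compress_quantized_weights,
    freeze_module_quantization,
)

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda:0", torch_dtype="auto")

# Load the int4 quantization config and attach it to the model in calibration mode.
config = QuantizationConfig.parse_file("./examples/bit_packing/int4_config.json")
config.quantization_status = QuantizationStatus.CALIBRATION
apply_quantization_config(model, config)

# Prepare the calibration data.
dataset = load_dataset("ptb_text_only")["train"]
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["sentence"], padding=False, truncation=True, max_length=1024)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
data_loader = DataLoader(tokenized_dataset, batch_size=1, collate_fn=DefaultDataCollator())

# Run calibration forward passes, stopping once idx reaches 512.
with torch.no_grad():
    for idx, sample in tqdm(enumerate(data_loader), desc="Running calibration"):
        # Move each batch to the model's device (the original referenced an undefined `device`).
        sample = {key: value.to(model.device) for key, value in sample.items()}
        _ = model(**sample)

        if idx >= 512:
            break

# Freeze the calibrated quantization parameters, then convert the weights to their quantized form.
model.apply(freeze_module_quantization)
model.apply(compress_quantized_weights)

# Compress the model and save it to disk.
output_dir = "./ex_llama1.1b_w4a16_packed_quantize"
compressor = ModelCompressor.from_pretrained_model(model)
compressor.compress_model(model)
model.save_pretrained(output_dir)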
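
The example above uses an int4 bit-packing config and writes to a directory named "packed". As a rough illustration of what packing 4-bit values means, the sketch below stores eight signed 4-bit integers in one 32-bit word and recovers them. The slot order (lowest slot in the lowest bits) and the two's-complement encoding are assumptions for this sketch; the library's on-disk layout may differ.

```python
def pack_int4(values):
    """Pack signed 4-bit ints (-8..7) into unsigned 32-bit words, 8 per word."""
    if len(values) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for start in range(0, len(values), 8):
        word = 0
        for slot, v in enumerate(values[start:start + 8]):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in 4 bits")
            # Two's-complement nibble, lowest slot in the lowest bits.
            word |= (v & 0xF) << (4 * slot)
        words.append(word)
    return words


def unpack_int4(words):
    """Recover the signed 4-bit values from packed 32-bit words."""
    values = []
    for word in words:
        for slot in range(8):
            nibble = (word >> (4 * slot)) & 0xF
            # Nibbles 8..15 represent the negative values -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


q = [7, -3, 1, 0, 7, -3, 1, -7]
packed = pack_int4(q)
print(len(packed))             # 1
print(unpack_int4(packed) == q)  # True
```

Packing this way stores 4-bit weights in one eighth of the space they would occupy as individual 32-bit integers.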
Version list
0.17.1 2026-06-11
0.17.0 2026-06-03
0.17.2a20260622 2026-06-23
0.17.2a20260618 2026-06-19
0.17.2a20260616 2026-06-17
0.17.2a20260611 2026-06-11
0.17.1a20260610 2026-06-11
0.17.1a20260604 2026-06-05
0.17.1a20260602 2026-06-03
0.16.0 2026-05-28
0.16.1a20260602 2026-06-03
0.16.1a20260529 2026-05-31
0.16.1a20260526 2026-05-28
0.15.0.1 2026-04-10
0.15.0 2026-04-08
0.15.1a20260526 2026-05-27
0.15.1a20260521 2026-05-22
0.15.1a20260520 2026-05-21
0.15.1a20260515 2026-05-16
0.15.1a20260503 2026-05-06
0.15.1a20260428 2026-04-29
0.15.1a20260421 2026-04-24
0.15.1a20260416 2026-04-17
0.15.1a20260414 2026-04-15
0.15.1a20260413 2026-04-14
0.15.1a20260409 2026-04-09
0.15.1a20260406 2026-04-08
0.14.0.1 2026-03-11
0.14.0 2026-02-27
0.14.1a20260406 2026-04-07
0.14.1a20260326 2026-03-27
0.14.1a20260325 2026-03-25
0.14.1a20260323 2026-03-24
0.14.1a20260320 2026-03-22
0.14.1a20260317 2026-03-18
0.14.1a20260313 2026-03-16
0.14.1a20260310 2026-03-11
0.14.1a20260309 2026-03-10
0.14.1a20260306 2026-03-07
0.14.1a20260305 2026-03-05
0.14.1a20260225 2026-02-27
0.13.0 2025-12-16
0.13.1a20260225 2026-02-26
0.13.1a20260223 2026-02-24
0.13.1a20260219 2026-02-20
0.13.1a20260218 2026-02-19
0.13.1a20260217 2026-02-18
0.13.1a20260212 2026-02-13
0.13.1a20260211 2026-02-12
0.13.1a20260210 2026-02-11
0.13.1a20260209 2026-02-10
0.13.1a20260205 2026-02-08
0.13.1a20260203 2026-02-04
0.13.1a20260130 2026-01-30
0.13.1a20260127 2026-01-28
0.13.1a20260123 2026-01-24
0.13.1a20260116 2026-01-19
0.13.1a20260115 2026-01-16
0.13.1a20260109 2026-01-10
0.13.1a20260108 2026-01-09
0.13.1a20251215 2025-12-16
0.12.2 2025-10-07
0.12.1 2025-10-02
0.12.0 2025-10-01
0.12.3a20251215 2025-12-16
0.12.3a20251214 2025-12-15
0.12.3a20251212 2025-12-13
0.12.3a20251203 2025-12-04
0.12.3a20251114 2025-11-15
0.12.3a20251110 2025-11-11
0.12.3a20251030 2025-11-01
0.12.3a20251028 2025-10-28
0.12.3a20251023 2025-10-24
0.12.3a20251013 2025-10-14
0.12.3a20251010 2025-10-11
0.12.3a20251009 2025-10-10
0.12.3a20251008 2025-10-09
0.12.3a20251007 2025-10-08
0.12.3a20251003 2025-10-07
0.12.2a20251003 2025-10-05
0.12.2a20251002 2025-10-02
0.12.1a20251001 2025-10-01
0.11.0 2025-08-19
0.11.1a20250929 2025-09-30
0.11.1a20250923 2025-09-25
0.11.1a20250918 2025-09-19
0.11.1a20250917 2025-09-18
0.11.1a20250912 2025-09-13
0.11.1a20250911 2025-09-12
0.11.1a20250910 2025-09-11
0.11.1a20250909 2025-09-10
0.11.1a20250908 2025-09-09
0.11.1a20250904 2025-09-05
0.11.1a20250903 2025-09-04
0.11.1a20250902 2025-09-03
0.11.1a20250828 2025-08-29
0.11.1a20250821 2025-08-22
0.11.1a20250820 2025-08-21
0.11.1a20250819 2025-08-19
0.10.2 2025-06-23
0.10.1 2025-06-06
0.10.0 2025-06-05
0.10.3a20250815 2025-08-16
0.10.3a20250814 2025-08-15
0.10.3a20250812 2025-08-13
0.10.3a20250811 2025-08-12
0.10.3a20250806 2025-08-08
0.10.3a20250805 2025-08-06
0.10.3a20250731 2025-08-01
0.10.3a20250728 2025-07-29
0.10.3a20250724 2025-07-25
0.10.3a20250721 2025-07-22
0.10.3a20250716 2025-07-17
0.10.3a20250715 2025-07-16
0.10.3a20250711 2025-07-12
0.10.3a20250710 2025-07-11
0.10.3a20250709 2025-07-10
0.10.3a20250708 2025-07-09
0.10.3a20250707 2025-07-08
0.10.3a20250703 2025-07-04
0.10.3a20250701 2025-07-03
0.10.3a20250620 2025-06-24
0.10.2a20250620 2025-06-21
0.10.2a20250617 2025-06-18
0.10.2a20250616 2025-06-17
0.10.2a20250613 2025-06-14
0.10.2a20250612 2025-06-13
0.10.2a20250611 2025-06-12
0.10.2a20250609 2025-06-10
0.10.2a20250606 2025-06-06
0.10.1a20250605 2025-06-06
0.10.1a20250604 2025-06-05
0.9.4 2025-04-24
0.9.3 2025-04-02
0.9.2 2025-02-18
0.9.1 2025-01-23
0.9.0 2025-01-15
0.9.5a20250604 2025-06-05
0.9.5a20250603 2025-06-04
0.9.5a20250602 2025-06-03
0.9.5a20250530 2025-05-31
0.9.5a20250528 2025-05-29
0.9.5a20250521 2025-05-22
0.9.5a20250520 2025-05-21
0.9.5a20250519 2025-05-20
0.9.5a20250514 2025-05-15
0.9.5a20250513 2025-05-14
0.9.5a20250512 2025-05-13
0.9.5a20250509 2025-05-10
0.9.5a20250507 2025-05-08
0.9.5a20250502 2025-05-03
0.9.5a20250428 2025-04-29
0.9.5a20250425 2025-04-28
0.9.5a20250424 2025-04-25
0.9.4a20250421 2025-04-23
0.9.4a20250414 2025-04-15
0.9.4a20250412 2025-04-12
0.9.4a20250410 2025-04-11
0.9.4a20250408 2025-04-09
0.8.1 2024-12-11
0.8.0 2024-11-12
0.7.1 2024-10-17
0.7.0 2024-10-09
0.6.0 2024-09-23
0.5.0 2024-08-08
0.4.0 2024-06-21
0.3.3 2024-05-07
0.3.2 2024-04-29
0.3.1 2024-04-25
0.3.0 2024-04-25