preshed

Cython hash table that trusts the keys are pre-hashed

MIT 49 versions Python <3.15,>=3.9
Explosion <contact@explosion.ai>
Installation
pip install preshed
poetry add preshed
pipenv install preshed
conda install preshed
Description

preshed: Cython Hash Table for Pre-Hashed Keys

Simple but high performance Cython hash table mapping pre-randomized keys to void* values. Inspired by Jeff Preshing.

All Python APIs provided by the BloomFilter and PreshMap classes are thread-safe on both the GIL-enabled build and the free-threaded build of Python 3.14 and newer. If you use the C API or the PreshCounter class in a multithreaded environment, you must provide external synchronization for the data structures from this library.


Installation

pip install preshed --only-binary preshed

Or with conda:

conda install -c conda-forge preshed

Usage

PreshMap

A hash map for pre-hashed keys, mapping uint64 to uint64 values.

from preshed.maps import PreshMap

map = PreshMap()                  # create with default size
map = PreshMap(initial_size=1024) # create with initial capacity (must be power of 2)

map[key] = value        # set a value
value = map[key]        # get a value (returns None if missing)
value = map.pop(key)    # remove and return a value
del map[key]            # delete a key
key in map              # membership test
len(map)                # number of entries

for key in map:                    # iterate over keys
    pass
for key, value in map.items():     # iterate over key-value pairs
    pass
for value in map.values():         # iterate over values
    pass
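
Because PreshMap trusts that its keys are already hashed, string or bytes keys need to be turned into 64-bit integers before they are stored. The following is a minimal sketch of one way to do that, assuming the standard-library `hashlib.blake2b` as the hash function. preshed does not prescribe a particular hash, and `hash_key` is a hypothetical helper name, not part of the library.

```python
import hashlib

def hash_key(text: str) -> int:
    """Map a string to an unsigned 64-bit integer.

    Hypothetical helper for illustration; not part of preshed.
    """
    # digest_size=8 yields exactly 8 bytes, i.e. a value in [0, 2**64).
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")

key = hash_key("hello")
# `key` is now a well-mixed integer suitable for use as a pre-hashed key,
# e.g. some_map[key] = 1 on a PreshMap instance.
```

Any hash with good bit dispersion should work equally well here; the only requirement stated by the README is that keys arrive pre-randomized.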

BloomFilter

A probabilistic set for fast membership testing of integer keys.

from preshed.bloom import BloomFilter

bloom = BloomFilter(size=1024, hash_funcs=23)  # explicit parameters
bloom = BloomFilter.from_error_rate(10000, error_rate=1e-4)  # auto-sized

bloom.add(42)          # add a key
42 in bloom            # membership test (may have false positives)

data = bloom.to_bytes()            # serialize
bloom.from_bytes(data)             # deserialize in-place
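
To give a feel for what "auto-sized" means, the sketch below computes the textbook Bloom filter parameters for a target capacity and error rate. It uses the standard formulas (bits = -n ln p / (ln 2)^2, hash functions = bits / n * ln 2) and is an assumption about the general technique, not a copy of preshed's internal calculation, which may round or clamp differently. `bloom_parameters` is a hypothetical helper name.

```python
import math

def bloom_parameters(n_items: int, error_rate: float) -> tuple[int, int]:
    """Return (bits, hash_funcs) for a textbook Bloom filter.

    Hypothetical helper for illustration; not part of preshed.
    """
    # Optimal number of bits for n_items at the given false-positive rate.
    bits = math.ceil(-n_items * math.log(error_rate) / (math.log(2) ** 2))
    # Optimal number of hash functions for that bit count.
    hash_funcs = max(1, round(bits / n_items * math.log(2)))
    return bits, hash_funcs

bits, hash_funcs = bloom_parameters(10000, 1e-4)
```

Lower error rates increase both the bit count and the number of hash functions, which is the trade-off that the explicit `size` and `hash_funcs` arguments let you control by hand.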

PreshCounter

A counter backed by a hash map, for counting occurrences of uint64 keys.

from preshed.counter import PreshCounter

counter = PreshCounter()

counter.inc(key, 1)       # increment key by 1
count = counter[key]      # get current count
len(counter)              # number of buckets

for key, count in counter: # iterate over entries
    pass

counter.smooth()           # apply Good-Turing smoothing
prob = counter.prob(key)   # get smoothed probability
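
As noted in the description, PreshCounter needs external synchronization when shared between threads. One possible approach is to guard every call with a `threading.Lock`, sketched below. `LockedCounter` and `DictCounter` are hypothetical names introduced for this example; `DictCounter` is a plain-Python stand-in exposing the same `inc(key, amount)` method so that the snippet runs without preshed installed, and a real `PreshCounter` instance could be passed in its place.

```python
import threading

class DictCounter:
    """Stand-in with an inc(key, amount) method (hypothetical, for illustration)."""
    def __init__(self):
        self.counts = {}

    def inc(self, key, amount):
        self.counts[key] = self.counts.get(key, 0) + amount

class LockedCounter:
    """Serialize access to a counter object with a single lock (hypothetical wrapper)."""
    def __init__(self, counter):
        self._counter = counter
        self._lock = threading.Lock()

    def inc(self, key, amount=1):
        # Only one thread at a time may touch the underlying counter.
        with self._lock:
            self._counter.inc(key, amount)

backing = DictCounter()          # replace with PreshCounter() in real use
shared = LockedCounter(backing)

def worker():
    for _ in range(1000):
        shared.inc(42, 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A single coarse lock is the simplest option; if contention becomes a problem, per-thread counters merged at the end may be a better fit.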

Cython API

All classes expose a C-level API via .pxd files for use in Cython extensions. The low-level MapStruct and BloomStruct functions operate on raw structs and can be called without the GIL:

from preshed.maps cimport PreshMap, map_get, map_set, map_iter, key_t
from preshed.bloom cimport BloomFilter, bloom_add, bloom_contains

cdef PreshMap table = PreshMap()

# Low-level nogil access (requires external synchronization)
cdef void* value
cdef key_t some_key = 42  # placeholder pre-hashed key, for illustration only
with nogil:
    value = map_get(table.c_map, some_key)
Versions
4.0.0 2023-04-27
3.0.13 2026-03-23
3.0.12 2025-11-17
3.0.11 2025-11-13
3.0.10 2025-05-26
3.0.9 2023-09-15
3.0.8 2022-10-14
3.0.7 2022-08-15
3.0.6 2021-11-08
3.0.5 2020-12-07
3.0.4 2020-11-05
3.0.3 2020-11-02
3.0.2 2019-09-24
3.0.1 2019-09-24
3.0.0 2019-09-10
3.0.0.dev2 2019-09-10
2.0.1 2018-10-14
1.0.1 2018-10-13
1.0.0 2016-09-30
0.46.4 2016-04-30
0.46.3 2016-03-25
0.46.2 2016-02-19
0.46.1 2015-12-22
0.45.0 2015-12-22
0.44.0 2015-12-22
0.44 2015-11-05
0.43 2015-11-02
0.42 2015-10-10
0.41 2015-07-15
0.40 2015-07-15
0.39 2015-07-15
0.38 2015-07-15
0.37 2015-01-04
0.36 2015-01-04
0.35 2015-01-04
0.34 2015-01-04
0.33 2015-01-04
0.32 2015-01-03
0.31 2015-01-03
0.30 2015-01-03
0.28 2015-01-03
0.27 2015-01-03
0.25 2015-01-02
0.24 2015-01-02
0.23 2015-01-02
0.22 2015-01-02
0.21 2015-01-02
0.20 2014-12-19
0.1 2014-09-26