SPDX-FileCopyrightText: 2024-2026 PyThaiNLP Project
SPDX-License-Identifier: Apache-2.0
nlpO3 Python binding

Python binding for nlpO3,
a Thai natural language processing library written in Rust.
To install:
pip install nlpo3
Features
- Thai word tokenizer
  - segment() - uses a maximal-matching dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
    - 2.5x faster than a similar pure Python implementation (PyThaiNLP's newmm)
  - load_dict() - loads a dictionary from a plain text file (one word per line)
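The idea behind maximal matching can be illustrated in pure Python. This is a simplified sketch, not nlpO3's Rust implementation: among all splits of the text, it picks the one with the fewest out-of-dictionary characters and then the fewest tokens. Unlike segment(), it ignores Thai Character Cluster boundaries and emits each unknown character as its own token. The function name maximal_matching is hypothetical.

```python
# Simplified maximal-matching sketch (NOT nlpO3's implementation):
# no Thai Character Cluster constraints; unknown characters become single tokens.
from typing import List, Set, Tuple


def maximal_matching(text: str, words: Set[str]) -> List[str]:
    """Split text with the fewest unknown characters, then the fewest tokens."""
    n = len(text)
    max_len = max([len(w) for w in words] + [1])
    worst = (n + 1, n + 1)  # worse than any real (unknown_chars, token_count)
    # best[i] is the best (unknown_chars, token_count) for text[:i];
    # prev[i] is where the last token of that best split starts.
    best: List[Tuple[int, int]] = [worst] * (n + 1)
    prev = [0] * (n + 1)
    best[0] = (0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in words:
                unknown = 0
            elif end - start == 1:
                unknown = 1  # fall back to one unknown character
            else:
                continue
            candidate = (best[start][0] + unknown, best[start][1] + 1)
            if candidate < best[end]:
                best[end] = candidate
                prev[end] = start
    # Walk the back-pointers from the end to recover the tokens.
    tokens: List[str] = []
    end = n
    while end > 0:
        start = prev[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]
```

For example, maximal_matching("thecat", {"the", "cat", "th", "e"}) prefers the two-token split over "th", "e", "cat".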
Use
Load a dictionary file and assign it a name (for example, dict_name).
Then tokenize text using the named dictionary:
from nlpo3 import load_dict, segment
load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")
The function returns a list of strings, for example:
['สวัสดี', 'ครับ']
The result depends on the words included in the dictionary.
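The load-then-segment steps can be combined into one self-contained run that writes a tiny dictionary to a temporary file first. This is a sketch assuming nlpo3 is installed; the helper segment_with_words and the default name "demo_dict" are hypothetical, and the dictionary is reloaded on every call only to keep the example short.

```python
# Sketch: write a small dictionary to a temporary file, load it under a name,
# and segment text with it. segment_with_words and "demo_dict" are hypothetical.
import os
import tempfile
from typing import List


def segment_with_words(text: str, words: List[str], dict_name: str = "demo_dict") -> List[str]:
    from nlpo3 import load_dict, segment  # imported here so this module loads without nlpo3

    fd, dict_path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("\n".join(words))  # plain text, one word per line
        load_dict(dict_path, dict_name)  # register the dictionary under dict_name
        return segment(text, dict_name)
    finally:
        os.remove(dict_path)
```

With only the two words "สวัสดี" and "ครับ" in the dictionary, segment_with_words("สวัสดีครับ", ["สวัสดี", "ครับ"]) should return the same list as the example output.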
To use multithreaded mode with the dict_name dictionary:
segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
Use safe mode to avoid long run times for inputs with many ambiguous
word boundaries:
segment("สวัสดีครับ", dict_name="dict_name", safe=True)
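Why can ambiguous input be slow? The number of ways to split a string into dictionary words can grow very quickly. The hypothetical helper below (not part of nlpo3) counts dictionary-only segmentations with dynamic programming; it says nothing about how nlpO3 searches internally, but it shows the size of the space a tokenizer may have to consider.

```python
# Count the ways text can be split entirely into dictionary words.
# count_segmentations is a hypothetical illustration, not an nlpo3 function.
from typing import Set


def count_segmentations(text: str, words: Set[str]) -> int:
    n = len(text)
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) split
    for end in range(1, n + 1):
        for start in range(end):
            if text[start:end] in words:
                ways[end] += ways[start]
    return ways[n]
```

With the dictionary {"a", "aa"}, the count for "a" * n follows the Fibonacci sequence: 89 splits for n = 10 and 165,580,141 for n = 40.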
Dictionary
- To keep the library small, nlpO3 does not include a dictionary.
Users must provide a dictionary when using the dictionary-based tokenizer.
- For tokenization dictionaries, try
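If you build your own word list, a small helper can write it in the plain-text format load_dict() expects (one word per line). This is a sketch under stated assumptions: the file is UTF-8, and blank entries and duplicates are dropped. The helper name write_dict is hypothetical.

```python
# Sketch: write a word list as a plain-text dictionary, one word per line.
# Assumptions: UTF-8 output; surrounding whitespace, blanks, and duplicates removed.
from pathlib import Path
from typing import Iterable


def write_dict(words: Iterable[str], path: str) -> int:
    """Write unique, non-empty words to path and return how many were written."""
    seen = set()
    unique = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            unique.append(word)
    Path(path).write_text("\n".join(unique), encoding="utf-8")
    return len(unique)
```

The resulting file can then be passed to load_dict(path, "dict_name") as shown in the usage section.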
Build
Requirements
- Rust 2018 Edition
- Python 3.7 or newer (PyO3's minimum supported version)
- Python development headers
  - Ubuntu: sudo apt-get install python3-dev
  - macOS: No action needed
- PyO3 - already included in Cargo.toml
- setuptools-rust
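Before building, you can check some of these requirements from the interpreter you plan to build with. This is a sketch with assumptions: a Python.h file in the interpreter's include directory is taken to mean the development headers are installed, and cargo on PATH is taken to mean a Rust toolchain is available. setuptools-rust is not checked here.

```python
# Sketch: check local build prerequisites before running `python -m build`.
# Assumptions: Python.h present => dev headers installed; cargo on PATH => Rust set up.
import os
import shutil
import sys
import sysconfig
from typing import Dict, Union


def check_build_prereqs() -> Dict[str, Union[bool, str]]:
    include_dir = sysconfig.get_paths()["include"]
    return {
        "python_3_7_or_newer": sys.version_info >= (3, 7),
        "python_headers": os.path.isfile(os.path.join(include_dir, "Python.h")),
        "cargo_on_path": shutil.which("cargo") is not None,
        "include_dir": include_dir,
    }


if __name__ == "__main__":
    for name, value in check_build_prereqs().items():
        print(f"{name}: {value}")
```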
Steps
python -m pip install --upgrade build
python -m build
This should generate a wheel file in the dist/ directory, which can be installed with pip.
To install a wheel from a local directory:
pip install dist/nlpo3-1.3.1-cp311-cp311-macosx_12_0_x86_64.whl
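Because the wheel file name depends on the version, Python ABI, and platform, a small script can pick it up instead of typing it by hand. This is a sketch assuming dist/ contains one or more nlpo3 wheels from local builds and that the most recently modified one is the one you want; newest_wheel is a hypothetical helper.

```python
# Sketch: install the newest nlpo3 wheel found in dist/ with the current interpreter's pip.
# newest_wheel is hypothetical; "newest" means most recently modified.
import subprocess
import sys
from pathlib import Path
from typing import Optional


def newest_wheel(dist_dir: str = "dist") -> Optional[Path]:
    wheels = sorted(Path(dist_dir).glob("nlpo3-*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel()
    if wheel is None:
        sys.exit("No nlpo3 wheel found in dist/; run `python -m build` first.")
    subprocess.check_call([sys.executable, "-m", "pip", "install", str(wheel)])
```

Running pip through sys.executable installs the wheel into the same environment as the interpreter that runs the script.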
Test
To run the Python unit tests:
cd tests
python -m unittest
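`python -m unittest` with no arguments discovers test modules named test*.py by default. A test module in that style might look like the sketch below; the class name, dictionary name, and dictionary contents are hypothetical, and the test is skipped when nlpo3 is not installed.

```python
# Sketch of a unit test for segment(); names and dictionary contents are hypothetical.
import importlib.util
import os
import tempfile
import unittest

HAS_NLPO3 = importlib.util.find_spec("nlpo3") is not None


@unittest.skipUnless(HAS_NLPO3, "nlpo3 is not installed")
class SegmentSketchTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        from nlpo3 import load_dict  # imported lazily so the module loads without nlpo3

        fd, cls.dict_path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("สวัสดี\nครับ")  # one word per line
        load_dict(cls.dict_path, "test_sketch_dict")

    @classmethod
    def tearDownClass(cls):
        os.remove(cls.dict_path)

    def test_segment_known_words(self):
        from nlpo3 import segment

        self.assertEqual(segment("สวัสดีครับ", "test_sketch_dict"), ["สวัสดี", "ครับ"])


if __name__ == "__main__":
    unittest.main()
```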
Issues
Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
License
nlpO3 Python binding is copyrighted by its authors
and licensed under terms of the Apache Software License 2.0 (Apache-2.0).
See file LICENSE for details.
Binary wheels
Pre-built binary packages for CPython, GraalPy, and PyPy are available
on PyPI for the platforms listed below.
Versions with a "t" suffix indicate CPython with free threading.
| Python | Windows x86 | Windows AMD64 | macOS x86_64 | macOS arm64 | manylinux x86_64 | manylinux i686 | musllinux x86_64 |
|---|---|---|---|---|---|---|---|
| 3.14 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 3.14t | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 3.13 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 3.12 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 3.11 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 3.10 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 3.9 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 3.8 | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) |
| 3.7 | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) |
| GraalPy 3.12 | | | ✓ | ✓ | ✓ | | |
| GraalPy 3.11 | | | ✓ | ✓ | ✓ | | |
| PyPy 3.11 | | ✓ | ✓ | ✓ | ✓ | ✓ | |
| PyPy 3.10 | | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | |
| PyPy 3.9 | | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | |
| PyPy 3.8 | | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | ✓ (v1.3.1) | |
| PyPy 3.7 | | ✓ (v1.3.1) | ✓ (v1.3.1) | | ✓ (v1.3.1) | ✓ (v1.3.1) | |
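To see which row and column of the table apply to the interpreter you are running, the standard library exposes the relevant details. This is a sketch; wheel_row_info is a hypothetical helper, and the values in the comments are typical examples rather than a complete list. On free-threaded CPython builds, the Py_GIL_DISABLED build variable is set to 1.

```python
# Sketch: report the interpreter details that select a row/column of the wheel table.
# wheel_row_info is hypothetical; comment values are examples only.
import platform
import sys
import sysconfig
from typing import Dict, Union


def wheel_row_info() -> Dict[str, Union[bool, str]]:
    return {
        "implementation": sys.implementation.name,  # e.g. "cpython", "pypy"
        "python_version": "%d.%d" % sys.version_info[:2],
        "free_threaded": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),  # the "t" suffix
        "os": platform.system(),  # e.g. "Windows", "Darwin", "Linux"
        "machine": platform.machine(),  # e.g. "AMD64", "x86_64", "arm64"
    }


if __name__ == "__main__":
    for key, value in wheel_row_info().items():
        print(f"{key}: {value}")
```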