attaparse

Thai dependency parser.

MIT, 2 versions, Python >=3.8
Install
pip install attaparse
poetry add attaparse
pipenv install attaparse
conda install attaparse
Description

Attaparse: Thai Dependency Parser

attaparse is a Thai dependency parser trained with Stanza. It uses PhayaThaiBERT as the base model during training. The model corresponds to the Stanza*P configuration with no POS model in the Thai Universal Dependency Treebank (TUD).

Contents

  1. Installation
  2. Usage

Installation

attaparse can be installed using pip:

pip install attaparse

Usage

Initialising

import attaparse
from attaparse import load_model, depparse

nlp = load_model()

Plain Text

Uses Stanza's default Thai tokeniser.

text = 'ฉันอยากกินข้าวที่แม่ทำ'

doc = depparse(text, nlp)

Pipe-Delimited Input

from attaparse import depparse_pipe_delimited

nlp = load_model(tokenize_pretokenized=True)
pipe_text = "ฉัน|รัก|เธอ"

doc = depparse_pipe_delimited(pipe_text, nlp)
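
As a rough illustration of how the pipe-delimited form relates to the pre-tokenised list form, the sketch below splits a pipe-delimited string into a nested token list. The helper name `pipe_to_tokens` is hypothetical and not part of attaparse; it does not show how `depparse_pipe_delimited` handles the string internally, and treating each line as one sentence is an assumption made for this example.

```python
def pipe_to_tokens(pipe_text):
    """Hypothetical helper (not part of attaparse): split pipe-delimited
    text into a list of sentences, each a list of token strings."""
    sentences = []
    # Assumption: one sentence per line, tokens separated by "|"
    for line in pipe_text.splitlines():
        tokens = [tok.strip() for tok in line.split("|") if tok.strip()]
        if tokens:
            sentences.append(tokens)
    return sentences


tokens = pipe_to_tokens("ฉัน|รัก|เธอ")
print(tokens)
```

The result has the same nested-list shape as the `tokens` value used for pre-tokenised list input.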

Pre-tokenised List Input

from attaparse import depparse_pretokenized

nlp = load_model(tokenize_pretokenized=True)
tokens = [["ฉัน", "กิน", "ข้าว"]]

doc = depparse_pretokenized(tokens, nlp)

Access the Results

print(f'\n{text}\n')
for sent in doc.sentences:
    for word in sent.words:
        # head is a 1-based index into sent.words; 0 means the word is the root
        head = sent.words[word.head - 1].text if word.head > 0 else "root"
        print(f'id: {word.id}\tword: {word.text}\thead id: {word.head}\thead: {head}\tdeprel: {word.deprel}')
  • .id: the 1-based index of the word within its sentence.
  • .head: the id of the word's syntactic head (0 if the word is the root).
  • .deprel: the dependency relation between the word and its head.
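
If the results are needed as data rather than printed text, the same attributes can be collected into tuples. The sketch below is only an illustration: `dependency_rows` is a hypothetical helper, and the stand-in document is hand-written with `SimpleNamespace` so the example runs without downloading a model. Its relation labels are illustrative and are not actual attaparse output.

```python
from types import SimpleNamespace


def dependency_rows(doc):
    """Hypothetical helper: return (id, word, head id, head word, deprel)
    tuples for every word in every sentence of a parsed document."""
    rows = []
    for sent in doc.sentences:
        for word in sent.words:
            # head is a 1-based index into sent.words; 0 marks the root
            head_text = sent.words[word.head - 1].text if word.head > 0 else "root"
            rows.append((word.id, word.text, word.head, head_text, word.deprel))
    return rows


# Hand-written stand-in with the same attribute names as the parsed document;
# the analysis below is illustrative, not real parser output.
fake_sentence = SimpleNamespace(words=[
    SimpleNamespace(id=1, text="ฉัน", head=2, deprel="nsubj"),
    SimpleNamespace(id=2, text="รัก", head=0, deprel="root"),
    SimpleNamespace(id=3, text="เธอ", head=2, deprel="obj"),
])
fake_doc = SimpleNamespace(sentences=[fake_sentence])

for row in dependency_rows(fake_doc):
    print(*row, sep="\t")
```

In real use, the `doc` returned by `depparse` would be passed in place of `fake_doc`.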

Citation

If you use attaparse in your project or publication, please cite as follows:

Panyut Sriwirote, Wei Qi Leong, Charin Polpanumas, Santhawat Thanyawong, William Chandra Tjhi, Wirote Aroonmanakun, and Attapol T. Rutherford. 2025. The Thai Universal Dependency Treebank. Transactions of the Association for Computational Linguistics, 13:376–391.

BibTeX

@article{sriwirote-etal-2025-thai,
    title = "The {T}hai {U}niversal {D}ependency Treebank",
    author = "Sriwirote, Panyut  and
      Leong, Wei Qi  and
      Polpanumas, Charin  and
      Thanyawong, Santhawat  and
      Tjhi, William Chandra  and
      Aroonmanakun, Wirote  and
      Rutherford, Attapol T.",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "13",
    year = "2025",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/2025.tacl-1.18/",
    doi = "10.1162/tacl_a_00745",
    pages = "376--391"
}