trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
51 versions
Python >=3.10
Installation
pip install trafilatura
poetry add trafilatura
pipenv install trafilatura
conda install trafilatura
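Once one of the commands above has run, a quick sanity check can confirm the interpreter meets the stated requirement and that the package is importable. The following is a minimal sketch rather than official usage: the helper names `python_is_supported`, `extract_main_text`, and `MIN_PYTHON` are hypothetical and introduced only for illustration, and the sample HTML is made up. The only library call used is `trafilatura.extract`, which may return `None` when it finds no extractable content.

```python
import sys

# Minimum interpreter version, taken from the "Python >=3.10" line of this listing.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version meets the >=3.10 requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def extract_main_text(html):
    """Return the main text of an HTML string.

    Returns None if trafilatura is not installed or nothing can be extracted.
    (Hypothetical helper, not part of the library.)
    """
    try:
        import trafilatura  # imported lazily so the version check works without it
    except ImportError:
        return None
    return trafilatura.extract(html)


if __name__ == "__main__":
    if not python_is_supported():
        print("trafilatura requires Python 3.10 or newer")
    else:
        # Illustrative input only; real pages are usually much longer.
        sample = "<html><body><article><p>Example paragraph.</p></article></body></html>"
        print(extract_main_text(sample))
```

Importing the package inside the helper keeps the version check usable even in an environment where the installation has not succeeded yet.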
Description
Classifiers
Development Status :: 5 - Production/Stable
Environment :: Console
Intended Audience :: Developers
Intended Audience :: Education
Intended Audience :: Information Technology
Intended Audience :: Science/Research
Operating System :: MacOS
Operating System :: Microsoft
Operating System :: POSIX
Programming Language :: Python
Programming Language :: Python :: 3
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: 3.12
Programming Language :: Python :: 3.13
Programming Language :: Python :: 3.14
Topic :: Internet :: WWW/HTTP
Topic :: Scientific/Engineering :: Information Analysis
Topic :: Security
Topic :: Text Editors :: Text Processing
Topic :: Text Processing :: Linguistic
Topic :: Text Processing :: Markup :: HTML
Topic :: Text Processing :: Markup :: Markdown
Topic :: Text Processing :: Markup :: XML
Topic :: Utilities
Release history
2.1.0   2026-06-07
2.0.0   2024-12-03
1.12.2  2024-09-10
1.12.1  2024-08-20
1.12.0  2024-07-30
1.11.0  2024-06-27
1.10.0  2024-05-30
1.9.0   2024-05-02
1.8.1   2024-04-03
1.8.0   2024-03-20
1.7.0   2024-01-25
1.6.4   2024-01-08
1.6.3   2023-11-29
1.6.2   2023-09-06
1.6.1   2023-06-15
1.6.0   2023-05-11
1.5.0   2023-03-30
1.4.1   2023-01-19
1.4.0   2022-10-18
1.3.0   2022-07-29
1.2.2   2022-05-18
1.2.1   2022-05-02
1.2.0   2022-03-07
1.1.0   2022-02-21
1.0.0   2021-11-30
0.9.3   2021-10-21
0.9.2   2021-10-06
0.9.1   2021-08-02
0.9.0   2021-06-15
0.8.2   2021-04-21
0.8.1   2021-03-11
0.8.0   2021-02-19
0.7.0   2021-01-04
0.6.1   2020-12-02
0.6.0   2020-11-06
0.5.2   2020-09-22
0.5.1   2020-07-15
0.5.0   2020-06-02
0.4.1   2020-04-23
0.4     2020-03-19
0.3.1   2020-01-24
0.3.0   2020-01-13
0.2.1   2019-12-03
0.2.0   2019-11-27
0.1.1   2019-10-08
0.1.0   2019-09-25
0.0.5   2019-09-16
0.0.4   2019-08-23
0.0.3   2019-08-09
0.0.2   2019-08-02
0.0.1   2019-07-17