fuzzywuzzy

Fuzzy string matching in python

GPLv2 27 个版本
Adam Cohen <adam@seatgeek.com>
安装
pip install fuzzywuzzy
poetry add fuzzywuzzy
pipenv install fuzzywuzzy
conda install fuzzywuzzy
描述

.. image:: https://travis-ci.org/seatgeek/fuzzywuzzy.svg?branch=master :target: https://travis-ci.org/seatgeek/fuzzywuzzy

FuzzyWuzzy

Fuzzy string matching like a boss. It uses Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance>_ to calculate the differences between sequences in a simple-to-use package.

Requirements

  • Python 2.7 or higher
  • difflib
  • python-Levenshtein <https://github.com/ztane/python-Levenshtein/>_ (optional, provides a 4-10x speedup in String Matching, though may result in differing results for certain cases <https://github.com/seatgeek/fuzzywuzzy/issues/128>_)

For testing

-  pycodestyle
-  hypothesis
-  pytest

Installation
============

Using PIP via PyPI

.. code:: bash

    pip install fuzzywuzzy

or the following to install `python-Levenshtein` too

.. code:: bash

    pip install fuzzywuzzy[speedup]


Using PIP via Github

.. code:: bash

    pip install git+git://github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Adding to your ``requirements.txt`` file (run ``pip install -r requirements.txt`` afterwards)

.. code:: bash

    git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Manually via GIT

.. code:: bash

    git clone git://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
    cd fuzzywuzzy
    python setup.py install


Usage
=====

.. code:: python

    >>> from fuzzywuzzy import fuzz
    >>> from fuzzywuzzy import process

Simple Ratio

.. code:: python

>>> fuzz.ratio("this is a test", "this is a test!")
    97

Partial Ratio


.. code:: python

    >>> fuzz.partial_ratio("this is a test", "this is a test!")
        100

Token Sort Ratio

.. code:: python

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100

Token Set Ratio


.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
        84
    >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
        100

Process
~~~~~~~

.. code:: python

    >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    >>> process.extract("new york jets", choices, limit=2)
        [('New York Jets', 100), ('New York Giants', 78)]
    >>> process.extractOne("cowboys", choices)
        ("Dallas Cowboys", 90)

You can also pass additional parameters to ``extractOne`` method to make it use a specific scorer. A typical use case is to match file paths:

.. code:: python

    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
        ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
        ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)

.. |Build Status| image:: https://api.travis-ci.org/seatgeek/fuzzywuzzy.png?branch=master
   :target: https:travis-ci.org/seatgeek/fuzzywuzzy

Known Ports
============

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

-  Java: `xpresso's fuzzywuzzy implementation <https://github.com/WantedTechnologies/xpresso/wiki/Approximate-string-comparison-and-pattern-matching-in-Java>`_
-  Java: `fuzzywuzzy (java port) <https://github.com/xdrop/fuzzywuzzy>`_
-  Rust: `fuzzyrusty (Rust port) <https://github.com/logannc/fuzzyrusty>`_
-  JavaScript: `fuzzball.js (JavaScript port) <https://github.com/nol13/fuzzball.js>`_
-  C++: `Tmplt/fuzzywuzzy <https://github.com/Tmplt/fuzzywuzzy>`_
-  C#: `fuzzysharp (.Net port) <https://github.com/BoomTownRoi/BoomTown.FuzzySharp>`_
-  Go: `go-fuzzywuzz (Go port) <https://github.com/paul-mannino/go-fuzzywuzzy>`_
-  Free Pascal: `FuzzyWuzzy.pas (Free Pascal port) <https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas>`_
-  Kotlin multiplatform: `FuzzyWuzzy-Kotlin <https://github.com/willowtreeapps/fuzzywuzzy-kotlin>`_
-  R: `fuzzywuzzyR (R port) <https://github.com/mlampros/fuzzywuzzyR>`_


版本列表
0.18.0 2020-02-13
0.17.0 2018-08-20
0.16.0 2017-12-18
0.15.1 2017-07-19
0.15.0 2017-02-20
0.14.0 2016-11-04
0.13.0 2016-11-01
0.12.0 2016-09-14
0.11.1 2016-07-27
0.11.0 2016-06-30
0.10.0 2016-03-14
0.9.0 2016-03-07
0.8.2 2016-02-26
0.8.1 2016-01-25
0.8.0 2015-11-16
0.7.0 2015-10-02
0.6.2 2015-09-03
0.6.1 2015-08-07
0.6.0 2015-07-20
0.5.0 2015-02-04
0.4.0 2014-10-31
0.3.3 2014-10-22
0.3.2 2014-09-12
0.3.1 2014-08-24
0.3.0 2014-08-24
0.2 2013-05-28
0.1 2011-07-24