rapidfuzz.fuzz

ratio

rapidfuzz.fuzz.ratio(s1, s2, *, processor=None, score_cutoff=None)

Calculates the normalized Indel similarity, scaled to a value between 0 and 100.

Parameters:

s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float
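
The scoring can be illustrated with a small pure-Python sketch. It derives the Indel distance from the length of the longest common subsequence, normalizes it by the combined length of both inputs, and then applies processor and score_cutoff as described in the parameter list. The names lcs_length and ratio_sketch are hypothetical and exist only for this illustration; this is not the library's implementation, which is assumed to be far more optimized.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio_sketch(s1, s2, *, processor=None, score_cutoff=None):
    """Hypothetical stand-in for fuzz.ratio, for illustration only."""
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    total = len(s1) + len(s2)
    if total == 0:
        score = 100.0  # two empty sequences are treated as identical
    else:
        # Indel distance: insertions plus deletions needed to turn s1 into s2
        distance = total - 2 * lcs_length(s1, s2)
        score = 100.0 * (1 - distance / total)
    if score_cutoff is not None and score < score_cutoff:
        return 0.0  # scores below the threshold are reported as 0
    return score


print(ratio_sketch("this is a test", "this is a test!"))
print(ratio_sketch("this is a test", "this is a test!", score_cutoff=97))
print(ratio_sketch("This Is A Test", "this is a test", processor=str.lower))
```

The first call scores roughly 96.55, because one deletion is needed across 29 characters in total. The second call returns 0.0 because that score is below the cutoff of 97, and the third returns 100.0 because the processor lowercases both inputs before they are compared.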

partial_ratio

rapidfuzz.fuzz.partial_ratio(s1, s2, *, processor=None, score_cutoff=None)

Searches for the optimal alignment of the shorter string in the longer string and returns the fuzz.ratio for this alignment.

Parameters:

s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

Depending on the length of the needle (the shorter string), different implementations are used to improve performance.

short needle (length ≤ 64):

For a short needle, fuzz.ratio is calculated for all alignments that could result in an optimal alignment, so the optimal alignment is guaranteed to be found. For short needles this is very fast, since fuzz.ratio runs in O(N) time for them. This results in a worst case performance of O(NM).
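
The exhaustive search for short needles can be sketched in pure Python as follows. The helper indel_ratio and the function partial_ratio_bruteforce are hypothetical names, and the helper uses simple dynamic programming rather than the linear-time routine mentioned above, so the sketch shows which alignments are scored, not how fast the library scores them.

```python
def indel_ratio(s1, s2):
    """Normalized Indel similarity in [0, 100], computed via the LCS length."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    total = len(s1) + len(s2)
    return 100.0 * 2 * prev[-1] / total if total else 100.0


def partial_ratio_bruteforce(s1, s2):
    """Hypothetical exhaustive search; empty inputs are not modelled."""
    if not s1 or not s2:
        raise ValueError("this sketch assumes two non-empty inputs")
    needle, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    n = len(needle)
    best = 0.0
    # Negative starts give truncated windows hanging over the left edge;
    # slicing past the end truncates the windows at the right edge.
    for start in range(1 - n, len(longer)):
        window = longer[max(0, start):start + n]
        best = max(best, indel_ratio(needle, window))
    return best


print(partial_ratio_bruteforce("this is a test", "this is a test!"))
```

Every window is scored, so the best alignment cannot be missed. The cost is one fuzz.ratio-style comparison per window, which is where the O(NM) bound quoted above comes from when each comparison is linear.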

[Figure: partial_ratio_short_needle.svg]

long needle (length > 64):

For long needles, an implementation similar to the one in FuzzyWuzzy is used. It only considers alignments that start at one of the longest common substrings. This results in a worst case performance of O(N[N/64]M). In practice, however, most of the alignments can usually be skipped. The following Python code shows the concept:

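from difflib import SequenceMatcher
from rapidfuzz import fuzz

# needle is the shorter string, longer is the string it is searched in
blocks = SequenceMatcher(None, needle, longer, False).get_matching_blocks()
score = 0
for block in blocks:
    # shift the window so that the matching block lines up in both strings
    long_start = max(0, block[1] - block[0])
    long_end = long_start + len(needle)
    long_substr = longer[long_start:long_end]
    score = max(score, fuzz.ratio(needle, long_substr))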

This is a lot faster than checking all possible alignments. However, it only finds one of the best alignments and not necessarily the optimal one.

[Figure: partial_ratio_long_needle.svg]

Examples

>>> fuzz.partial_ratio("this is a test", "this is a test!")
100.0

partial_ratio_alignment

rapidfuzz.fuzz.partial_ratio_alignment(s1, s2, *, processor=None, score_cutoff=None)

Searches for the optimal alignment of the shorter string in the longer string and returns the fuzz.ratio and the corresponding alignment.

Parameters:

s1 (str | bytes) – First string to compare.
s2 (str | bytes) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, None is returned instead. Default is None, which deactivates this behaviour.

Returns:

alignment – alignment between s1 and s2 with the score as a float between 0 and 100

Return type:

ScoreAlignment, optional

Examples

>>> s1 = "a certain string"
>>> s2 = "cetain"
>>> res = fuzz.partial_ratio_alignment(s1, s2)
>>> res
ScoreAlignment(score=83.33333333333334, src_start=2, src_end=8, dest_start=0, dest_end=6)

Using the alignment information, it is possible to calculate the same fuzz.ratio:

>>> fuzz.ratio(s1[res.src_start:res.src_end], s2[res.dest_start:res.dest_end])
83.33333333333334

token_set_ratio

rapidfuzz.fuzz.token_set_ratio(s1, s2, *, processor=None, score_cutoff=None)

Compares the words in the strings based on the unique and common words between them, using fuzz.ratio.

Parameters:

s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Examples

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0

partial_token_set_ratio

rapidfuzz.fuzz.partial_token_set_ratio(s1, s2, *, processor=None, score_cutoff=None)

Compares the words in the strings based on the unique and common words between them, using fuzz.partial_ratio.

Parameters:

s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

token_sort_ratio

rapidfuzz.fuzz.token_sort_ratio(s1, s2, *, processor=None, score_cutoff=None)

Sorts the words in the strings and calculates the fuzz.ratio between them.

Parameters:

s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Examples

>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100.0

partial_token_sort_ratio

rapidfuzz.fuzz.partial_token_sort_ratio(s1, s2, *, processor=None, score_cutoff=None)

Sorts the words in the strings and calculates the fuzz.partial_ratio between them.

Parameters:

s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

token_ratio

rapidfuzz.fuzz.token_ratio(s1, s2, *, processor=None, score_cutoff=None)

Helper method that returns the maximum of fuzz.token_set_ratio and fuzz.token_sort_ratio. This is faster than calling the two functions separately.

Parameters:

s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float
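
How the token-based scorers relate to each other can be sketched in pure Python. The set-based variant below follows the FuzzyWuzzy-style formulation (compare the common words against each side's common-plus-unique words), which is an assumption here and not taken from the library's source. All function names are hypothetical, and returning 0 when one side has no tokens is likewise an assumption.

```python
def indel_ratio(s1, s2):
    """Normalized Indel similarity in [0, 100], computed via the LCS length."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    total = len(s1) + len(s2)
    return 100.0 * 2 * prev[-1] / total if total else 100.0


def token_sort_sketch(s1, s2):
    """Sort the whitespace-separated words, then compare the joined strings."""
    return indel_ratio(" ".join(sorted(s1.split())), " ".join(sorted(s2.split())))


def token_set_sketch(s1, s2):
    """Compare common words against common plus unique words of each side."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    if not tokens1 or not tokens2:
        return 0.0  # assumption: an input without tokens scores 0
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    combined1 = (common + " " + only1).strip()
    combined2 = (common + " " + only2).strip()
    return max(
        indel_ratio(common, combined1),
        indel_ratio(common, combined2),
        indel_ratio(combined1, combined2),
    )


def token_ratio_sketch(s1, s2):
    """Maximum of the set-based and sort-based scores."""
    return max(token_set_sketch(s1, s2), token_sort_sketch(s1, s2))


print(token_set_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))
print(token_sort_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))
```

Both calls print 100.0: duplicate words disappear in the set-based comparison, and word order disappears in the sort-based one. For the pair with the duplicated "fuzzy", the sort-based score stays below 100, so the maximum taken by token_ratio_sketch comes from the set-based score.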

partial_token_ratio

rapidfuzz.fuzz.partial_token_ratio(s1, s2, *, processor=None, score_cutoff=None)

Helper method that returns the maximum of fuzz.partial_token_set_ratio and fuzz.partial_token_sort_ratio. This is faster than calling the two functions separately.

Parameters:

s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

WRatio

rapidfuzz.fuzz.WRatio(s1, s2, *, processor=None, score_cutoff=None)

Calculates a weighted ratio based on the other ratio algorithms.

Parameters:

s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

QRatio

rapidfuzz.fuzz.QRatio(s1, s2, *, processor=None, score_cutoff=None)

Calculates a quick ratio between two strings using fuzz.ratio.

Since v3.0 this behaves similarly to fuzz.ratio, except that it returns 0 when comparing two empty strings.

Parameters:

s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Examples

>>> fuzz.QRatio("this is a test", "this is a test!")
96.55171966552734