rapidfuzz.fuzz

ratio

rapidfuzz.fuzz.ratio(s1, s2, *, processor=None, score_cutoff=None)

Calculates the normalized Indel similarity.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

See also

rapidfuzz.distance.Indel.normalized_similarity

Normalized Indel similarity

Notes

../_images/ratio.svg

Examples

>>> fuzz.ratio("this is a test", "this is a test!")
96.55171966552734
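
The following pure-Python sketch illustrates what "normalized Indel similarity" means and how processor and score_cutoff fit in. It is not the library's implementation: lcs_length and indel_ratio are hypothetical helper names introduced here, and the sketch assumes the Indel distance equals len(s1) + len(s2) minus twice the length of the longest common subsequence.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1, s2, *, processor=None, score_cutoff=None):
    """Illustrative normalized Indel similarity on a 0-100 scale."""
    if processor is not None:
        # Preprocess both inputs before comparing them
        s1, s2 = processor(s1), processor(s2)
    total = len(s1) + len(s2)
    if total == 0:
        score = 100.0  # two empty sequences are treated as identical
    else:
        # Indel distance counts insertions and deletions only
        distance = total - 2 * lcs_length(s1, s2)
        score = 100.0 * (1 - distance / total)
    if score_cutoff is not None and score < score_cutoff:
        return 0.0  # scores below the cutoff are reported as 0
    return score


print(indel_ratio("this is a test", "this is a test!"))
print(indel_ratio("This Is A Test", "this is a test", processor=str.lower))
print(indel_ratio("this is a test", "this is a test!", score_cutoff=99))
```

In this sketch the first call yields roughly 96.55 (one insertion over a combined length of 29), the second yields 100.0 because the processor lowercases both strings, and the third yields 0.0 because the score falls below the cutoff.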

partial_ratio

rapidfuzz.fuzz.partial_ratio(s1, s2, *, processor=None, score_cutoff=None)

Searches for the optimal alignment of the shorter string in the longer string and returns the fuzz.ratio for this alignment.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

Depending on the length of the needle (the shorter string), different implementations are used to improve performance.

short needle (length ≤ 64):

For a short needle, fuzz.ratio is calculated for every alignment that could be optimal, so the optimal alignment is guaranteed to be found. This is very fast for short needles, since fuzz.ratio runs in O(N) time for them. This results in a worst case performance of O(NM).

../_images/partial_ratio_short_needle.svg

long needle (length > 64):

For long needles, an implementation similar to FuzzyWuzzy is used. This implementation only considers alignments which start at one of the longest common substrings. This results in a worst case performance of O(N[N/64]M). However, most of the alignments can usually be skipped. The following Python code shows the concept:

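from difflib import SequenceMatcher
from rapidfuzz import fuzz

# needle is the shorter string, longer is the string it is searched in
blocks = SequenceMatcher(None, needle, longer, False).get_matching_blocks()
score = 0
for block in blocks:
    # block[0] is the offset in needle, block[1] the offset in longer;
    # align the needle's start with the block and clamp at the left edge
    long_start = max(block[1] - block[0], 0)
    long_end = long_start + len(needle)
    long_substr = longer[long_start:long_end]
    score = max(score, fuzz.ratio(needle, long_substr))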

This is a lot faster than checking all possible alignments. However, it only finds one of the best alignments, which is not necessarily the optimal one.

../_images/partial_ratio_long_needle.svg

Examples

>>> fuzz.partial_ratio("this is a test", "this is a test!")
100.0
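
The short-needle strategy described in the Notes can be sketched as a brute-force search over windows of the longer string. This is only a conceptual illustration in pure Python, not the library's code: lcs_length, indel_ratio and partial_ratio_sketch are hypothetical names, the sketch assumes non-empty inputs, and trying truncated windows at both ends of the longer string is an assumption about which alignments are worth checking.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1, s2):
    """Illustrative normalized Indel similarity on a 0-100 scale."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0
    return 100.0 * (1 - (total - 2 * lcs_length(s1, s2)) / total)


def partial_ratio_sketch(s1, s2):
    """Best indel_ratio of the shorter string against any window of the longer one."""
    if not s1 or not s2:
        raise ValueError("this sketch only handles non-empty inputs")
    needle, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    n = len(needle)
    best = 0.0
    # Negative starts and starts near the end produce windows shorter than the needle
    for start in range(1 - n, len(longer)):
        window = longer[max(start, 0):start + n]
        best = max(best, indel_ratio(needle, window))
    return best


print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100.0
```

Each window comparison is independent, so the loop makes the O(NM) worst case visible: one ratio computation per candidate start position in the longer string.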

partial_ratio_alignment

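rapidfuzz.fuzz.partial_ratio_alignment(s1, s2, *, processor=None, score_cutoff=None)

Searches for the optimal alignment of the shorter string in the longer string and returns the fuzz.ratio and the corresponding alignment.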

token_set_ratio

rapidfuzz.fuzz.token_set_ratio(s1, s2, *, processor=None, score_cutoff=None)

Compares the words in the strings based on unique and common words between them using fuzz.ratio.

Parameters:
  • s1 (str) – First string to compare.

  • s2 (str) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/token_set_ratio.svg

Examples

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
# Returns 100.0 if one string is a subset of the other, regardless of extra content in the longer string
>>> fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear")
100.0
# Score is reduced only when there is explicit disagreement in the two strings
>>> fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear but not a cat")
92.3076923076923
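
To make the "unique and common words" idea concrete, here is a pure-Python sketch that follows the FuzzyWuzzy-style formulation: build one string from the shared tokens and two strings that append each side's unique tokens, then take the best pairwise ratio. The helper names (lcs_length, indel_ratio, token_set_ratio_sketch) are hypothetical, whitespace tokenization is an assumption, and edge cases such as empty inputs are not modelled the way the library handles them.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1, s2):
    """Illustrative normalized Indel similarity on a 0-100 scale."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0
    return 100.0 * (1 - (total - 2 * lcs_length(s1, s2)) / total)


def token_set_ratio_sketch(s1, s2):
    """Compare the shared tokens against each side's shared-plus-unique tokens."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    combined1 = f"{common} {only1}".strip()
    combined2 = f"{common} {only2}".strip()
    return max(
        indel_ratio(common, combined1),
        indel_ratio(common, combined2),
        indel_ratio(combined1, combined2),
    )


# One token set is a subset of the other, so one combined string equals the common string
print(token_set_ratio_sketch("fuzzy was a bear but not a dog", "fuzzy was a bear"))  # 100.0
# "dog" and "cat" disagree, so no pairing reaches 100
print(token_set_ratio_sketch("fuzzy was a bear but not a dog", "fuzzy was a bear but not a cat"))
```

The subset case scores 100.0 because the common string is compared against an identical combined string; the score only drops when both sides contribute tokens the other lacks.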

partial_token_set_ratio

rapidfuzz.fuzz.partial_token_set_ratio(s1, s2, *, processor=None, score_cutoff=None)

Compares the words in the strings based on unique and common words between them using fuzz.partial_ratio.

Parameters:
  • s1 (str) – First string to compare.

  • s2 (str) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/partial_token_set_ratio.svg

token_sort_ratio

rapidfuzz.fuzz.token_sort_ratio(s1, s2, *, processor=None, score_cutoff=None)

Sorts the words in the strings and calculates the fuzz.ratio between them.

Parameters:
  • s1 (str) – First string to compare.

  • s2 (str) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/token_sort_ratio.svg

Examples

>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100.0
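
The sort-then-compare idea can be sketched in a few lines of pure Python. This is an illustration under stated assumptions rather than the library's implementation: tokens are split on whitespace, and lcs_length, indel_ratio and token_sort_ratio_sketch are hypothetical names.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1, s2):
    """Illustrative normalized Indel similarity on a 0-100 scale."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0
    return 100.0 * (1 - (total - 2 * lcs_length(s1, s2)) / total)


def token_sort_ratio_sketch(s1, s2):
    """Sort the whitespace-separated tokens of each string, then compare."""
    sorted1 = " ".join(sorted(s1.split()))
    sorted2 = " ".join(sorted(s2.split()))
    return indel_ratio(sorted1, sorted2)


a, b = "fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"
print(indel_ratio(a, b))              # below 100: word order differs
print(token_sort_ratio_sketch(a, b))  # 100.0: same words after sorting
```

Sorting removes word order as a source of difference, which is why the two strings score 100.0 here even though a plain character-level comparison does not.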

partial_token_sort_ratio

rapidfuzz.fuzz.partial_token_sort_ratio(s1, s2, *, processor=None, score_cutoff=None)

Sorts the words in the strings and calculates the fuzz.partial_ratio between them.

Parameters:
  • s1 (str) – First string to compare.

  • s2 (str) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/partial_token_sort_ratio.svg

token_ratio

rapidfuzz.fuzz.token_ratio(s1, s2, *, processor=None, score_cutoff=None)

Helper method that returns the maximum of fuzz.token_set_ratio and fuzz.token_sort_ratio (faster than manually executing the two functions).

Parameters:
  • s1 (str) – First string to compare.

  • s2 (str) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/token_ratio.svg

partial_token_ratio

rapidfuzz.fuzz.partial_token_ratio(s1, s2, *, processor=None, score_cutoff=None)

Helper method that returns the maximum of fuzz.partial_token_set_ratio and fuzz.partial_token_sort_ratio (faster than manually executing the two functions).

Parameters:
  • s1 (str) – First string to compare.

  • s2 (str) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/partial_token_ratio.svg

WRatio

rapidfuzz.fuzz.WRatio(s1, s2, *, processor=None, score_cutoff=None)

Calculates a weighted ratio based on the other ratio algorithms.

Parameters:
  • s1 (str) – First string to compare.

  • s2 (str) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/WRatio.svg

QRatio

rapidfuzz.fuzz.QRatio(s1, s2, *, processor=None, score_cutoff=None)

Calculates a quick ratio between two strings using fuzz.ratio.

Since v3.0 this behaves similarly to fuzz.ratio, with the exception that it returns 0 when comparing two empty strings.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the computed ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Examples

>>> fuzz.QRatio("this is a test", "this is a test!")
96.55171966552734
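
The difference from fuzz.ratio described above only shows up for two empty strings. The pure-Python sketch below illustrates that contrast; lcs_length, indel_ratio and qratio_sketch are hypothetical names, and the sketch assumes a plain ratio treats two empty sequences as identical.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1, s2):
    """Illustrative normalized Indel similarity on a 0-100 scale."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0  # assumption: two empty sequences count as identical
    return 100.0 * (1 - (total - 2 * lcs_length(s1, s2)) / total)


def qratio_sketch(s1, s2):
    """Same as indel_ratio, except that two empty inputs score 0."""
    if len(s1) == 0 and len(s2) == 0:
        return 0.0
    return indel_ratio(s1, s2)


print(indel_ratio("", ""))    # 100.0
print(qratio_sketch("", ""))  # 0.0
print(qratio_sketch("this is a test", "this is a test!"))
```

For any non-empty pair the two functions in this sketch agree, so the last call yields roughly 96.55, the same as the plain ratio.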