rapidfuzz.fuzz
ratio
- rapidfuzz.fuzz.ratio(s1, s2, *, processor=None, score_cutoff=None)
Calculates the normalized Indel similarity.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
See also
rapidfuzz.distance.Indel.normalized_similarity
Normalized Indel similarity
Examples
>>> fuzz.ratio("this is a test", "this is a test!")
96.55171966552734
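The normalized Indel similarity can be illustrated in pure Python. The sketch below is not the library's implementation: it assumes the score equals 100 * 2 * LCS / (len(s1) + len(s2)), where LCS is the length of the longest common subsequence, and it assumes two empty sequences count as identical. The names lcs_length and indel_ratio are hypothetical helpers introduced only for this illustration.

```python
from typing import Callable, Hashable, Optional, Sequence


def lcs_length(s1: Sequence[Hashable], s2: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Optional[Callable] = None,
    score_cutoff: Optional[float] = None,
) -> float:
    """Illustrative normalized Indel similarity in the range 0 to 100."""
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    total = len(s1) + len(s2)
    if total == 0:
        score = 100.0  # assumption: two empty sequences are treated as identical
    else:
        score = 100.0 * 2 * lcs_length(s1, s2) / total
    # Scores below the cutoff are reported as 0.
    if score_cutoff is not None and score < score_cutoff:
        return 0.0
    return score
```

For the example strings above, the longest common subsequence has length 14 and the combined length is 29, so this sketch agrees with the documented score up to floating-point rounding. Passing processor=str.lower makes the comparison case-insensitive, and a score_cutoff above the score turns the result into 0.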
partial_ratio
- rapidfuzz.fuzz.partial_ratio(s1, s2, *, processor=None, score_cutoff=None)
Searches for the optimal alignment of the shorter string in the longer string and returns the fuzz.ratio for this alignment.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
Notes
Depending on the length of the needle (shorter string) different implementations are used to improve the performance.
- short needle (length ≤ 64):
When using a short needle, fuzz.ratio is calculated for all alignments that could result in an optimal alignment. It is guaranteed to find the optimal alignment. For short needles this is very fast, since for them fuzz.ratio runs in O(N) time. This results in a worst case performance of O(NM).
- long needle (length > 64):
For long needles an implementation similar to FuzzyWuzzy's is used. This implementation only considers alignments which start at one of the longest common substrings. This results in a worst case performance of O(N[N/64]M). However, most of the alignments can usually be skipped. The following Python code shows the concept:
from difflib import SequenceMatcher
from rapidfuzz import fuzz

# needle is the shorter string, longer is the longer string
blocks = SequenceMatcher(None, needle, longer, False).get_matching_blocks()
score = 0
for block in blocks:
    # block is (start in needle, start in longer, size)
    long_start = max(block[1] - block[0], 0)
    long_end = long_start + len(needle)
    long_substr = longer[long_start:long_end]
    score = max(score, fuzz.ratio(needle, long_substr))
This is a lot faster than checking all possible alignments. However, it only finds one of the best alignments, which is not necessarily the optimal one.
Examples
>>> fuzz.partial_ratio("this is a test", "this is a test!")
100.0
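The short-needle idea from the notes above, scoring the needle against every candidate alignment and keeping the best, can be sketched as a brute-force loop. This is only an illustration under stated assumptions, not the optimized routine the library uses: indel_ratio is a hypothetical pure-Python stand-in for fuzz.ratio, candidate windows are taken to be every window of needle length plus the truncated windows at both ends of the longer string, and the handling of an empty needle is an assumption.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1, s2):
    """Hypothetical stand-in for fuzz.ratio: 100 * 2 * LCS / (len(s1) + len(s2))."""
    total = len(s1) + len(s2)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(s1, s2) / total


def partial_ratio_bruteforce(s1, s2):
    """Best indel_ratio of the shorter string against any window of the longer one."""
    needle, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    m, n = len(needle), len(longer)
    if m == 0:
        # Assumption: an empty needle only matches an empty string.
        return 100.0 if n == 0 else 0.0
    best = 0.0
    # Negative starts and starts near the end yield windows shorter than the needle.
    for start in range(-(m - 1), n):
        window = longer[max(0, start):min(n, start + m)]
        best = max(best, indel_ratio(needle, window))
    return best
```

With the example strings, the window starting at position 0 is exactly "this is a test", so the sketch returns 100.0. The loop evaluates roughly N + M windows, which is why the library restricts the candidates for long needles.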
partial_ratio_alignment
- rapidfuzz.fuzz.partial_ratio_alignment(s1, s2, *, processor=None, score_cutoff=None)
token_set_ratio
- rapidfuzz.fuzz.token_set_ratio(s1, s2, *, processor=None, score_cutoff=None)
Compares the words in the strings based on the unique and common words between them, using fuzz.ratio.
- Parameters:
s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
Examples
>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
# Returns 100.0 if one string is a subset of the other, regardless of extra content in the longer string
>>> fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear")
100.0
# Score is reduced only when there is explicit disagreement in the two strings
>>> fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear but not a cat")
92.3076923076923
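The token-set comparison can be sketched in pure Python to show where the subset rule and the reduced score come from. The code below is an assumption-laden illustration in the FuzzyWuzzy style, not the library's implementation: tokens are split on whitespace, the shared tokens and the tokens unique to each side are sorted and joined, and the best of three comparisons is returned. indel_ratio is a hypothetical stand-in for fuzz.ratio, and returning 0 when either string has no tokens is an assumption.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1, s2):
    """Hypothetical stand-in for fuzz.ratio: 100 * 2 * LCS / (len(s1) + len(s2))."""
    total = len(s1) + len(s2)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(s1, s2) / total


def token_set_ratio_sketch(s1: str, s2: str) -> float:
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    if not tokens1 or not tokens2:
        return 0.0  # assumption: nothing to compare
    common = sorted(tokens1 & tokens2)
    only1 = sorted(tokens1 - tokens2)
    only2 = sorted(tokens2 - tokens1)
    # One token set contains the other: treated as a full match.
    if common and (not only1 or not only2):
        return 100.0
    base = " ".join(common)
    combined1 = " ".join(common + only1)
    combined2 = " ".join(common + only2)
    scores = [indel_ratio(combined1, combined2)]
    if common:
        # Compare the shared part against each side's full token string.
        scores += [indel_ratio(base, combined1), indel_ratio(base, combined2)]
    return max(scores)
```

For the "dog" versus "cat" example, the shared part "a bear but fuzzy not was" has 24 characters and each combined string has 28, so the best comparison scores 100 * 48 / 52, which is about 92.31 and matches the documented value.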
partial_token_set_ratio
- rapidfuzz.fuzz.partial_token_set_ratio(s1, s2, *, processor=None, score_cutoff=None)
Compares the words in the strings based on the unique and common words between them, using fuzz.partial_ratio.
- Parameters:
s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
token_sort_ratio
- rapidfuzz.fuzz.token_sort_ratio(s1, s2, *, processor=None, score_cutoff=None)
Sorts the words in each string and calculates the fuzz.ratio between the results.
- Parameters:
s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
Examples
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100.0
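A minimal pure-Python sketch of the sort-then-compare idea follows. It assumes tokens are separated by whitespace and rejoined with single spaces, and it uses indel_ratio as a hypothetical stand-in for fuzz.ratio. It is not the library's implementation.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1, s2):
    """Hypothetical stand-in for fuzz.ratio: 100 * 2 * LCS / (len(s1) + len(s2))."""
    total = len(s1) + len(s2)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(s1, s2) / total


def token_sort_ratio_sketch(s1: str, s2: str) -> float:
    # Sorting removes any dependence on word order before the comparison.
    sorted1 = " ".join(sorted(s1.split()))
    sorted2 = " ".join(sorted(s2.split()))
    return indel_ratio(sorted1, sorted2)
```

Both example strings sort to "a bear fuzzy was wuzzy", so the sketch returns 100.0 for them, while strings with different words still score below 100.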
partial_token_sort_ratio
- rapidfuzz.fuzz.partial_token_sort_ratio(s1, s2, *, processor=None, score_cutoff=None)
Sorts the words in each string and calculates the fuzz.partial_ratio between the results.
- Parameters:
s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
token_ratio
- rapidfuzz.fuzz.token_ratio(s1, s2, *, processor=None, score_cutoff=None)
Helper function that returns the maximum of fuzz.token_set_ratio and fuzz.token_sort_ratio. It is faster than calling the two functions separately.
- Parameters:
s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
partial_token_ratio
- rapidfuzz.fuzz.partial_token_ratio(s1, s2, *, processor=None, score_cutoff=None)
Helper function that returns the maximum of fuzz.partial_token_set_ratio and fuzz.partial_token_sort_ratio. It is faster than calling the two functions separately.
- Parameters:
s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
WRatio
- rapidfuzz.fuzz.WRatio(s1, s2, *, processor=None, score_cutoff=None)
Calculates a weighted ratio based on the other ratio algorithms.
- Parameters:
s1 (str) – First string to compare.
s2 (str) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
QRatio
- rapidfuzz.fuzz.QRatio(s1, s2, *, processor=None, score_cutoff=None)
Calculates a quick ratio between two strings using fuzz.ratio.
Since v3.0 this behaves like fuzz.ratio, except that it returns 0 when comparing two empty strings.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
Examples
>>> fuzz.QRatio("this is a test", "this is a test!")
96.55171966552734
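The difference from fuzz.ratio described above can be sketched as a thin wrapper. This is an illustration only: indel_ratio is a hypothetical pure-Python stand-in for fuzz.ratio, and the assumption that it scores two empty strings as 100 is inferred from the note that QRatio differs on exactly that input.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1, s2):
    """Hypothetical stand-in for fuzz.ratio; assumes two empty inputs score 100."""
    total = len(s1) + len(s2)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(s1, s2) / total


def qratio_sketch(s1, s2):
    # The only special case: two empty inputs yield 0 instead of a full match.
    if len(s1) == 0 and len(s2) == 0:
        return 0.0
    return indel_ratio(s1, s2)
```

For non-empty inputs the wrapper simply defers to the ratio, which is why the example above reports the same score as fuzz.ratio.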