Damerau Levenshtein
Functions
distance
- rapidfuzz.distance.DamerauLevenshtein.distance(s1, s2, *, processor=None, score_cutoff=None)
Calculates the Damerau-Levenshtein distance.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (int, optional) – Maximum distance between s1 and s2 that is considered a result. If the distance is greater than score_cutoff, score_cutoff + 1 is returned instead. Default is None, which deactivates this behaviour.
- Returns:
distance – distance between s1 and s2
- Return type:
int
Examples
Find the Damerau-Levenshtein distance between two strings:
>>> from rapidfuzz.distance import DamerauLevenshtein
>>> DamerauLevenshtein.distance("CA", "ABC")
2
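The result of 2 for "CA" and "ABC" is consistent with the unrestricted Damerau-Levenshtein distance, where a transposed pair may be edited again afterwards (CA → AC → ABC). The following is a minimal pure-Python sketch of the textbook full-table algorithm, written only to make that result reproducible by hand. It is not RapidFuzz's implementation, and the function name `damerau_levenshtein` is hypothetical.

```python
def damerau_levenshtein(s1, s2, score_cutoff=None):
    """Unrestricted Damerau-Levenshtein distance (illustrative sketch).

    Uses a full (len1 + 2) x (len2 + 2) table, so memory is O(N*M).
    """
    len1, len2 = len(s1), len(s2)
    max_dist = len1 + len2  # upper bound, used as a sentinel value
    last_row = {}  # last row (1-based) in which each element of s1 occurred

    # The table is shifted by one: row 0 and column 0 hold the sentinel.
    d = [[max_dist] * (len2 + 2) for _ in range(len1 + 2)]
    for i in range(len1 + 1):
        d[i + 1][1] = i
    for j in range(len2 + 1):
        d[1][j + 1] = j

    for i in range(1, len1 + 1):
        last_col = 0  # last column in this row where the elements matched
        for j in range(1, len2 + 1):
            row = last_row.get(s2[j - 1], 0)
            col = last_col
            cost = 1
            if s1[i - 1] == s2[j - 1]:
                cost = 0
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,   # substitution or match
                d[i + 1][j] + 1,  # insertion
                d[i][j + 1] + 1,  # deletion
                # transposition, plus the edits between the swapped elements
                d[row][col] + (i - row - 1) + 1 + (j - col - 1),
            )
        last_row[s1[i - 1]] = i

    dist = d[len1 + 1][len2 + 1]
    # Mirrors the documented cutoff rule: above the cutoff, return cutoff + 1.
    if score_cutoff is not None and dist > score_cutoff:
        return score_cutoff + 1
    return dist


print(damerau_levenshtein("CA", "ABC"))  # 2
```

Because the sketch only indexes and compares elements and uses them as dictionary keys, it accepts any sequence of hashable items, in line with the Sequence[Hashable] parameter type.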
normalized_distance
- rapidfuzz.distance.DamerauLevenshtein.normalized_distance(s1, s2, *, processor=None, score_cutoff=None)
Calculates a normalized Damerau-Levenshtein distance in the range [0, 1].
This is calculated as distance / max(len1, len2).
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 1.0. If norm_dist > score_cutoff, 1.0 is returned instead. Default is 1.0, which deactivates this behaviour.
- Returns:
norm_dist – normalized distance between s1 and s2 as a float between 0 and 1.0
- Return type:
float
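As a worked illustration of the formula and the cutoff rule described above, here is a small sketch that takes an already computed distance and the two lengths. The helper name `normalized_distance_from` is hypothetical, and returning 0.0 for two empty sequences is an assumption made here to avoid a division by zero.

```python
def normalized_distance_from(dist, len1, len2, score_cutoff=1.0):
    """Normalize a distance by the longer length (illustrative sketch)."""
    maximum = max(len1, len2)
    # Assumption: two empty sequences count as identical.
    norm_dist = dist / maximum if maximum else 0.0
    # Results above the cutoff are reported as 1.0.
    return norm_dist if norm_dist <= score_cutoff else 1.0


# "CA" vs "ABC": distance 2, lengths 2 and 3, so 2 / 3
print(normalized_distance_from(2, 2, 3))
```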
similarity
- rapidfuzz.distance.DamerauLevenshtein.similarity(s1, s2, *, processor=None, score_cutoff=None)
Calculates the Damerau-Levenshtein similarity in the range [0, max], where max is max(len1, len2).
This is calculated as max(len1, len2) - distance.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (int, optional) – Minimum similarity between s1 and s2 that is considered a result. If the similarity is smaller than score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2
- Return type:
int
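The relationship between distance and similarity can be sketched in the same way. The helper name `similarity_from` is hypothetical, and the code only restates the formula and the cutoff rule given above.

```python
def similarity_from(dist, len1, len2, score_cutoff=None):
    """Turn a distance into a similarity (illustrative sketch)."""
    sim = max(len1, len2) - dist
    # Results below the cutoff are reported as 0.
    if score_cutoff is not None and sim < score_cutoff:
        return 0
    return sim


# "CA" vs "ABC": max length 3, distance 2
print(similarity_from(2, 2, 3))  # 1
```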
normalized_similarity
- rapidfuzz.distance.DamerauLevenshtein.normalized_similarity(s1, s2, *, processor=None, score_cutoff=None)
Calculates a normalized Damerau-Levenshtein similarity in the range [0, 1].
This is calculated as
1 - normalized_distance
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 1.0. If norm_sim < score_cutoff, 0 is returned instead. Default is 0, which deactivates this behaviour.
- Returns:
norm_sim – normalized similarity between s1 and s2 as a float between 0 and 1.0
- Return type:
float
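A short sketch of the complement rule, starting from a normalized distance. The helper name `normalized_similarity_from` is hypothetical.

```python
def normalized_similarity_from(norm_dist, score_cutoff=0.0):
    """Complement of the normalized distance (illustrative sketch)."""
    norm_sim = 1.0 - norm_dist
    # Results below the cutoff are reported as 0.0.
    return norm_sim if norm_sim >= score_cutoff else 0.0


# "CA" vs "ABC": normalized distance 2 / 3, so the similarity is about 0.33
print(normalized_similarity_from(2 / 3))
```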
Performance
The following image shows a benchmark of the Damerau-Levenshtein distance in
RapidFuzz and jellyfish. Both have a time complexity of O(NM). However, RapidFuzz
only requires O(N + M) memory, while the implementation in jellyfish
has a memory usage of O(NM).