Hamming
Functions
distance
- rapidfuzz.distance.Hamming.distance(s1, s2, *, pad=True, processor=None, score_cutoff=None)
Calculates the Hamming distance between two strings. The Hamming distance is defined as the number of positions at which the two strings differ. It is the minimum number of substitutions required to transform s1 into s2.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
pad (bool, optional) – Whether the shorter string is padded when the lengths differ. If pad is False and the strings have different lengths, a ValueError is raised instead. Default is True.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (int or None, optional) – Maximum distance between s1 and s2 that is considered a result. If the distance is greater than score_cutoff, score_cutoff + 1 is returned instead. Default is None, which deactivates this behaviour.
- Returns:
distance – distance between s1 and s2
- Return type:
int
- Raises:
ValueError – If s1 and s2 have different lengths and pad is False
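The interaction of pad, processor and score_cutoff can be modelled with a short pure-Python sketch. It is not the library's implementation: `hamming_distance_sketch` is a hypothetical name, and the sketch assumes that every padded position counts as one mismatch. With the library installed, the corresponding call would be rapidfuzz.distance.Hamming.distance(s1, s2).

```python
from typing import Callable, Hashable, Optional, Sequence


def hamming_distance_sketch(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    pad: bool = True,
    processor: Optional[Callable] = None,
    score_cutoff: Optional[int] = None,
) -> int:
    """Hypothetical reference model of the documented behaviour."""
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    if not pad and len(s1) != len(s2):
        raise ValueError("Sequences are not the same length.")
    # Count mismatches over the positions both sequences share.
    dist = sum(a != b for a, b in zip(s1, s2))
    # Assumption: each position past the shorter sequence is a mismatch.
    dist += abs(len(s1) - len(s2))
    # Distances above the cutoff are reported as score_cutoff + 1.
    if score_cutoff is not None and dist > score_cutoff:
        return score_cutoff + 1
    return dist


print(hamming_distance_sketch("karolin", "kathrin"))                  # 3
print(hamming_distance_sketch("abc", "abcde"))                        # 2
print(hamming_distance_sketch("karolin", "kathrin", score_cutoff=1))  # 2
print(hamming_distance_sketch("ABC", "abc", processor=str.lower))     # 0
```

Calling the sketch with pad=False on "abc" and "abcde" raises ValueError, matching the Raises entry above.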
normalized_distance
- rapidfuzz.distance.Hamming.normalized_distance(s1, s2, *, pad=True, processor=None, score_cutoff=None)
Calculates a normalized Hamming distance in the range [0, 1].
This is calculated as
distance / max(len1, len2).
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
pad (bool, optional) – Whether the shorter string is padded when the lengths differ. If pad is False and the strings have different lengths, a ValueError is raised instead. Default is True.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 1.0. For norm_dist > score_cutoff, 1.0 is returned instead. Default is None, which deactivates this behaviour.
- Returns:
norm_dist – normalized distance between s1 and s2 as a float between 0 and 1.0
- Return type:
float
- Raises:
ValueError – If s1 and s2 have different lengths and pad is False
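As a rough illustration of the normalization and the cutoff rule, here is a pure-Python sketch. `hamming_normalized_distance_sketch` is a hypothetical name, not a library function, and the sketch assumes the distance is divided by the length of the longer sequence so that the result covers the full range from 0 to 1.

```python
def hamming_normalized_distance_sketch(s1, s2, *, pad=True, score_cutoff=None):
    """Hypothetical reference model, not the library code."""
    if not pad and len(s1) != len(s2):
        raise ValueError("Sequences are not the same length.")
    # Assumption: the largest possible distance is the longer length.
    maximum = max(len(s1), len(s2))
    # Mismatches in the shared positions plus one per padded position.
    dist = sum(a != b for a, b in zip(s1, s2)) + abs(len(s1) - len(s2))
    # Two empty sequences are treated as identical (distance 0.0).
    norm_dist = dist / maximum if maximum else 0.0
    # Results above the cutoff are reported as 1.0.
    if score_cutoff is not None and norm_dist > score_cutoff:
        return 1.0
    return norm_dist


print(round(hamming_normalized_distance_sketch("karolin", "kathrin"), 3))  # 0.429
print(hamming_normalized_distance_sketch("karolin", "kathrin", score_cutoff=0.2))  # 1.0
```

The first call yields 3 mismatches over 7 positions; the second exceeds the cutoff of 0.2, so 1.0 is reported.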
similarity
- rapidfuzz.distance.Hamming.similarity(s1, s2, *, pad=True, processor=None, score_cutoff=None)
Calculates the Hamming similarity between two strings.
This is calculated as
max(len1, len2) - distance.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
pad (bool, optional) – Whether the shorter string is padded when the lengths differ. If pad is False and the strings have different lengths, a ValueError is raised instead. Default is True.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (int, optional) – Minimum similarity between s1 and s2 that is considered a result. If the similarity is smaller than score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2
- Return type:
int
- Raises:
ValueError – If s1 and s2 have different lengths and pad is False
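Because the inputs are typed as Sequence[Hashable], the same logic applies to lists of tokens as well as to strings. The sketch below illustrates this with a hypothetical helper, `hamming_similarity_sketch`. It is not the library's code, and it assumes the similarity is the length of the longer sequence minus the distance.

```python
def hamming_similarity_sketch(s1, s2, *, pad=True, score_cutoff=None):
    """Hypothetical reference model, not the library code."""
    if not pad and len(s1) != len(s2):
        raise ValueError("Sequences are not the same length.")
    # Assumption: similarity counts the matching positions out of the
    # longer length.
    maximum = max(len(s1), len(s2))
    dist = sum(a != b for a, b in zip(s1, s2)) + abs(len(s1) - len(s2))
    sim = maximum - dist
    # Similarities below the cutoff are reported as 0.
    if score_cutoff is not None and sim < score_cutoff:
        return 0
    return sim


a = ["the", "quick", "brown", "fox"]
b = ["the", "quick", "red", "fox", "jumps"]
print(hamming_similarity_sketch(a, b))                  # 3
print(hamming_similarity_sketch(a, b, score_cutoff=4))  # 0
```

Three of the five positions match, so the similarity is 3; with score_cutoff=4 that falls below the threshold and 0 is returned.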
normalized_similarity
- rapidfuzz.distance.Hamming.normalized_similarity(s1, s2, *, pad=True, processor=None, score_cutoff=None)
Calculates a normalized Hamming similarity in the range [0, 1].
This is calculated as
1 - normalized_distance.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
pad (bool, optional) – Whether the shorter string is padded when the lengths differ. If pad is False and the strings have different lengths, a ValueError is raised instead. Default is True.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 1.0. For norm_sim < score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.
- Returns:
norm_sim – normalized similarity between s1 and s2 as a float between 0 and 1.0
- Return type:
float
- Raises:
ValueError – If s1 and s2 have different lengths and pad is False
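The relation 1 - normalized_distance and the cutoff rule can be sketched in pure Python as follows. `hamming_normalized_similarity_sketch` is a hypothetical name rather than a library function, and the normalization by the longer sequence length is an assumption of this sketch.

```python
def hamming_normalized_similarity_sketch(s1, s2, *, pad=True, score_cutoff=None):
    """Hypothetical reference model, not the library code."""
    if not pad and len(s1) != len(s2):
        raise ValueError("Sequences are not the same length.")
    # Assumption: the distance is normalized by the longer length.
    maximum = max(len(s1), len(s2))
    dist = sum(a != b for a, b in zip(s1, s2)) + abs(len(s1) - len(s2))
    norm_dist = dist / maximum if maximum else 0.0
    # The normalized similarity is the complement of the normalized distance.
    norm_sim = 1.0 - norm_dist
    # Results below the cutoff are reported as 0.0.
    if score_cutoff is not None and norm_sim < score_cutoff:
        return 0.0
    return norm_sim


print(round(hamming_normalized_similarity_sketch("karolin", "kathrin"), 3))  # 0.571
print(hamming_normalized_similarity_sketch("karolin", "kathrin", score_cutoff=0.8))  # 0.0
```

Four of the seven positions match, giving roughly 0.571; a cutoff of 0.8 is not met, so the second call returns 0.0.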