Hamming

Functions

distance

rapidfuzz.distance.Hamming.distance(s1, s2, *, pad=True, processor=None, score_cutoff=None)

Calculates the Hamming distance between two strings. The Hamming distance is defined as the number of positions at which the two strings differ. It is the minimum number of substitutions required to transform s1 into s2.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • pad (bool, optional) – Whether the strings should be padded if they differ in length. If pad is False and the strings have different lengths, a ValueError is raised instead. Default is True.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (int or None, optional) – Maximum distance between s1 and s2 that is considered as a result. If the distance is greater than score_cutoff, score_cutoff + 1 is returned instead. Default is None, which deactivates this behaviour.

Returns:

distance – distance between s1 and s2

Return type:

int

Raises:

ValueError – If pad is False and s1 and s2 have different lengths
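
The behaviour described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the function name hamming_distance_sketch and the sentinel-based padding are assumptions made for illustration, and it only mirrors the documented handling of pad, processor and score_cutoff.

```python
from itertools import zip_longest

_PAD = object()  # sentinel that never compares equal to a real element


def hamming_distance_sketch(s1, s2, *, pad=True, processor=None, score_cutoff=None):
    """Hypothetical reference version of the documented behaviour."""
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    if not pad and len(s1) != len(s2):
        raise ValueError("Sequences must have the same length when pad is False")
    # Padded positions of the shorter sequence always count as differences.
    dist = sum(a != b for a, b in zip_longest(s1, s2, fillvalue=_PAD))
    if score_cutoff is not None and dist > score_cutoff:
        return score_cutoff + 1
    return dist


print(hamming_distance_sketch("karolin", "kathrin"))            # 3
print(hamming_distance_sketch("abc", "abcde"))                  # 2 (two padded positions)
print(hamming_distance_sketch("abc", "xyz", score_cutoff=1))    # 2 (score_cutoff + 1)
print(hamming_distance_sketch("ABC", "abc", processor=str.lower))  # 0
```

With the library itself, the equivalent call would be Hamming.distance("karolin", "kathrin") after importing Hamming from rapidfuzz.distance.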

normalized_distance

rapidfuzz.distance.Hamming.normalized_distance(s1, s2, *, pad=True, processor=None, score_cutoff=None)

Calculates a normalized Hamming distance in the range [0, 1].

This is calculated as distance / max(len1, len2).

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • pad (bool, optional) – Whether the strings should be padded if they differ in length. If pad is False and the strings have different lengths, a ValueError is raised instead. Default is True.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 1.0. If norm_dist > score_cutoff, 1.0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

norm_dist – normalized distance between s1 and s2 as a float between 0 and 1.0

Return type:

float

Raises:

ValueError – If pad is False and s1 and s2 have different lengths
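
A minimal pure-Python sketch of the normalization follows. It assumes the distance is divided by the length of the longer sequence, so that two completely different strings score 1.0. The function name normalized_hamming_distance_sketch is hypothetical and this is not the library's own code.

```python
from itertools import zip_longest


def normalized_hamming_distance_sketch(s1, s2, *, score_cutoff=None):
    """Hypothetical reference version; assumes division by the longer length."""
    pad = object()  # sentinel that never compares equal to a real element
    maximum = max(len(s1), len(s2))
    dist = sum(a != b for a, b in zip_longest(s1, s2, fillvalue=pad))
    # Two empty sequences are treated as identical.
    norm_dist = dist / maximum if maximum else 0.0
    if score_cutoff is not None and norm_dist > score_cutoff:
        return 1.0
    return norm_dist


print(normalized_hamming_distance_sketch("karolin", "kathrin"))  # 3 / 7
print(normalized_hamming_distance_sketch("abc", "xyz"))          # 1.0
print(normalized_hamming_distance_sketch("karolin", "kathrin", score_cutoff=0.2))  # 1.0
```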

similarity

rapidfuzz.distance.Hamming.similarity(s1, s2, *, pad=True, processor=None, score_cutoff=None)

Calculates the Hamming similarity between two strings.

This is calculated as max(len1, len2) - distance.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • pad (bool, optional) – Whether the strings should be padded if they differ in length. If pad is False and the strings have different lengths, a ValueError is raised instead. Default is True.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (int, optional) – Minimum similarity between s1 and s2 that is considered as a result. If the similarity is smaller than score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2

Return type:

int

Raises:

ValueError – If pad is False and s1 and s2 have different lengths
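
As a rough illustration, the similarity can be sketched in pure Python as the number of matching positions. The sketch assumes the distance is subtracted from the length of the longer sequence, which equals len1 whenever both sequences have the same length. The function name hamming_similarity_sketch is hypothetical.

```python
from itertools import zip_longest


def hamming_similarity_sketch(s1, s2, *, score_cutoff=None):
    """Hypothetical reference version; assumes max(len1, len2) - distance."""
    pad = object()  # sentinel that never compares equal to a real element
    maximum = max(len(s1), len(s2))
    dist = sum(a != b for a, b in zip_longest(s1, s2, fillvalue=pad))
    sim = maximum - dist
    # Results below the cutoff are reported as 0.
    if score_cutoff is not None and sim < score_cutoff:
        return 0
    return sim


print(hamming_similarity_sketch("karolin", "kathrin"))                  # 4
print(hamming_similarity_sketch("abc", "abcde"))                        # 3
print(hamming_similarity_sketch("karolin", "kathrin", score_cutoff=5))  # 0
```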

normalized_similarity

rapidfuzz.distance.Hamming.normalized_similarity(s1, s2, *, pad=True, processor=None, score_cutoff=None)

Calculates a normalized Hamming similarity in the range [0, 1].

This is calculated as 1 - normalized_distance.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • pad (bool, optional) – Whether the strings should be padded if they differ in length. If pad is False and the strings have different lengths, a ValueError is raised instead. Default is True.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 1.0. If norm_sim < score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

norm_sim – normalized similarity between s1 and s2 as a float between 0 and 1.0

Return type:

float

Raises:

ValueError – If pad is False and s1 and s2 have different lengths
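
The relation 1 - normalized_distance can be sketched in pure Python as shown here. The sketch assumes the normalized distance divides by the length of the longer sequence. The function name normalized_hamming_similarity_sketch is hypothetical and the code is illustrative, not the library's implementation.

```python
from itertools import zip_longest


def normalized_hamming_similarity_sketch(s1, s2, *, score_cutoff=None):
    """Hypothetical reference version of 1 - normalized_distance."""
    pad = object()  # sentinel that never compares equal to a real element
    maximum = max(len(s1), len(s2))
    dist = sum(a != b for a, b in zip_longest(s1, s2, fillvalue=pad))
    norm_dist = dist / maximum if maximum else 0.0
    norm_sim = 1.0 - norm_dist
    # Results below the cutoff are reported as 0.
    if score_cutoff is not None and norm_sim < score_cutoff:
        return 0.0
    return norm_sim


print(normalized_hamming_similarity_sketch("karolin", "karolin"))  # 1.0
print(normalized_hamming_similarity_sketch("karolin", "kathrin"))  # roughly 0.571
print(normalized_hamming_similarity_sketch("karolin", "kathrin", score_cutoff=0.9))  # 0.0
```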