JaroWinkler

Functions

distance

rapidfuzz.distance.JaroWinkler.distance(s1, s2, *, prefix_weight=0.1, processor=None, score_cutoff=None)

Calculates the Jaro-Winkler distance.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • prefix_weight (float, optional) – Weight used for the common prefix of the two strings. Has to be between 0 and 0.25. Default is 0.1.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 1.0. For distance > score_cutoff 1.0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

distance – distance between s1 and s2 as a float between 0 and 1.0

Return type:

float

Raises:

ValueError – If prefix_weight is invalid

normalized_distance

rapidfuzz.distance.JaroWinkler.normalized_distance(s1, s2, *, prefix_weight=0.1, processor=None, score_cutoff=None)

Calculates the normalized Jaro-Winkler distance.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • prefix_weight (float, optional) – Weight used for the common prefix of the two strings. Has to be between 0 and 0.25. Default is 0.1.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 1.0. For normalized distance > score_cutoff 1.0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

normalized distance – normalized distance between s1 and s2 as a float between 0 and 1.0

Return type:

float

Raises:

ValueError – If prefix_weight is invalid

similarity

rapidfuzz.distance.JaroWinkler.similarity(s1, s2, *, prefix_weight=0.1, processor=None, score_cutoff=None)

Calculates the Jaro-Winkler similarity.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • prefix_weight (float, optional) – Weight used for the common prefix of the two strings. Has to be between 0 and 0.25. Default is 0.1.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 1.0. For similarity < score_cutoff 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 1.0

Return type:

float

Raises:

ValueError – If prefix_weight is invalid
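
To make the role of prefix_weight concrete, here is a minimal pure-Python sketch of the textbook Jaro-Winkler formulation. It is not RapidFuzz's implementation: the function names jaro and jaro_winkler are hypothetical, the 0.7 boost threshold and 4-character prefix limit are the commonly cited Winkler defaults and are assumed here, and treating two empty sequences as identical is a choice made for this sketch.

```python
def jaro(s1, s2):
    """Plain Jaro similarity, straightforward O(N*M) formulation."""
    if not s1 and not s2:
        return 1.0  # assumption: two empty sequences count as identical
    if not s1 or not s2:
        return 0.0
    # Two equal characters only match if they are at most `window` apart.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order are transpositions.
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(m1, m2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted by the length of the common prefix."""
    if not 0.0 <= prefix_weight <= 0.25:
        raise ValueError("prefix_weight has to be between 0 and 0.25")
    sim = jaro(s1, s2)
    if sim > 0.7:  # assumed textbook threshold for applying the boost
        prefix = 0
        for a, b in zip(s1[:4], s2[:4]):  # at most 4 prefix characters count
            if a != b:
                break
            prefix += 1
        sim += prefix * prefix_weight * (1.0 - sim)
    return sim


boosted = jaro_winkler("MARTHA", "MARHTA")                   # default weight
plain = jaro_winkler("MARTHA", "MARHTA", prefix_weight=0.0)  # no prefix boost
```

For this pair the common prefix is "MAR", so the default weight lifts the plain Jaro score of about 0.944 to about 0.961. Capping prefix_weight at 0.25 with a 4-character prefix keeps the boosted score from exceeding 1.0.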

normalized_similarity

rapidfuzz.distance.JaroWinkler.normalized_similarity(s1, s2, *, prefix_weight=0.1, processor=None, score_cutoff=None)

Calculates the normalized Jaro-Winkler similarity.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • prefix_weight (float, optional) – Weight used for the common prefix of the two strings. Has to be between 0 and 0.25. Default is 0.1.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional argument for a score threshold as a float between 0 and 1.0. For normalized similarity < score_cutoff 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

normalized similarity – normalized similarity between s1 and s2 as a float between 0 and 1.0

Return type:

float

Raises:

ValueError – If prefix_weight is invalid

Performance

The following image shows a benchmark of the Jaro-Winkler similarity in RapidFuzz and jellyfish. Jellyfish uses an implementation with a time complexity of O(NM), while RapidFuzz has a time complexity of O(⌈N/64⌉M), where N and M are the lengths of the two strings.

../../_images/jaro_winkler.svg
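
To give a feel for the two complexity bounds quoted above, this sketch evaluates N·M and ⌈N/64⌉·M for a few lengths. The helper names are hypothetical and the numbers are abstract step counts derived from the formulas, not measured timings.

```python
import math


def word_count(n, word_bits=64):
    # Number of 64-bit words needed to cover n positions: ceil(n / 64).
    return math.ceil(n / word_bits)


def cost_estimates(n, m):
    # Returns (N*M, ceil(N/64)*M) for string lengths n and m.
    return n * m, word_count(n) * m


for n, m in [(10, 10), (64, 64), (100, 100), (1000, 1000)]:
    quadratic, word_parallel = cost_estimates(n, m)
    print(f"N={n:5d} M={m:5d}  N*M={quadratic:8d}  ceil(N/64)*M={word_parallel:6d}")
```

For strings of up to 64 elements the second bound is simply M, which is why the gap between the two grows quickly with string length.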