Changelog¶
[3.11.0] - 2024-12-17¶
Performance¶
improve calculation of min score inside partial_ratio so it can skip more alignments
Added¶
added build support for emscripten
[3.10.1] - 2024-10-24¶
Fixed¶
fix compilation on clang-19
fix incorrect results in simd optimized implementation of Levenshtein and OSA on 32bit targets
Added¶
added support for taskflow 3.8.0
[3.10.0] - 2024-09-21¶
Fixed¶
drop support for Python 3.8
switch build system to scikit-build-core
[3.9.7] - 2024-09-02¶
Fixed¶
fix crash in
cdist
due to Visual Studio upgrade
[3.9.6] - 2024-08-06¶
Changed¶
upgrade to
Cython==3.0.11
add python 3.13 wheels
[3.9.5] - 2024-07-29¶
Fixed¶
include simd binaries in pyinstaller builds
fix builds with setuptools 72 by upgrading scikit-build
[3.9.4] - 2024-07-02¶
Fixed¶
fix bug in
Levenshtein.editops
andLevenshtein.opcodes
which could lead to incorrect results and crashes for some inputs
[3.9.3] - 2024-05-31¶
Fixed¶
fix None handling for queries in
process.cdist
for scorers not supporting SIMD
[3.9.2] - 2024-05-28¶
Fixed¶
fix supported versions of taskflow in cmake to be in the range v3.3 - v3.7
[3.9.1] - 2024-05-19¶
Fixed¶
disable AVX2 on MacOS since it did lead to illegal instructions being generated
[3.9.0] - 2024-05-02¶
Changed¶
significantly improve type hints for the library
Fixed¶
fix cmake version parsing
[3.8.1] - 2024-04-07¶
Fixed¶
use the correct version of
rapidfuzz-cpp
when building against a system installed version
[3.8.0] - 2024-04-06¶
Added¶
added
process.cpdist
which allows pairwise comparison of two collection of inputs
Fixed¶
fix some minor errors in the type hints
fix potentially incorrect results of JaroWinkler when using high prefix weights
[3.7.0] - 2024-03-21¶
Changed¶
reduce importtime
[3.6.2] - 2024-03-05¶
Changed¶
upgrade to
Cython==3.0.9
Fixed¶
upgrade
rapidfuzz-cpp
which includes a fix for build issues on some compilersfix some issues with the sphinx config
[3.6.1] - 2023-12-28¶
Fixed¶
fix overflow error on systems with
sizeof(size_t) < 8
[3.6.0] - 2023-12-26¶
Fixed¶
fix pure python fallback implementation of
fuzz.token_set_ratio
properly link with
-latomic
ifstd::atomic<uint64_t>
is not natively supported
Performance¶
add banded implementation of LCS / Indel. This improves the runtime from
O((|s1|/64) * |s2|)
toO((score_cutoff/64) * |s2|)
Changed¶
upgrade to
Cython==3.0.7
cdist for many metrics now returns a matrix of
uint32
instead ofint32
by default
[3.5.2] - 2023-11-02¶
Fixed¶
use _mm_malloc/_mm_free on macOS if aligned_alloc is unsupported
[3.5.1] - 2023-10-31¶
Fixed¶
fix compilation failure on macOS
[3.5.0] - 2023-10-31¶
Changed¶
skip pandas
pd.NA
similar toNone
add
score_multiplier
argument toprocess.cdist
which allows multiplying the end result scores with a constant factor.drop support for Python 3.7
Performance¶
improve performance of simd implementation for
LCS
/Indel
/Jaro
/JaroWinkler
improve performance of Jaro and Jaro Winkler for long sequences
implement
process.extract
withlimit=1
usingprocess.extractOne
which can be faster
Fixed¶
the preprocessing function was always called through Python due to a broken C-API version check
fix wraparound issue in simd implementation of Jaro and Jaro Winkler
[3.4.0] - 2023-10-09¶
Changed¶
upgrade to
Cython==3.0.3
add simd implementation for Jaro and Jaro Winkler
[3.3.1] - 2023-09-25¶
Added¶
add missing tag for python 3.12 support
[3.3.0] - 2023-09-11¶
Changed¶
upgrade to
Cython==3.0.2
implement the remaining missing features from the C++ implementation in the pure Python implementation
Added¶
added support for Python 3.12
[3.2.0] - 2023-08-02¶
Changed¶
build x86 with sse2/avx2 runtime detection
[3.1.2] - 2023-07-19¶
Changed¶
upgrade to
Cython==3.0.0
[3.1.1] - 2023-06-06¶
Changed¶
upgrade to
taskflow==3.6
Fixed¶
replace usage of
isnan
withstd::isnan
which fixes the build on NetBSD
[3.1.0] - 2023-06-02¶
Changed¶
added keyword argument
pad
to Hamming distance. This controls whether sequences of different length should be padded or lead to aValueError
improve consistency of exception messages between the C++ and pure Python implementation
upgrade required Cython version to
Cython==3.0.0b3
Fixed¶
fix missing GIL restore when an exception is thrown inside
process.cdist
fix incorrect type hints for the
process
module
[3.0.0] - 2023-04-16¶
Changed¶
allow the usage of
Hamming
for different string lengths. Length differences are handled as insertions / deletionsremove support for boolean preprocessor functions in
rapidfuzz.fuzz
andrapidfuzz.process
. The processor argument is now always a callable orNone
.update defaults of the processor argument to be
None
everywhere. For affected functions this can change results, since strings are no longer preprocessed. To get back the old behaviour passprocessor=utils.default_process
to these functions. The following functions are affected by this:process.extract
,process.extract_iter
,process.extractOne
fuzz.token_sort_ratio
,fuzz.token_set_ratio
,fuzz.token_ratio
,fuzz.partial_token_sort_ratio
,fuzz.partial_token_set_ratio
,fuzz.partial_token_ratio
,fuzz.WRatio
,fuzz.QRatio
rapidfuzz.process
no longer calls scorers withprocessor=None
. For this reason user provided scorers no longer require this argument.remove option to pass keyword arguments to scorer via
**kwargs
inrapidfuzz.process
. They can be passed via ascorer_kwargs
argument now. This ensures this does not break when extending function parameters and prevents naming clashes.remove
rapidfuzz.string_metric
module. Replacements for all functions are available inrapidfuzz.distance
Added¶
added support for arbitrary hashable sequence in the pure Python fallback implementation of all functions in
rapidfuzz.distance
added support for
None
andfloat("nan")
inprocess.cdist
as long as the underlying scorer supports it. This is the case for all scorers returning normalized results.
Fixed¶
fix division by zero in simd implementation of normalized metrics leading to incorrect results
[2.15.1] - 2023-04-11¶
Fixed¶
fix incorrect tag dispatching implementation leading to AVX2 instructions in the SSE2 code path
Added¶
add wheels for windows arm64
[2.15.0] - 2023-04-01¶
Changed¶
allow the usage of finite generators as choices in
process.extract
[2.14.0] - 2023-03-31¶
Changed¶
upgrade required Cython version to
Cython==3.0.0b2
Fixed¶
fix handling of non symmetric scorers in pure python version of
process.cdist
fix default dtype handling when using
process.cdist
with pure python scorers
[2.13.7] - 2022-12-20¶
Fixed¶
fix function signature of
get_requires_for_build_wheel
[2.13.6] - 2022-12-11¶
Changed¶
reformat changelog as restructured text to get rig of
m2r2
dependency
[2.13.5] - 2022-12-11¶
Added¶
added docs to sdist
Fixed¶
fix two cases of undefined behavior in
process.cdist
[2.13.4] - 2022-12-08¶
Changed¶
handle
float("nan")
similar toNone
for query / choice, since this is common for non-existent data in tools like numpy
Fixed¶
fix handling on
None
/float("nan")
inprocess.distance
use absolute imports inside tests
[2.13.3] - 2022-12-03¶
Fixed¶
improve handling of functions wrapped using
functools.wraps
fix broken fallback to Python implementation when the a
ImportError
occurs on import. This can e.g. occur when the binary has a dependency on libatomic, but it is unavailable on the systemdefine
CMAKE_C_COMPILER_AR
/CMAKE_CXX_COMPILER_AR
/CMAKE_C_COMPILER_RANLIB
/CMAKE_CXX_COMPILER_RANLIB
if they are not defined yet
[2.13.2] - 2022-11-05¶
Fixed¶
fix incorrect results in
Hamming.normalized_similarity
fix incorrect score_cutoff handling in pure python implementation of
Postfix.normalized_distance
andPrefix.normalized_distance
fix
Levenshtein.normalized_similarity
andLevenshtein.normalized_distance
when used in combination with the process modulefuzz.partial_ratio
was not always symmetric whenlen(s1) == len(s2)
[2.13.1] - 2022-11-02¶
Fixed¶
fix bug in
normalized_similarity
of most scorers, leading to incorrect results when used in combination with the process modulefix sse2 support
fix bug in
JaroWinkler
andJaro
when used in the pure python process moduleforward kwargs in pure Python implementation of
process.extract
[2.13.0] - 2022-10-30¶
Fixed¶
fix bug in
Levenshtein.editops
leading to crashes when used withscore_hint
Changed¶
moved capi from
rapidfuzz_capi
intorapidfuzz
, since it will always succeed the installation now that there is a pure Python modeadd
score_hint
argument to process moduleadd
score_hint
argument to Levenshtein module
[2.12.0] - 2022-10-24¶
Changed¶
drop support for Python 3.6
Added¶
added
Prefix
/Suffix
similarity
Fixed¶
fixed packaging with pyinstaller
[2.11.1] - 2022-10-05¶
Fixed¶
Fix segmentation fault in
process.cdist
when used with an empty query sequence
[2.11.0] - 2022-10-02¶
Changes¶
move jarowinkler dependency into rapidfuzz to simplify maintenance
Performance¶
add SIMD implementation for
fuzz.ratio
/fuzz.QRatio
/Levenshtein
/Indel
/LCSseq
/OSA
to improve performance for short strings in cdist
[2.10.3] - 2022-09-30¶
Fixed¶
use
scikit-build=0.14.1
on Linux, sincescikit-build=0.15.0
fails to find the Python Interpreterworkaround gcc in bug in template type deduction
[2.10.2] - 2022-09-27¶
Fixed¶
fix support for cmake versions below 3.17
[2.10.1] - 2022-09-25¶
Changed¶
modernize cmake build to fix most conda-forge builds
[2.10.0] - 2022-09-18¶
Added¶
add editops to hamming distance
Performance¶
strip common affix in osa distance
Fixed¶
ignore missing pandas in Python 3.11 tests
[2.9.0] - 2022-09-16¶
Added¶
add optimal string alignment (OSA)
[2.8.0] - 2022-09-11¶
Fixed¶
fuzz.partial_ratio
did not find the optimal alignment in some edge cases (#219)
Performance¶
improve performance of
fuzz.partial_ratio
Changed¶
increased minimum C++ version to C++17 (see #255)
[2.7.0] - 2022-09-11¶
Performance¶
improve performance of
Levenshtein.distance
/Levenshtein.editops
for long sequences.
Added¶
add
score_hint
parameter toLevenshtein.editops
which allows the use of a faster implementation
Changed¶
all functions in the
string_metric
module do now raise a deprecation warning. They are now only wrappers for their replacement functions, which makes them slower when used with the process module
[2.6.1] - 2022-09-03¶
Fixed¶
fix incorrect results of partial_ratio for long needles (#257)
[2.6.0] - 2022-08-20¶
Fixed¶
fix hashing for custom classes
Added¶
add support for slicing in
Editops.__getitem__
/Editops.__delitem__
add
DamerauLevenshtein
module
[2.5.0] - 2022-08-14¶
Added¶
added support for KeyboardInterrupt in processor module It might still take a bit until the KeyboardInterrupt is registered, but no longer runs all text comparisons after pressing
Ctrl + C
Fixed¶
fix default scorer used by cdist to use C++ implementation if possible
[2.4.4] - 2022-08-12¶
Changed¶
Added support for Python 3.11
[2.4.3] - 2022-08-08¶
Fixed¶
fix value range of
jaro_similarity
/jaro_winkler_similarity
in the pure Python mode for the string_metric modulefix missing atomix symbol on arm 32 bit
[2.4.2] - 2022-07-30¶
Fixed¶
add missing symbol to pure Python which made the usage impossible
[2.4.1] - 2022-07-29¶
Fixed¶
fix version number
[2.4.0] - 2022-07-29¶
Fixed¶
fix banded Levenshtein implementation
Performance¶
improve performance and memory usage of
Levenshtein.editops
memory usage is reduced from O(NM) to O(N)
performance is improved for long sequences
[2.3.0] - 2022-07-23¶
Added¶
add
as_matching_blocks
toEditops
/Opcodes
add support for deletions from
Editops
add
Editops.apply
/Opcodes.apply
add
Editops.remove_subsequence
Changed¶
merge adjacent similar blocks in
Opcodes
Fixed¶
fix usage of
eval(repr(Editop))
,eval(repr(Editops))
,eval(repr(Opcode))
andeval(repr(Opcodes))
fix opcode conversion for empty source sequence
fix validation for empty Opcode list passed into
Opcodes.__init__
[2.2.0] - 2022-07-19¶
Changed¶
added in-tree build backend to install cmake and ninja only when it is not installed yet and only when wheels are available
[2.1.4] - 2022-07-17¶
Changed¶
changed internal implementation of cdist to remove build dependency to numpy
Added¶
added wheels for musllinux and manylinux ppc64le, s390x
[2.1.3] - 2022-07-09¶
Fixed¶
fix missing type stubs
[2.1.2] - 2022-07-04¶
Changed¶
change src layout to make package import from root directory possible
[2.1.1] - 2022-06-30¶
Changed¶
allow installation without the C++ extension if it fails to compile
allow selection of implementation via the environment variable
RAPIDFUZZ_IMPLEMENTATION
which can be set to “cpp” or “python”
[2.1.0] - 2022-06-29¶
Added¶
added pure python fallback for all implementations with the following exceptions:
no support for sequences of hashables. Only strings supported so far
\*.editops
/\*.opcodes
functions not implemented yetprocess.cdist does not support multithreading
Fixed¶
fuzz.partial_ratio_alignment ignored the score_cutoff
fix implementation of Hamming.normalized_similarity
fix default score_cutoff of Hamming.similarity
fix implementation of LCSseq.distance when used in the process module
treat hash for -1 and -2 as different
[2.0.15] - 2022-06-24¶
Fixed¶
fix integer wraparound in partial_ratio/partial_ratio_alignment
[2.0.14] - 2022-06-23¶
Fixed¶
fix unlimited recursion in LCSseq when used in combination with the process module
Changed¶
add fallback implementations of
taskflow
,rapidfuzz-cpp
andjarowinkler-cpp
back to wheel, since some package building systems like piwheels can’t clone sources
[2.0.13] - 2022-06-22¶
Changed¶
use system version of cmake on arm platforms, since the cmake package fails to compile
[2.0.12] - 2022-06-22¶
Changed¶
add tests to sdist
remove cython dependency for sdist
[2.0.11] - 2022-04-23¶
Changed¶
relax version requirements of dependencies to simplify packaging
[2.0.10] - 2022-04-17¶
Fixed¶
Do not include installations of jaro_winkler in wheels (regression from 2.0.7)
Changed¶
Allow installation from system installed versions of
rapidfuzz-cpp
,jarowinkler-cpp
andtaskflow
Added¶
Added PyPy3.9 wheels on Linux
[2.0.9] - 2022-04-07¶
Fixed¶
Add missing Cython code in sdist
consider float imprecision in score_cutoff (see #210)
[2.0.8] - 2022-04-07¶
Fixed¶
fix incorrect score_cutoff handling in token_set_ratio and token_ratio
Added¶
add longest common subsequence
[2.0.7] - 2022-03-13¶
Fixed¶
Do not include installations of jaro_winkler and taskflow in wheels
[2.0.6] - 2022-03-06¶
Fixed¶
fix incorrect population of sys.modules which lead to submodules overshadowing other imports
Changed¶
moved JaroWinkler and Jaro into a separate package
[2.0.5] - 2022-02-25¶
Fixed¶
fix signed integer overflow inside hashmap implementation
[2.0.4] - 2022-02-21¶
Fixed¶
fix binary size increase due to debug symbols
fix segmentation fault in
Levenshtein.editops
[2.0.3] - 2022-02-18¶
Added¶
Added fuzz.partial_ratio_alignment, which returns the result of fuzz.partial_ratio combined with the alignment this result stems from
Fixed¶
Fix Indel distance returning incorrect result when using score_cutoff=1, when the strings are not equal. This affected other scorers like fuzz.WRatio, which use the Indel distance as well.
[2.0.2] - 2022-02-12¶
Fixed¶
fix type hints
Add back transpiled cython files to the sdist to simplify builds in package builders like FreeBSD port build or conda-forge
[2.0.1] - 2022-02-11¶
Fixed¶
fix type hints
Indel.normalized_similarity mistakenly used the implementation of Indel.normalized_distance
[2.0.0] - 2022-02-09¶
Added¶
added C-Api which can be used to extend RapidFuzz from different Python modules using any programming language which allows the usage of C-Apis (C/C++/Rust)
added new scorers in
rapidfuzz.distance.*
port existing distances to this new api
add Indel distance along with the corresponding editops function
Changed¶
when the result of
string_metric.levenshtein
orstring_metric.hamming
is below max they do now returnmax + 1
instead of -1Build system moved from setuptools to scikit-build
Stop including all modules in __init__.py, since they significantly slowed down import time
Removed¶
remove the
rapidfuzz.levenshtein
module which was deprecated in v1.0.0 and scheduled for removal in v2.0.0dropped support for Python2.7 and Python3.5
Deprecated¶
deprecate support to specify processor in form of a boolean (will be removed in v3.0.0)
new functions will not get support for this in the first place
deprecate
rapidfuzz.string_metric
(will be removed in v3.0.0). Similar scorers are available inrapidfuzz.distance.*
Fixed¶
process.cdist did raise an exception when used with a pure python scorer
Performance¶
improve performance and memory usage of
rapidfuzz.string_metric.levenshtein_editops
memory usage is reduced by 33%
performance is improved by around 10%-20%
significantly improve performance of
rapidfuzz.string_metric.levenshtein
formax <= 31
using a banded implementation
[1.9.1] - 2021-12-13¶
Fixed¶
fix bug in new editops implementation, causing it to SegFault on some inputs (see qurator-spk/dinglehopper#64)
[1.9.0] - 2021-12-11¶
Fixed¶
Fix some issues in the type annotations (see #163)
Performance¶
improve performance and memory usage of
rapidfuzz.string_metric.levenshtein_editops
memory usage is reduced by 10x
performance is improved from
O(N * M)
toO([N / 64] * M)
[1.8.3] - 2021-11-19¶
Added¶
Added missing wheels for Python3.6 on MacOs and Windows (see #159)
[1.8.2] - 2021-10-27¶
Added¶
Add wheels for Python 3.10 on MacOs
[1.8.1] - 2021-10-22¶
Fixed¶
Fix incorrect editops results (See #148)
[1.8.0] - 2021-10-20¶
Changed¶
Add Wheels for Python3.10 on all platforms except MacOs (see #141)
Improve performance of
string_metric.jaro_similarity
andstring_metric.jaro_winkler_similarity
for strings with a length <= 64
[1.7.1] - 2021-10-02¶
Fixed¶
fixed incorrect results of fuzz.partial_ratio for long needles (see #138)
[1.7.0] - 2021-09-27¶
Changed¶
Added typing for process.cdist
Added multithreading support to cdist using the argument
process.cdist
Add dtype argument to
process.cdist
to set the dtype of the result numpy array (see #132)Use a better hash collision strategy in the internal hashmap, which improves the worst case performance
[1.6.2] - 2021-09-15¶
Changed¶
improved performance of fuzz.ratio
only import process.cdist when numpy is available
[1.6.1] - 2021-09-11¶
Changed¶
Add back wheels for Python2.7
[1.6.0] - 2021-09-10¶
Changed¶
fuzz.partial_ratio uses a new implementation for short needles (<= 64). This implementation is
more accurate than the current implementation (it is guaranteed to find the optimal alignment)
it is significantly faster
Add process.cdist to compare all elements of two lists (see #51)
[1.5.1] - 2021-09-01¶
Fixed¶
Fix out of bounds access in levenshtein_editops
[1.5.0] - 2021-08-21¶
Changed¶
all scorers do now support similarity/distance calculations between any sequence of hashables. So it is possible to calculate e.g. the WER as: .. code-block:
>>> string_metric.levenshtein(["word1", "word2"], ["word1", "word3"]) 1
Added¶
Added type stub files for all functions
added jaro similarity in
string_metric.jaro_similarity
added jaro winkler similarity in
string_metric.jaro_winkler_similarity
added Levenshtein editops in
string_metric.levenshtein_editops
Fixed¶
Fixed support for set objects in
process.extract
Fixed inconsistent handling of empty strings
[1.4.1] - 2021-03-30¶
Performance¶
improved performance of result creation in process.extract
Fixed¶
Cython ABI stability issue (#95)
fix missing decref in case of exceptions in process.extract
[1.4.0] - 2021-03-29¶
Changed¶
added processor support to
levenshtein
andhamming
added distance support to extract/extractOne/extract_iter
Fixed¶
incorrect results of
normalized_hamming
andnormalized_levenshtein
when used withutils.default_process
as processor
[1.3.3] - 2021-03-20¶
Fixed¶
Fix a bug in the mbleven implementation of the uniform Levenshtein distance and cover it with fuzz tests
[1.3.2] - 2021-03-20¶
Fixed¶
some of the newly activated warnings caused build failures in the conda-forge build
[1.3.1] - 2021-03-20¶
Fixed¶
Fixed issue in LCS calculation for partial_ratio (see #90)
Fixed incorrect results for normalized_hamming and normalized_levenshtein when the processor
utils.default_process
is usedFix many compiler warnings
[1.3.0] - 2021-03-16¶
Changed¶
add wheels for a lot of new platforms
drop support for Python 2.7
Performance¶
use
is
instead of==
to compare functions directly by address
Fixed¶
Fix another ref counting issue
Fix some issues in the Levenshtein distance algorithm (see #92)
[1.2.1] - 2021-03-08¶
Performance¶
further improve bitparallel implementation of uniform Levenshtein distance for strings with a length > 64 (in many cases more than 50% faster)
[1.2.0] - 2021-03-07¶
Changed¶
add more benchmarks to documentation
Performance¶
add bitparallel implementation to InDel Distance (Levenshtein with the weights 1,1,2) for strings with a length > 64
improve bitparallel implementation of uniform Levenshtein distance for strings with a length > 64
use the InDel Distance and uniform Levenshtein distance in more cases instead of the generic implementation
Directly use the Levenshtein implementation in C++ instead of using it through Python in process.*
[1.1.2] - 2021-03-03¶
Fixed¶
Fix reference counting in process.extract (see #81)
[1.1.1] - 2021-02-23¶
Fixed¶
Fix result conversion in process.extract (see #79)
[1.1.0] - 2021-02-21¶
Changed¶
string_metric.normalized_levenshtein supports now all weights
when different weights are used for Insertion and Deletion the strings are not swapped inside the Levenshtein implementation anymore. So different weights for Insertion and Deletion are now supported.
replace C++ implementation with a Cython implementation. This has the following advantages:
The implementation is less error prone, since a lot of the complex things are done by Cython
slightly faster than the current implementation (up to 10% for some parts)
about 33% smaller binary size
reduced compile time
Added **kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer
Add max argument to hamming distance
Add support for whole Unicode range to utils.default_process
Performance¶
replaced Wagner Fischer usage in the normal Levenshtein distance with a bitparallel implementation
[1.0.2] - 2021-02-19¶
Fixed¶
The bitparallel LCS algorithm in fuzz.partial_ratio did not find the longest common substring properly in some cases. The old algorithm is used again until this bug is fixed.
[1.0.1] - 2021-02-17¶
Changed¶
string_metric.normalized_levenshtein supports now the weights (1, 1, N) with N >= 1
Performance¶
The Levenshtein distance with the weights (1, 1, >2) do now use the same implementation as the weight (1, 1, 2), since
Substitution > Insertion + Deletion
has no effect
Fixed¶
fix uninitialized variable in bitparallel Levenshtein distance with the weight (1, 1, 1)
[1.0.0] - 2021-02-12¶
Changed¶
all normalized string_metrics can now be used as scorer for process.extract/extractOne
Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future.
increased test coverage, that already helped to fix some bugs and help to prevent regressions in the future
improved docstrings of functions
Performance¶
Added bit-parallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, that is even faster, than the bit-parallel implementation.
Improved performance of
fuzz.partial_ratio
-> Sincefuzz.ratio
andfuzz.partial_ratio
are used in most scorers, this improves the overall performance.Improved performance of
process.extract
andprocess.extractOne
Deprecated¶
the
rapidfuzz.levenshtein
module is now deprecated and will be removed in v2.0.0 These functions are now placed inrapidfuzz.string_metric
.distance
,normalized_distance
,weighted_distance
andweighted_normalized_distance
are combined intolevenshtein
andnormalized_levenshtein
.
Added¶
added normalized version of the hamming distance in
string_metric.normalized_hamming
process.extract_iter as a generator, that yields the similarity of all elements, that have a similarity >= score_cutoff
Fixed¶
multiple bugs in extractOne when used with a scorer, that’s not from RapidFuzz
fixed bug in
token_ratio
fixed bug in result normalization causing zero division
[0.14.2] - 2020-12-31¶
Fixed¶
utf8 usage in the copyright header caused problems with python2.7 on some platforms (see #70)
[0.14.1] - 2020-12-13¶
Fixed¶
when a custom processor like
lambda s: s
was used with any of the methods inside fuzz.* it always returned a score of 100. This release fixes this and adds a better test coverage to prevent this bug in the future.
[0.14.0] - 2020-12-09¶
Added¶
added hamming distance metric in the levenshtein module
Performance¶
improved performance of default_process by using lookup table
[0.13.4] - 2020-11-30¶
Fixed¶
Add missing virtual destructor that caused a segmentation fault on Mac Os
[0.13.3] - 2020-11-21¶
Added¶
C++11 Support
manylinux wheels
[0.13.2] - 2020-11-21¶
Fixed¶
Levenshtein was not imported from __init__
The reference count of a Python Object inside process.extractOne was decremented to early
[0.13.1] - 2020-11-17¶
Performance¶
process.extractOne exits early when a score of 100 is found. This way the other strings do not have to be preprocessed anymore.
[0.13.0] - 2020-11-16¶
Fixed¶
string objects passed to scorers had to be strings even before preprocessing them. This was changed, so they only have to be strings after preprocessing similar to process.extract/process.extractOne
Performance¶
process.extractOne is now implemented in C++ making it a lot faster
When token_sort_ratio or partial_token_sort ratio is used inprocess.extractOne the words in the query are only sorted once to improve the runtime
Changed¶
process.extractOne/process.extract do now return the index of the match, when the choices are a list.
Removed¶
process.extractIndices got removed, since the indices are now already returned by process.extractOne/process.extract
[0.12.5] - 2020-10-26¶
Fixed¶
fix documentation of process.extractOne (see #48)
[0.12.4] - 2020-10-22¶
Added¶
Added wheels for
CPython 2.7 on windows 64 bit
CPython 2.7 on windows 32 bit
PyPy 2.7 on windows 32 bit
[0.12.3] - 2020-10-09¶
Fixed¶
fix bug in partial_ratio (see #43)
[0.12.2] - 2020-10-01¶
Fixed¶
fix inconsistency with fuzzywuzzy in partial_ratio when using strings of equal length
[0.12.1] - 2020-09-30¶
Fixed¶
MSVC has a bug and therefore crashed on some of the templates used. This Release simplifies the templates so compiling on msvc works again
[0.12.0] - 2020-09-30¶
Performance¶
partial_ratio is using the Levenshtein distance now, which is a lot faster. Since many of the other algorithms use partial_ratio, this helps to improve the overall performance
[0.11.3] - 2020-09-22¶
Fixed¶
fix partial_token_set_ratio returning 100 all the time
[0.11.2] - 2020-09-12¶
Added¶
added rapidfuzz.__author__, rapidfuzz.__license__ and rapidfuzz.__version__
[0.11.1] - 2020-09-01¶
Fixed¶
do not use auto junk when searching the optimal alignment for partial_ratio
[0.11.0] - 2020-08-22¶
Changed¶
support for python 2.7 added #40
add wheels for python2.7 (both pypy and cpython) on MacOS and Linux
[0.10.0] - 2020-08-17¶
Changed¶
added wheels for Python3.9
Fixed¶
tuple scores in process.extractOne are now supported #39