  • Stars: 135
  • Rank: 267,751 (Top 6%)
  • Language: Python
  • License: BSD 3-Clause "New...
  • Created over 8 years ago
  • Updated 2 months ago

Repository Details

A comprehensive and scalable set of string tokenizers and similarity measures in Python

py_stringmatching

This project seeks to build a Python software package that consists of a comprehensive and scalable set of string tokenizers (such as alphabetical tokenizers, whitespace tokenizers) and string similarity measures (such as edit distance, Jaccard, TF/IDF). The package is free, open-source, and BSD-licensed.
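
To make the terms above concrete, here is a minimal pure-Python sketch of one tokenizer and two similarity measures of the kind the package covers: a whitespace tokenizer, Jaccard similarity over token sets, and edit (Levenshtein) distance. The function names `whitespace_tokenize`, `jaccard`, and `edit_distance` are hypothetical and chosen for this illustration only; this is not py_stringmatching's API or implementation, and the choice to return 1.0 for two empty sets is an assumption made here.

```python
def whitespace_tokenize(text):
    """Split a string on runs of whitespace and return the set of tokens."""
    return set(text.split())


def jaccard(set_a, set_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not set_a and not set_b:
        # Assumed convention for this sketch: two empty sets are identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(s, t):
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, char_t in enumerate(t, start=1):
            substitution_cost = 0 if char_s == char_t else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_s
                curr[j - 1] + 1,                  # insert char_t
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    left = whitespace_tokenize("data science")
    right = whitespace_tokenize("data management")
    print(jaccard(left, right))
    print(edit_distance("kitten", "sitting"))
```

The tokenizer returns a set because Jaccard is a set-based measure; edit distance works directly on the raw strings and needs no tokenization.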

Dependencies

py_stringmatching has been tested on each Python version between 3.7 and 3.11, inclusive.

The required dependencies to build the package are NumPy 1.7.0 or higher, Six, and a C or C++ compiler. For the development version, you will also need Cython.
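
As a rough pre-install check, the requirements above can be sketched in a few lines of standard-library Python. The helper names `python_in_tested_range` and `missing_modules` are hypothetical and not part of py_stringmatching. The sketch only checks whether the interpreter falls in the tested 3.7 to 3.11 range and whether the `numpy` and `six` modules can be found; it does not verify the NumPy version or the presence of a C or C++ compiler.

```python
import importlib.util
import sys

# Tested interpreter range, taken from the text above.
TESTED_MIN = (3, 7)
TESTED_MAX = (3, 11)


def python_in_tested_range(version=None):
    """Return True if (major, minor) lies within the tested range."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return TESTED_MIN <= major_minor <= TESTED_MAX


def missing_modules(names=("numpy", "six")):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_in_tested_range():
        print("Warning: this Python version is outside the tested range.")
    for name in missing_modules():
        print(f"Missing required module: {name}")
```

A Python version outside the range is reported as a warning rather than an error, since the text only states which versions have been tested, not which ones fail.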

Platforms

py_stringmatching has been tested on Linux, OS X, and Windows. So far it has been tested only on the x86 architecture.