PyClustering
pyclustering is a Python and C++ data mining library (clustering algorithms, oscillatory networks, neural networks). The library provides a Python implementation of each algorithm or model and a C++ implementation (the C++ pyclustering library) of most of them. The C++ pyclustering library is part of pyclustering and is supported on Linux, Windows and MacOS.
Version: 0.11.dev
License: The 3-Clause BSD License
E-Mail: [email protected]
Documentation: https://pyclustering.github.io/docs/0.10.1/html/
Homepage: https://pyclustering.github.io/
PyClustering Wiki: https://github.com/annoviko/pyclustering/wiki
Dependencies
Required packages: scipy, matplotlib, numpy, Pillow
Python version: >=3.6 (32-bit, 64-bit)
C++ version: >= 14 (32-bit, 64-bit)
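The requirements above can be checked before installation with a small script. This is a minimal sketch using only the standard library; the helper names `missing_requirements` and `python_is_supported` are illustrative and are not part of pyclustering.

```python
import importlib.util
import sys

# Package name -> import name. Note that Pillow is imported as "PIL".
REQUIRED_MODULES = {
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "Pillow": "PIL",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [package for package, module in modules.items()
            if importlib.util.find_spec(module) is None]


def python_is_supported(version_info=sys.version_info):
    """Check the interpreter against the minimum version listed above (3.6)."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Missing packages:", missing_requirements() or "none")
```

The script only reports whether the packages can be located; it does not verify their versions.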
Performance
Most algorithms are implemented in both Python and C/C++. If your platform is supported, the C/C++ implementation is used; otherwise the library falls back to the Python implementation. The implementation can be chosen with the ccore flag (by default it is always 'True', which means that C/C++ is used), for example:
from pyclustering.cluster.xmeans import xmeans

# data_points and start_centers are assumed to be prepared beforehand.
# Default behaviour stated explicitly: the C/C++ part of the library is used
xmeans_instance_1 = xmeans(data_points, start_centers, 20, ccore=True)
# The same: the C/C++ part of the library is used by default
xmeans_instance_2 = xmeans(data_points, start_centers, 20)
# Switch off the core: the Python implementation is used
xmeans_instance_3 = xmeans(data_points, start_centers, 20, ccore=False)
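The selection rule described in this section can be written down as a tiny decision function. This is only a sketch of the rule as stated here, not the library's actual dispatch code; `select_backend` and `native_library_available` are hypothetical names introduced for illustration.

```python
def select_backend(ccore=True, native_library_available=True):
    """Return the backend that would run under the rule described above.

    Both parameter names are hypothetical and used for illustration only.
    """
    # The C/C++ core is used only when it is requested AND available.
    if ccore and native_library_available:
        return "C/C++"
    # Otherwise fall back to the pure Python implementation.
    return "Python"


print(select_backend())                                # C/C++
print(select_backend(ccore=False))                     # Python
print(select_backend(native_library_available=False))  # Python
```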
Installation
Installation using pip3 tool:
$ pip3 install pyclustering
Manual installation from official repository using Makefile:
# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .
# compile CCORE library (core of the pyclustering library).
$ cd ccore/
$ make ccore_64bit # build for 64-bit OS
# $ make ccore_32bit # build for 32-bit OS
# return to parent folder of the pyclustering library
$ cd ../
# install pyclustering library
$ python3 setup.py install
# optionally - test the library
$ python3 setup.py test
Manual installation using CMake:
# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .
# generate build files in a separate build folder.
$ mkdir build
$ cd build/
$ cmake ..
# build pyclustering-shared target depending on what was generated (Makefile or MSVC solution)
# if Makefile has been generated then
$ make pyclustering-shared
# return to parent folder of the pyclustering library
$ cd ../
# install pyclustering library
$ python3 setup.py install
# optionally - test the library
$ python3 setup.py test
Manual installation using Microsoft Visual Studio solution:
- Clone repository from: https://github.com/annoviko/pyclustering.git
- Open folder pyclustering/ccore
- Open Visual Studio project ccore.sln
- Select solution platform: x86 or x64
- Build pyclustering-shared project.
- Add pyclustering folder to python path or install it using setup.py
# install pyclustering library
$ python3 setup.py install
# optionally - test the library
$ python3 setup.py test
Proposals, Questions, Bugs
If you have questions, proposals or bug reports related to pyclustering, please contact [email protected] or create an issue in the GitHub repository: https://github.com/annoviko/pyclustering/issues.
Cite the Library
If you use the pyclustering library in a scientific paper, please cite it:
Novikov, A., 2019. PyClustering: Data Mining Library. Journal of Open Source Software, 4(36), p.1230. Available at: http://dx.doi.org/10.21105/joss.01230.
BibTeX entry:
@article{Novikov2019,
    doi       = {10.21105/joss.01230},
    url       = {https://doi.org/10.21105/joss.01230},
    year      = 2019,
    month     = {apr},
    publisher = {The Open Journal},
    volume    = {4},
    number    = {36},
    pages     = {1230},
    author    = {Andrei Novikov},
    title     = {{PyClustering}: Data Mining Library},
    journal   = {Journal of Open Source Software}
}
Brief Overview of the Library Content
Clustering algorithms and methods (module pyclustering.cluster):
Algorithm | Python | C++ |
---|---|---|
Agglomerative | ✓ | ✓ |
BANG | ✓ |  |
BIRCH | ✓ |  |
BSAS | ✓ | ✓ |
CLARANS | ✓ |  |
CLIQUE | ✓ | ✓ |
CURE | ✓ | ✓ |
DBSCAN | ✓ | ✓ |
Elbow | ✓ | ✓ |
EMA | ✓ |  |
Fuzzy C-Means | ✓ | ✓ |
GA (Genetic Algorithm) | ✓ | ✓ |
G-Means | ✓ | ✓ |
HSyncNet | ✓ | ✓ |
K-Means | ✓ | ✓ |
K-Means++ | ✓ | ✓ |
K-Medians | ✓ | ✓ |
K-Medoids | ✓ | ✓ |
MBSAS | ✓ | ✓ |
OPTICS | ✓ | ✓ |
ROCK | ✓ | ✓ |
Silhouette | ✓ | ✓ |
SOM-SC | ✓ | ✓ |
SyncNet | ✓ | ✓ |
Sync-SOM | ✓ |  |
TTSAS | ✓ | ✓ |
X-Means | ✓ | ✓ |
Oscillatory networks and neural networks (module pyclustering.nnet):
Model | Python | C++ |
---|---|---|
CNN (Chaotic Neural Network) | ✓ |  |
fSync (Oscillatory network based on Landau-Stuart equation and Kuramoto model) | ✓ |  |
HHN (Oscillatory network based on Hodgkin-Huxley model) | ✓ | ✓ |
Hysteresis Oscillatory Network | ✓ |  |
LEGION (Local Excitatory Global Inhibitory Oscillatory Network) | ✓ | ✓ |
PCNN (Pulse-Coupled Neural Network) | ✓ | ✓ |
SOM (Self-Organized Map) | ✓ | ✓ |
Sync (Oscillatory network based on Kuramoto model) | ✓ | ✓ |
SyncPR (Oscillatory network for pattern recognition) | ✓ | ✓ |
SyncSegm (Oscillatory network for image segmentation) | ✓ | ✓ |
Graph Coloring Algorithms (module pyclustering.gcolor):
Algorithm | Python | C++ |
---|---|---|
DSatur | ✓ |  |
Hysteresis | ✓ |  |
GColorSync | ✓ |  |
Containers (module pyclustering.container):
Algorithm | Python | C++ |
---|---|---|
KD Tree | ✓ | ✓ |
CF Tree | ✓ |  |
Examples in the Library
The library contains examples for each algorithm and oscillatory network model:
Clustering examples: pyclustering/cluster/examples
Graph coloring examples: pyclustering/gcolor/examples
Oscillatory network examples: pyclustering/nnet/examples
Code Examples
Data clustering by CURE algorithm
from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.cure import cure
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import FCPS_SAMPLES
# Input data in the following format: [ [0.1, 0.5], [0.3, 0.1], ... ].
input_data = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)
# Allocate three clusters.
cure_instance = cure(input_data, 3)
cure_instance.process()
clusters = cure_instance.get_clusters()
# Visualize allocated clusters.
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, input_data)
visualizer.show()
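The input format mentioned in the comment above is a list of points, each point being a list of coordinates. The sketch below builds such a list from plain text, assuming one point per line with whitespace-separated coordinates. That file layout is an assumption, and `parse_points` is an illustrative helper, not a pyclustering function.

```python
def parse_points(text):
    """Parse one point per line; coordinates are separated by whitespace."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            points.append([float(value) for value in fields])
    return points


raw = "0.1 0.5\n0.3 0.1\n\n1.2 1.0\n"
points = parse_points(raw)
print(points)  # [[0.1, 0.5], [0.3, 0.1], [1.2, 1.0]]
```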
Data clustering by K-Means algorithm
from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample
# Load list of points for cluster analysis.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)
# Prepare initial centers using K-Means++ method.
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()
# Create instance of K-Means algorithm with prepared centers.
kmeans_instance = kmeans(sample, initial_centers)
# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()
# Visualize obtained results
kmeans_visualizer.show_clusters(sample, clusters, final_centers)
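For readers unfamiliar with what the K-Means step computes, the following is a minimal pure-Python sketch of the classic assign-and-update iteration. It is not the library's implementation, and every function name in it is illustrative. Clusters are represented as lists of point indices and centers as coordinate lists.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_points(points, centers):
    """Assign each point index to the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for index, point in enumerate(points):
        nearest = min(range(len(centers)),
                      key=lambda c: squared_distance(point, centers[c]))
        clusters[nearest].append(index)
    return clusters


def update_centers(points, clusters, centers):
    """Move each center to the mean of its members; keep it if the cluster is empty."""
    new_centers = []
    for members, old_center in zip(clusters, centers):
        if not members:
            new_centers.append(list(old_center))
            continue
        new_centers.append([sum(points[i][d] for i in members) / len(members)
                            for d in range(len(old_center))])
    return new_centers


def simple_kmeans(points, initial_centers, max_iterations=100):
    """Alternate assignment and update until the centers stop changing."""
    centers = [list(center) for center in initial_centers]
    clusters = assign_points(points, centers)
    for _ in range(max_iterations):
        new_centers = update_centers(points, clusters, centers)
        if new_centers == centers:
            break
        centers = new_centers
        clusters = assign_points(points, centers)
    return clusters, centers


sample_points = [[0, 0], [0, 1], [10, 10], [10, 11]]
found_clusters, found_centers = simple_kmeans(sample_points, [[0, 0], [10, 10]])
print(found_clusters)  # [[0, 1], [2, 3]]
print(found_centers)   # [[0.0, 0.5], [10.0, 10.5]]
```

The result depends on the initial centers, which is why the example above prepares them with K-Means++ rather than picking them arbitrarily.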
Data clustering by OPTICS algorithm
from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.optics import optics, ordering_analyser, ordering_visualizer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample
# Read sample for clustering from some file
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)
# Run cluster analysis with a connectivity radius that is deliberately larger than the real one
radius = 2.0
neighbors = 3
amount_of_clusters = 3
optics_instance = optics(sample, radius, neighbors, amount_of_clusters)
# Performs cluster analysis
optics_instance.process()
# Obtain results of clustering
clusters = optics_instance.get_clusters()
noise = optics_instance.get_noise()
ordering = optics_instance.get_ordering()
# Visualize ordering diagram
analyser = ordering_analyser(ordering)
ordering_visualizer.show_ordering_diagram(analyser, amount_of_clusters)
# Visualize clustering results
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.show()
Simulation of oscillatory network PCNN
from pyclustering.nnet.pcnn import pcnn_network, pcnn_visualizer
# Create Pulse-Coupled neural network with 10 oscillators.
net = pcnn_network(10)
# Perform simulation during 50 steps using binary external stimulus.
dynamic = net.simulate(50, [1, 1, 1, 0, 0, 0, 0, 1, 1, 1])
# Allocate synchronous ensembles from the output dynamic.
ensembles = dynamic.allocate_sync_ensembles()
# Show output dynamic.
pcnn_visualizer.show_output_dynamic(dynamic, ensembles)
Simulation of chaotic neural network CNN
from pyclustering.cluster import cluster_visualizer
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
from pyclustering.nnet.cnn import cnn_network, cnn_visualizer
# Load stimulus from file.
stimulus = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
# Create chaotic neural network; the number of neurons should equal the number of stimulus points.
network_instance = cnn_network(len(stimulus))
# Perform simulation during 100 steps.
steps = 100
output_dynamic = network_instance.simulate(steps, stimulus)
# Display output dynamic of the network.
cnn_visualizer.show_output_dynamic(output_dynamic)
# Display dynamic matrix and observation matrix to show clustering phenomenon.
cnn_visualizer.show_dynamic_matrix(output_dynamic)
cnn_visualizer.show_observation_matrix(output_dynamic)
# Visualize clustering results.
clusters = output_dynamic.allocate_sync_ensembles(10)
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, stimulus)
visualizer.show()
Illustrations
Cluster allocation on FCPS dataset collection by DBSCAN:
Cluster allocation by OPTICS using cluster-ordering diagram:
Partial synchronization (clustering) in Sync oscillatory network:
Cluster visualization by SOM (Self-Organized Feature Map):