

Density Based Clustering (DeBaCl) Toolbox

DeBaCl: DEnsity-BAsed CLustering


DeBaCl is a Python library for density-based clustering with level set trees.

Level set trees are a statistically-principled way to represent the topology of a probability density function. This representation is particularly useful for several core tasks in statistics:

  • clustering, especially for data with multi-scale clustering behavior
  • describing data topology
  • exploratory data analysis
  • data visualization
  • anomaly detection

DeBaCl is a Python implementation of the Level Set Tree method, with an emphasis on computational speed, algorithmic simplicity, and extensibility.
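To make the idea of a level set tree concrete, here is a minimal sketch that counts the connected components of the upper level sets of a density estimate. It is not DeBaCl's implementation: the helper names `knn_graph_and_density` and `count_components`, the toy data, and the unnormalized k-nearest-neighbor density estimate are all assumptions chosen for illustration.

```python
import numpy as np


def knn_graph_and_density(X, k):
    """Return each point's k nearest neighbors and a crude k-NN density estimate."""
    n, d = X.shape
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    order = np.argsort(dist, axis=1)
    neighbors = order[:, 1:k + 1]             # column 0 is the point itself
    radius = dist[np.arange(n), order[:, k]]  # distance to the k-th neighbor
    density = 1.0 / radius ** d               # unnormalized density estimate
    return neighbors, density


def count_components(neighbors, keep):
    """Count connected components of the k-NN graph restricted to `keep`."""
    kept = [int(i) for i in np.flatnonzero(keep)]
    parent = {i: i for i in kept}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in kept:
        for j in neighbors[i]:
            j = int(j)
            if keep[j]:
                parent[find(i)] = find(j)
    return len(set(find(i) for i in kept))


# Toy 1-D data: two dense clumps joined by a sparse bridge of three points.
X = np.concatenate([np.linspace(0.0, 0.9, 10),
                    [1.9, 2.9, 3.9],
                    np.linspace(4.9, 5.8, 10)])[:, None]

neighbors, density = knn_graph_and_density(X, k=2)
for level in [0.5, 2.0]:
    n_components = count_components(neighbors, density >= level)
    print('level %.1f: %d component(s)' % (level, n_components))
# level 0.5: 1 component(s)
# level 2.0: 2 component(s)
```

Raising the level removes the low-density bridge, so one component splits into two. A level set tree records exactly these splits across all levels.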

License

DeBaCl is available under the 3-clause BSD license.

Installation

DeBaCl is officially tested on Python 2.7 only. Other versions may work, but caveat emptor. The package can be installed from the Python Package Index (PyPI) with pip. From a terminal:

$ pip install debacl

It can also be installed by cloning this GitHub repo. This requires updating the Python path to include the cloned repo. On Linux, this looks something like:

$ git clone https://github.com/CoAxLab/DeBaCl/
$ export PYTHONPATH="$PWD/DeBaCl:$PYTHONPATH"
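If you would rather not change the shell environment, the same effect can be had from inside Python. This is a sketch, assuming the repo was cloned into a `DeBaCl` directory under the current working directory; `add_repo_to_path` is a hypothetical helper, not part of DeBaCl.

```python
import os
import sys


def add_repo_to_path(repo_dir):
    """Put a cloned repo at the front of sys.path (once) and return its absolute path."""
    repo_dir = os.path.abspath(repo_dir)
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    return repo_dir


# Assumption: the clone lives in ./DeBaCl relative to where Python was started.
repo_path = add_repo_to_path('DeBaCl')
print(repo_path)
```

After this call, `import debacl` resolves against the clone for the rest of the session only.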

Dependencies

All of the dependencies are Python packages that can be installed with either conda or pip. DeBaCl 1.0 no longer depends on igraph, which required tricky manual installation.

Languages:

  • Python 2.7
  • (coming soon: Python 3.4)

Required packages:

  • numpy
  • networkx
  • prettytable

Strongly recommended packages:

  • matplotlib
  • scipy

Optional packages:

  • scikit-learn
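A quick way to see which of the packages listed above are importable in the current environment is sketched below. The helper `check_packages` is hypothetical and not part of DeBaCl. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib


def check_packages(names):
    """Map each package name to True if it can be imported, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


groups = [
    ('required', ['numpy', 'networkx', 'prettytable']),
    ('recommended', ['matplotlib', 'scipy']),
    ('optional', ['sklearn']),  # import name for scikit-learn
]

for group, names in groups:
    for name, ok in sorted(check_packages(names).items()):
        print('%-12s %-12s %s' % (group, name, 'ok' if ok else 'MISSING'))
```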

Quickstart

Construct the level set tree

```python
import debacl as dcl
from sklearn.datasets import make_moons

# Two interleaved half-moons; keep only the coordinates, not the class labels.
X = make_moons(n_samples=100, noise=0.1, random_state=19)[0]

tree = dcl.construct_tree(X, k=10, prune_threshold=10)
print(tree)
```

```no-highlight
+----+-------------+-----------+------------+----------+------+--------+----------+
| id | start_level | end_level | start_mass | end_mass | size | parent | children |
+----+-------------+-----------+------------+----------+------+--------+----------+
| 0  |    0.000    |   0.196   |   0.000    |  0.220   | 100  |  None  |  [1, 2]  |
| 1  |    0.196    |   0.396   |   0.220    |  0.940   |  37  |   0    |    []    |
| 2  |    0.196    |   0.488   |   0.220    |  1.000   |  41  |   0    |    []    |
+----+-------------+-----------+------------+----------+------+--------+----------+
```

Plot the level set tree

Clusters are represented by the vertical line segments in the dendrogram. In this example the vertical axis is plotted on the _density_ scale, so that the lower endpoint of a cluster's branch is at its _start_level_ and the upper endpoint is at its _end_level_ (see the table above), and the length of the branch is the _persistence_ of the cluster.

```python
fig = tree.plot(form='density')[0]
fig.show()
```
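As a worked example of persistence on the density scale, the branch lengths can be computed by hand from the `start_level` and `end_level` columns of the Quickstart table. The values below are copied from that table rather than read from the `tree` object, so the snippet runs without DeBaCl installed.

```python
# (id, start_level, end_level), copied from the table printed in the Quickstart.
nodes = [
    (0, 0.000, 0.196),
    (1, 0.196, 0.396),
    (2, 0.196, 0.488),
]

# Persistence on the density scale is the branch length: end_level - start_level.
persistence = {node_id: round(end - start, 3) for node_id, start, end in nodes}
print(persistence)
# {0: 0.196, 1: 0.2, 2: 0.292}

most_persistent = max(persistence, key=persistence.get)
print(most_persistent)
# 2
```

Node 2 has the longest branch, so it is the most persistent of the two leaf clusters.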

Query the level set tree for cluster labels

```python
import matplotlib.pyplot as plt

labels = tree.get_clusters(method='leaf')  # each leaf node is a cluster
clusters = X[labels[:, 0], :]

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c='black', s=40, alpha=0.4)
ax.scatter(clusters[:, 0], clusters[:, 1], c=labels[:, 1], s=80, alpha=0.9,
           cmap=plt.cm.winter)
ax.set_xlabel('x0')
ax.set_ylabel('x1', rotation=0)
fig.show()
```

<img src="docs/readme_clusters.png" height="480px" />
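The Quickstart indexes `labels` as a two-column array: column 0 holds row indices into `X` and column 1 holds cluster ids. Points that belong to no leaf cluster do not appear in it. The sketch below assumes that layout, which is inferred from the indexing above rather than from the API documentation. It expands such an array into one label per data point and counts cluster sizes. The small `labels` array is a made-up stand-in, and using `-1` for unlabeled points is a convention chosen here, not DeBaCl's.

```python
import numpy as np

n_points = 8  # stand-in for len(X)

# Hypothetical stand-in for the output of tree.get_clusters():
# column 0 = row index into X, column 1 = cluster id.
labels = np.array([[0, 1],
                   [1, 1],
                   [3, 2],
                   [4, 2],
                   [6, 2]])

# One entry per data point; -1 marks points that were not assigned to a cluster.
full = np.full(n_points, -1, dtype=int)
full[labels[:, 0]] = labels[:, 1]
print(full.tolist())
# [1, 1, -1, 2, 2, -1, 2, -1]

# Number of points in each cluster.
ids, counts = np.unique(labels[:, 1], return_counts=True)
sizes = dict(zip(ids.tolist(), counts.tolist()))
print(sizes)
# {1: 2, 2: 3}
```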


Documentation
-------------
- [API documentation](http://debacl.readthedocs.io/en/master/)
- [Example Jupyter notebooks](examples) (in progress)

Running unit tests
------------------
From the top level of the repo:

```bash
$ nosetests -s -v debacl/test
```

