Clustering by fast search and find of density peaks

This package implements the clustering algorithm described by Alex Rodriguez and Alessandro Laio (2014). It provides tools for computing, for each observation, the local density (rho) and the distance to the nearest observation of higher density (delta), and for using these values to assign observations to clusters. The two steps are performed separately, so observations can be reassigned to clusters with a new pair of rho and delta thresholds without recalculating everything.
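To make the two quantities concrete, here is a minimal base-R sketch of rho and delta as defined by Rodriguez and Laio (2014). It is not the package's implementation: `local_density()` and `nearest_higher_distance()` are hypothetical helpers, and densityClust may differ in details such as how it chooses the cutoff or treats ties.

```r
# Hypothetical helpers sketching rho and delta as defined by Rodriguez and
# Laio (2014); these are NOT functions exported by densityClust.

# Local density rho for every observation in a dist object (dc > 0 assumed)
local_density <- function(d, dc, gaussian = FALSE) {
  m <- as.matrix(d)
  if (gaussian) {
    # Gaussian kernel; subtract 1 to drop each point's own contribution exp(0)
    unname(rowSums(exp(-(m / dc)^2))) - 1
  } else {
    # Cutoff kernel: number of other points closer than dc (the diagonal is 0,
    # so each point counts itself once, which is subtracted again)
    unname(rowSums(m < dc)) - 1
  }
}

# Delta: distance to the nearest observation with strictly higher density.
# The densest observation(s) get their largest distance instead.
nearest_higher_distance <- function(d, rho) {
  m <- as.matrix(d)
  delta <- numeric(nrow(m))
  for (i in seq_len(nrow(m))) {
    higher <- which(rho > rho[i])
    delta[i] <- if (length(higher) == 0) max(m[i, ]) else min(m[i, higher])
  }
  delta
}

# Toy example: two groups on a line, {0, 1, 2} and {10, 11}
d <- dist(c(0, 1, 2, 10, 11))
rho <- local_density(d, dc = 1.5)         # 1 2 1 1 1
delta <- nearest_higher_distance(d, rho)  # 1 10 1 9 10
```

In the toy example only the middle point of the first group is a clear density peak. The two points of the second group tie on rho, so neither counts as "higher" than the other and both get a large delta. The cutoff kernel produces integer densities, so ties like this are common with it.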

Plotting

Two types of plots are supported by this package, both mimicking the plots used in the publication describing the algorithm. The standard plot() function produces a decision plot (delta against rho), with cluster peaks optionally coloured if they have been assigned. In addition, plotMDS() performs a multidimensional scaling of the distance matrix and plots the result as a scatterplot; if clusters have been assigned, observations are coloured according to their assignment.
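The two plot types can be approximated with base graphics alone, as in the sketch below. It is an illustration under stated assumptions, not the package's plotting code: `decision_and_mds_plot()` is a hypothetical helper, and its MDS panel uses classical scaling via `stats::cmdscale()`, which may not match the scaling plotMDS() performs.

```r
# Hypothetical helper: base-graphics approximation of the decision plot and the
# MDS scatterplot. Not densityClust's implementation.
decision_and_mds_plot <- function(d, rho, delta, clusters = NULL) {
  # Palette index per point: 1 (black) when unassigned, otherwise cluster + 1
  col <- if (is.null(clusters)) 1 else ifelse(is.na(clusters), 1, clusters + 1)
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  # Decision plot: points with both high rho and high delta are peak candidates
  plot(rho, delta, pch = 19, xlab = "rho", ylab = "delta", main = "Decision plot")
  # Classical multidimensional scaling of the distance matrix onto two axes
  coords <- cmdscale(d, k = 2)
  plot(coords, col = col, pch = 19, xlab = "MDS 1", ylab = "MDS 2", main = "MDS")
  invisible(coords)
}
```

The coordinates are returned invisibly so they can be reused; both the distance matrix and the scaling grow quadratically in memory with the number of observations.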

Cluster detection

The two main functions in this package are densityClust() and findClusters(). The former takes a distance matrix and, optionally, a distance cutoff, and calculates rho and delta for each observation. The latter takes the output of densityClust() and assigns each observation to a cluster based on user-defined rho and delta thresholds. If the thresholds are not specified, the user can supply them interactively by clicking on a decision plot.
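The assignment step can be sketched in base R, assuming the rule from Rodriguez and Laio (2014): observations whose rho and delta both exceed the thresholds become cluster peaks, and every other observation, visited in order of decreasing density, joins the cluster of its nearest neighbour of higher density. `assign_clusters()` is a hypothetical helper; findClusters() may differ in details such as tie handling or the treatment of cluster borders.

```r
# Hypothetical helper sketching the peak-selection and assignment rule of
# Rodriguez and Laio (2014); not densityClust's findClusters().
assign_clusters <- function(d, rho, delta, rho_threshold, delta_threshold) {
  m <- as.matrix(d)
  n <- length(rho)
  # Peaks: observations with both high density and a large distance to denser points
  peaks <- which(rho > rho_threshold & delta > delta_threshold)
  clusters <- rep(NA_integer_, n)
  clusters[peaks] <- seq_along(peaks)
  # Visit points from highest to lowest density, so every strictly denser
  # neighbour has already been labelled when a point is reached
  for (i in order(rho, decreasing = TRUE)) {
    if (!is.na(clusters[i])) next
    higher <- which(rho > rho[i])
    if (length(higher) == 0) next  # densest point(s) not selected as a peak stay NA
    nearest <- higher[which.min(m[i, higher])]
    clusters[i] <- clusters[nearest]
  }
  list(peaks = peaks, clusters = clusters)
}

# Toy example: two groups on a line; rho and delta are the values a cutoff
# kernel with dc = 1.5 gives for these points
d <- dist(c(0, 1, 2, 10, 11, 12))
res <- assign_clusters(d,
                       rho = c(1, 2, 1, 1, 2, 1),
                       delta = c(1, 11, 1, 1, 11, 1),
                       rho_threshold = 1.5, delta_threshold = 5)
res$clusters  # 1 1 1 2 2 2
```

Because peaks are chosen only from rho and delta, changing the thresholds re-runs this cheap loop without recomputing the densities, which is the reason the package splits the work into two passes.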

Usage

library(densityClust)
irisDist <- dist(iris[,1:4])
irisClust <- densityClust(irisDist, gaussian=TRUE)
#> Distance cutoff calculated to 0.2767655
plot(irisClust) # Inspect clustering attributes to define thresholds

irisClust <- findClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)

split(iris[,5], irisClust$clusters)
#> $`1`
#>  [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [11] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [21] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [31] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [41] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> Levels: setosa versicolor virginica
#> 
#> $`2`
#>   [1] versicolor versicolor versicolor versicolor versicolor versicolor
#>   [7] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [13] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [19] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [25] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [31] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [37] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [43] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [49] versicolor versicolor virginica  virginica  virginica  virginica 
#>  [55] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [61] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [67] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [73] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [79] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [85] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [91] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [97] virginica  virginica  virginica  virginica 
#> Levels: setosa versicolor virginica

Note that although the iris dataset contains three species of iris, the algorithm detects only two clusters. This is because two of the species (versicolor and virginica) are not clearly separated in the four measured features.

References

Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. https://doi.org/10.1126/science.1242072
