Repository Details

Clustering by fast search and find of density peaks

This package implements the clustering algorithm described by Alex Rodriguez and Alessandro Laio (2014). It provides tools for generating the initial rho and delta values for each observation, and for using these values to assign observations to clusters. The work is done in two passes, so the user is free to reassign observations to clusters with a new set of rho and delta thresholds without recalculating everything.

Plotting

Two types of plots are supported by this package, and both mimic the plots used in the publication describing the algorithm. The standard plot function produces a decision plot, with optional colouring of cluster peaks if these have been assigned. In addition, plotMDS() performs a multidimensional scaling of the distance matrix and plots the result as a scatterplot. If clusters have been assigned, observations are coloured according to their assignment.
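The multidimensional scaling step can be illustrated with base R alone. The sketch below is not plotMDS() itself: it assumes classical MDS via stats::cmdscale(), which may differ from what the package does internally, and it colours points by the known species rather than by a cluster assignment.

```r
# Classical multidimensional scaling of the iris distance matrix (base R only).
irisDist <- dist(iris[, 1:4])

# Project the distances onto two dimensions; the result is a 150 x 2 matrix.
coords <- cmdscale(irisDist, k = 2)

# Scatterplot of the projection, coloured by species for illustration only.
plot(coords, xlab = "MDS 1", ylab = "MDS 2",
     col = as.integer(iris$Species), pch = 19)
```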

Cluster detection

The two main functions of this package are densityClust() and findClusters(). The former takes a distance matrix, and optionally a distance cutoff, and calculates rho and delta for each observation. The latter takes the output of densityClust() and makes a cluster assignment for each observation based on user-defined rho and delta thresholds. If the thresholds are not specified, the user can supply them interactively by clicking on a decision plot.
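To make the two passes concrete, here is a minimal base R sketch of the idea from Rodriguez and Laio (2014). It is an illustration under assumptions, not the package's implementation: rho_delta(), assign_clusters(), rho_min and delta_min are hypothetical names, and the Gaussian kernel used for rho is an assumed form chosen so that densities rarely tie.

```r
# Pass 1: local density (rho) and distance to the nearest denser point (delta).
rho_delta <- function(d, dc) {
  m <- as.matrix(d)
  # Gaussian-weighted neighbour count; subtract 1 to drop the self term exp(0).
  rho <- unname(rowSums(exp(-(m / dc)^2)) - 1)
  delta <- vapply(seq_len(nrow(m)), function(i) {
    higher <- rho > rho[i]
    # The densest observation has no denser neighbour: use its largest distance.
    if (any(higher)) min(m[i, higher]) else max(m[i, ])
  }, numeric(1))
  list(dist = m, rho = rho, delta = delta)
}

# Pass 2: peaks exceed both thresholds; every other observation inherits the
# cluster of its nearest neighbour with higher density.
assign_clusters <- function(x, rho_min, delta_min) {
  peaks <- which(x$rho > rho_min & x$delta > delta_min)
  clusters <- rep(NA_integer_, length(x$rho))
  clusters[peaks] <- seq_along(peaks)
  # Visit observations from densest to sparsest so denser ones are labelled first.
  for (i in order(x$rho, decreasing = TRUE)) {
    if (!is.na(clusters[i])) next
    higher <- which(x$rho > x$rho[i])
    if (length(higher) == 0) next  # densest point that is not a peak stays NA
    nearest <- higher[which.min(x$dist[i, higher])]
    clusters[i] <- clusters[nearest]
  }
  clusters
}

# Toy 1-D data: a group of five points near 0 and a group of three near 5.
pts <- c(0, 0.1, 0.2, 0.3, 0.4, 5, 5.1, 5.2)
x <- rho_delta(dist(pts), dc = 0.25)      # computed once

first  <- assign_clusters(x, rho_min = 1.5, delta_min = 1)  # 1 1 1 1 1 2 2 2
second <- assign_clusters(x, rho_min = 2,   delta_min = 1)  # 1 1 1 1 1 1 1 1
```

The second call reuses x unchanged, which is the point of splitting the work into two passes: only the cheap assignment step is repeated when the thresholds change.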

Usage

library(densityClust)
irisDist <- dist(iris[,1:4])
irisClust <- densityClust(irisDist, gaussian=TRUE)
#> Distance cutoff calculated to 0.2767655
plot(irisClust) # Inspect clustering attributes to define thresholds

irisClust <- findClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)

split(iris[,5], irisClust$clusters)
#> $`1`
#>  [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [11] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [21] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [31] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [41] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> Levels: setosa versicolor virginica
#> 
#> $`2`
#>   [1] versicolor versicolor versicolor versicolor versicolor versicolor
#>   [7] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [13] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [19] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [25] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [31] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [37] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [43] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [49] versicolor versicolor virginica  virginica  virginica  virginica 
#>  [55] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [61] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [67] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [73] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [79] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [85] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [91] virginica  virginica  virginica  virginica  virginica  virginica 
#>  [97] virginica  virginica  virginica  virginica 
#> Levels: setosa versicolor virginica

Note that while the iris dataset contains information on three different species of iris, only two clusters are detected by the algorithm. This is because two of the species (versicolor and virginica) are not clearly separated in the data.
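Because rho and delta are computed once by densityClust(), the thresholds can be revisited without repeating that step. The sketch below reuses only the functions shown in the usage example. The second pair of thresholds (rho = 2, delta = 1) is an arbitrary choice for illustration, and no particular outcome is claimed for it.

```r
library(densityClust)

# Pass 1: compute rho and delta once.
irisDist <- dist(iris[, 1:4])
irisClust <- densityClust(irisDist, gaussian = TRUE)

# Pass 2: assign clusters with the thresholds from the usage example ...
strict <- findClusters(irisClust, rho = 2, delta = 2)

# ... and again with a lower, purely illustrative delta threshold.
loose <- findClusters(irisClust, rho = 2, delta = 1)

# Cross-tabulate species against cluster assignment for each threshold set.
table(Species = iris$Species, Cluster = strict$clusters)
table(Species = iris$Species, Cluster = loose$clusters)
```

A cross-tabulation like this is a compact alternative to split() for checking how the species are distributed across the detected clusters.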

References

Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. https://doi.org/10.1126/science.1242072
