• Stars
    star
    107
  • Rank 323,587 (Top 7 %)
  • Language
    R
  • License
    Other
  • Created about 3 years ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A tidy unified interface to clustering models

tidyclust

Codecov test coverage R-CMD-check

The goal of tidyclust is to provide a tidy, unified interface to clustering models. The packages is closely modeled after the parsnip package.

Installation

You can install the released version of tidyclust from CRAN with:

install.packages("tidyclust")

and the development version of tidyclust from GitHub with:

# install.packages("pak")
pak::pak("tidymodels/tidyclust")

Example

The first thing you do is to create a cluster specification. For this example we are creating a K-means model, using the stats engine.

library(tidyclust)
set.seed(1234)

kmeans_spec <- k_means(num_clusters = 3) %>%
  set_engine("stats")

kmeans_spec
#> K Means Cluster Specification (partition)
#> 
#> Main Arguments:
#>   num_clusters = 3
#> 
#> Computational engine: stats

This specification can then be fit using data.

kmeans_spec_fit <- kmeans_spec %>%
  fit(~., data = mtcars)
kmeans_spec_fit
#> tidyclust cluster object
#> 
#> K-means clustering with 3 clusters of sizes 7, 11, 14
#> 
#> Cluster means:
#>        mpg cyl     disp        hp     drat       wt     qsec        vs
#> 1 19.74286   6 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286
#> 3 26.66364   4 105.1364  82.63636 4.070909 2.285727 19.13727 0.9090909
#> 2 15.10000   8 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000
#>          am     gear     carb
#> 1 0.4285714 3.857143 3.428571
#> 3 0.7272727 4.090909 1.545455
#> 2 0.1428571 3.285714 3.500000
#> 
#> Clustering vector:
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>                   1                   1                   2                   1 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>                   3                   1                   3                   2 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>                   2                   1                   1                   3 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>                   3                   3                   3                   3 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>                   3                   2                   2                   2 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>                   2                   3                   3                   3 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>                   3                   2                   2                   2 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>                   3                   1                   3                   2 
#> 
#> Within cluster sum of squares by cluster:
#> [1] 13954.34 11848.37 93643.90
#>  (between_SS / total_SS =  80.8 %)
#> 
#> Available components:
#> 
#> [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
#> [6] "betweenss"    "size"         "iter"         "ifault"

Once you have a fitted tidyclust object, you can do a number of things. predict() returns the cluster a new observation belongs to

predict(kmeans_spec_fit, mtcars[1:4, ])
#> # A tibble: 4 × 1
#>   .pred_cluster
#>   <fct>        
#> 1 Cluster_1    
#> 2 Cluster_1    
#> 3 Cluster_2    
#> 4 Cluster_1

extract_cluster_assignment() returns the cluster assignments of the training observations

extract_cluster_assignment(kmeans_spec_fit)
#> # A tibble: 32 × 1
#>    .cluster 
#>    <fct>    
#>  1 Cluster_1
#>  2 Cluster_1
#>  3 Cluster_2
#>  4 Cluster_1
#>  5 Cluster_3
#>  6 Cluster_1
#>  7 Cluster_3
#>  8 Cluster_2
#>  9 Cluster_2
#> 10 Cluster_1
#> # ℹ 22 more rows

and extract_centroids() returns the locations of the clusters

extract_centroids(kmeans_spec_fit)
#> # A tibble: 3 × 12
#>   .cluster    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <fct>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Cluster_1  19.7     6  183. 122.   3.59  3.12  18.0 0.571 0.429  3.86  3.43
#> 2 Cluster_2  26.7     4  105.  82.6  4.07  2.29  19.1 0.909 0.727  4.09  1.55
#> 3 Cluster_3  15.1     8  353. 209.   3.23  4.00  16.8 0     0.143  3.29  3.5

Visual comparison of clustering methods

Below is a visualization of the available models and how they compare using 2 dimensional toy data sets.

Mock comparison for different clustering methods for different data sets. Each row correspods to a clustering method, each column corresponds to a data set type.

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

More Repositories

1

broom

Convert statistical analysis objects from R into tidy format
R
1,445
star
2

tidymodels

Easily install and load the tidymodels packages
R
747
star
3

infer

An R package for tidyverse-friendly statistical inference
R
723
star
4

parsnip

A tidy unified interface to models
R
595
star
5

corrr

Explore correlations in R
R
588
star
6

TMwR

Code and content for "Tidy Modeling with R"
RMarkdown
585
star
7

recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
R
558
star
8

yardstick

Tidy methods for measuring model performance
R
367
star
9

rsample

Classes and functions to create and summarize resampling objects
R
335
star
10

stacks

An R package for tidy stacked ensemble modeling
R
294
star
11

tune

Tools for tidy parameter tuning
R
268
star
12

tidypredict

Run predictions inside the database
R
259
star
13

workflows

Modeling Workflows
R
201
star
14

textrecipes

Extra recipes for Text Processing
R
159
star
15

themis

Extra recipes steps for dealing with unbalanced data
R
141
star
16

embed

Extra recipes for predictor embeddings
R
141
star
17

butcher

Reduce the size of model objects saved to disk
R
130
star
18

censored

Parsnip wrappers for survival models
R
123
star
19

probably

Tools for post-processing class probability estimates
R
113
star
20

dials

Tools for creating tuning parameter values
R
111
star
21

tidyposterior

Bayesian comparisons of models using resampled statistics
R
102
star
22

hardhat

Construct Modeling Packages
R
100
star
23

aml-training

The most recent version of the Applied Machine Learning notes
HTML
100
star
24

tidymodels.org-legacy

Legacy Source of tidymodels.org
HTML
99
star
25

workshops

Website and materials for tidymodels workshops
JavaScript
92
star
26

workflowsets

Create a collection of modeling workflows
R
92
star
27

usemodels

Boilerplate Code for tidymodels
R
85
star
28

modeldb

Run models inside a database using R
R
80
star
29

multilevelmod

Parsnip wrappers for mixed-level and hierarchical models
R
74
star
30

spatialsample

Create and summarize spatial resampling objects 🗺
R
70
star
31

learntidymodels

Learn tidymodels with interactive learnr primers
R
68
star
32

brulee

High-Level Modeling Functions with 'torch'
R
67
star
33

finetune

Additional functions for model tuning
R
62
star
34

bonsai

parsnip wrappers for tree-based models
R
51
star
35

shinymodels

R
46
star
36

applicable

Quantify extrapolation of new samples given a training set
R
46
star
37

model-implementation-principles

recommendations for creating R modeling packages
HTML
41
star
38

rules

parsnip extension for rule-based models
R
40
star
39

planning

Documents to plan and discuss future development
37
star
40

discrim

Wrappers for discriminant analysis and naive Bayes models for use with the parsnip package
R
28
star
41

baguette

parsnip Model Functions for Bagging
R
24
star
42

modeldata

Data Sets Used by tidymodels Packages
R
22
star
43

poissonreg

parsnip wrappers for Poisson regression
R
22
star
44

agua

Create and evaluate models using 'tidymodels' and 'h2o'
R
21
star
45

extratests

Integration and other testing for tidymodels
R
20
star
46

tidymodels.org

Source of tidymodels.org
JavaScript
19
star
47

plsmod

Model Wrappers for Projection Methods
R
14
star
48

cloudstart

RStudio Cloud ☁️ resources to accompany tidymodels.org
12
star
49

orbital

Turn Tidymodels Workflows Into Series of Equations
R
12
star
50

desirability2

Desirability Functions for Multiparameter Optimization
R
10
star
51

modeldatatoo

More Data Sets Useful for Modeling Examples
R
7
star
52

.github

GitHub contributing guidelines for tidymodels packages
4
star
53

modelenv

Provide Tools to Register Models for use in Tidymodels
R
4
star
54

tailor

Sandbox for a postprocessor object.
R
2
star
55

survivalauc

What the Package Does (One Line, Title Case)
C
2
star