Mining Association Rules and Frequent Itemsets with R

R package arules - Mining Association Rules and Frequent Itemsets

Introduction

The arules package family for R provides the infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat. In addition, the following mining algorithms are available via fim4r:

  • Apriori
  • Eclat
  • Carpenter
  • FPgrowth
  • IsTa
  • RElim
  • SaM

Code examples can be found in Chapter 5 of the web book R Companion for Introduction to Data Mining.
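
As a minimal sketch of the data structures mentioned above, the following builds a small transactions object from a plain list and mines frequent itemsets with Eclat. The toy baskets and the 50% support threshold are invented for illustration and are not part of the package.

```r
library("arules")

# Hypothetical toy baskets, invented for illustration only.
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter")
)

# Convert the list into the sparse transactions representation.
trans <- transactions(baskets)

# Relative support of each single item.
item_support <- itemFrequency(trans)
item_support

# Frequent itemsets with Eclat at a minimum support of 50%.
itemsets <- eclat(trans, parameter = list(supp = 0.5),
                  control = list(verbose = FALSE))
inspect(sort(itemsets, by = "support"))
```

Every single item occurs in three of the four baskets and every pair in two, so all singletons and pairs pass the threshold, while the one three-item basket does not.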

Please cite the use of this package as:

Hahsler M, Gruen B, Hornik K (2005). “arules - A Computational Environment for Mining Association Rules and Frequent Item Sets.” Journal of Statistical Software, 14(15), 1-25. ISSN 1548-7660, doi:10.18637/jss.v014.i15 https://doi.org/10.18637/jss.v014.i15.

@Article{,
  title = {arules -- {A} Computational Environment for Mining Association Rules and Frequent Item Sets},
  author = {Michael Hahsler and Bettina Gruen and Kurt Hornik},
  year = {2005},
  journal = {Journal of Statistical Software},
  volume = {14},
  number = {15},
  pages = {1--25},
  doi = {10.18637/jss.v014.i15},
  month = {October},
  issn = {1548-7660},
}

Packages

arules core packages

  • arules: The base package with data structures, mining algorithms (APRIORI and ECLAT), and interest measures.
  • arulesViz: Visualization of association rules.
  • arulesCBA: Classification algorithms based on association rules (includes CBA).
  • arulesSequences: Mining frequent sequences (cSPADE).

Other related packages

Additional mining algorithms

  • arulesNBMiner: Mining NB-frequent itemsets and NB-precise rules.
  • fim4r: Provides fast implementations for several mining algorithms. An interface function called fim4r() is provided in arules.
  • opusminer: OPUS Miner algorithm for finding the top k productive, non-redundant itemsets. Call opus() with format = 'itemsets'.
  • RKEEL: Interface to KEEL’s association rule mining algorithm.
  • RSarules: Mining algorithm which randomly samples association rules with one pre-chosen item as the consequent from a transaction dataset.

In-database analytics

  • ibmdbR: IBM in-database analytics for R can calculate association rules from a database table.
  • rfml: Mine frequent itemsets or association rules using a MarkLogic server.

Interface

  • rattle: Provides a graphical user interface for association rule mining.
  • pmml: Generates PMML (predictive model markup language) for association rules.

Classification

  • arc: Alternative CBA implementation.
  • inTrees: Interpret Tree Ensembles provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner.
  • rCBA: Alternative CBA implementation.
  • qCBA: Quantitative Classification by Association Rules.
  • sblr: Scalable Bayesian rule lists algorithm for classification.

Outlier Detection

Recommendation/Prediction

  • recommenderlab: Supports creating predictions using association rules.

The following R packages use arules: aPEAR, arc, arulesCBA, arulesNBMiner, arulesSequences, arulesViz, Biocomb, clickstream, CLONETv2, ctsem, discnorm, fcaR, fdm2id, GroupBN, ibmdbR, immcp, inTrees, liayson, MACP, opusminer, pmml, qCBA, RareComb, rattle, rCBA, recommenderlab, RKEEL, RSarules, TELP

Installation

Stable CRAN version: Install from within R with

install.packages("arules")

Current development version: Install from r-universe.

install.packages("arules", repos = "https://mhahsler.r-universe.dev")

Usage

Load package and mine some association rules.

library("arules")
data("IncomeESL")

trans <- transactions(IncomeESL)
trans
## transactions in sparse format with
##  8993 transactions (rows) and
##  84 items (columns)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules")
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 899 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[84 item(s), 8993 transaction(s)] done [0.00s].
## sorting and recoding items ... [42 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [457 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
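
The "Absolute minimum support count" line in the output above can be reproduced by hand from the relative support threshold and the number of transactions. The sketch below uses only base R. Rounding the product down is an assumption that happens to match the printed value; it is not taken from the package documentation.

```r
n_transactions <- 8993   # from the printed transactions summary
min_support <- 0.1       # the supp argument passed to apriori()

# Relative threshold expressed as a number of transactions.
raw_count <- n_transactions * min_support
raw_count

# Assumption: the threshold is rounded down, which matches the printed 899.
min_count <- floor(raw_count)
min_count
```

Under this reading, an itemset has to occur in at least 899 of the 8993 transactions to be considered frequent.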

Inspect the rules with the highest lift.

inspect(head(rules, n = 3, by = "lift"))
##     lhs                           rhs                      support confidence coverage lift count
## [1] {dual incomes=no,                                                                            
##      householder status=own}   => {marital status=married}    0.10       0.97     0.10  2.6   914
## [2] {years in bay area=>10,                                                                      
##      dual incomes=yes,                                                                           
##      type of home=house}       => {marital status=married}    0.10       0.96     0.10  2.6   902
## [3] {dual incomes=yes,                                                                           
##      householder status=own,                                                                     
##      type of home=house,                                                                         
##      language in home=english} => {marital status=married}    0.11       0.96     0.11  2.6   988
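
The quality columns in this output follow the standard definitions: support is the share of transactions containing both sides of the rule, coverage is the share containing the left-hand side, confidence is support divided by coverage, and lift is confidence divided by the support of the right-hand side. The base R sketch below recomputes them on a small made-up incidence matrix. The helper `rule_measures()` and the toy data are hypothetical and not part of arules.

```r
# Hypothetical toy incidence matrix: rows are transactions, columns are items.
m <- matrix(
  c(TRUE,  TRUE,
    TRUE,  TRUE,
    TRUE,  FALSE,
    FALSE, TRUE,
    FALSE, FALSE),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("a", "b"))
)

# Hypothetical helper: quality measures for the rule lhs => rhs.
rule_measures <- function(m, lhs, rhs) {
  n <- nrow(m)
  # A transaction matches a side if it contains all items of that side.
  has_lhs <- rowSums(m[, lhs, drop = FALSE]) == length(lhs)
  has_rhs <- rowSums(m[, rhs, drop = FALSE]) == length(rhs)

  count <- sum(has_lhs & has_rhs)
  support <- count / n
  coverage <- sum(has_lhs) / n
  confidence <- support / coverage
  lift <- confidence / (sum(has_rhs) / n)

  c(support = support, confidence = confidence,
    coverage = coverage, lift = lift, count = count)
}

res <- rule_measures(m, lhs = "a", rhs = "b")
round(res, 3)
```

For the rule {a} => {b} this gives a support of 0.4, a confidence of about 0.667, a coverage of 0.6, and a lift of about 1.111, based on 2 matching transactions.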

Using arules with tidyverse

arules works seamlessly with tidyverse. For example:

  • dplyr can be used for cleaning and preparing the transactions.
  • transactions() and other functions accept tibble as input.
  • Functions in arules can be connected with the pipe operator |>.
  • arulesViz provides visualizations based on ggplot2.

For example, we can remove the ethnic information column before creating transactions and then mine and inspect rules.

library("tidyverse")
library("arules")
data("IncomeESL")

trans <- IncomeESL |>
    select(-`ethnic classification`) |>
    transactions()
rules <- trans |>
    apriori(supp = 0.1, conf = 0.9, target = "rules", control = list(verbose = FALSE))
rules |>
    head(3, by = "lift") |>
    as("data.frame") |>
    tibble()
## # A tibble: 3 × 6
##   rules                                  support confidence coverage  lift count
##   <chr>                                    <dbl>      <dbl>    <dbl> <dbl> <int>
## 1 {dual incomes=no,householder status=o…   0.102      0.971    0.105  2.62   914
## 2 {years in bay area=>10,dual incomes=y…   0.100      0.961    0.104  2.59   902
## 3 {dual incomes=yes,householder status=…   0.110      0.960    0.114  2.59   988
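
As a further sketch, the rule set can be filtered before it is converted, and the left-hand and right-hand sides can be kept in separate columns for downstream processing. This assumes the `subset()` method, the `%pin%` partial-matching operator, and `DATAFRAME()` from arules. The lift threshold of 2 is an arbitrary choice for illustration.

```r
library("arules")
data("IncomeESL")

trans <- transactions(IncomeESL)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules",
                 control = list(verbose = FALSE))

# Keep only rules that predict marital status and have a lift above 2.
married <- subset(rules, rhs %pin% "marital status=" & lift > 2)

# Convert to a data.frame with separate LHS and RHS columns.
df <- DATAFRAME(married, separate = TRUE)

# Show the three rules with the highest lift.
head(df[order(-df$lift), ], 3)
```

The resulting data.frame can be passed to `tibble()` or any dplyr verb in the same way as the output of `as("data.frame")` above.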

Using arules from Python

arules and arulesViz can now be used directly from Python with the Python package arulespy, available from PyPI.

Support

Please report bugs here on GitHub. Questions should be posted on Stack Overflow and tagged with arules.
