RDatasets.jl

Julia package for loading many of the data sets available in R


The RDatasets package provides an easy way for Julia users to experiment with most of the standard data sets available in the core of R, as well as data sets included with many of R's most popular packages. This package is essentially a simple port of the Rdatasets repo created by Vincent Arel-Bundock, who gathered data sets from many of the standard R packages in one convenient location on GitHub at https://github.com/vincentarelbundock/Rdatasets

To load any of the data sets included in the RDatasets package, you need the DataFrames package. DataFrames is installed automatically as a dependency when you install RDatasets as follows:

```julia
using Pkg
Pkg.add("RDatasets")
```

After installing RDatasets, you can load data sets with the dataset() function, which takes the name of an R package and the name of a data set as arguments:

```julia
using RDatasets
iris = dataset("datasets", "iris")
neuro = dataset("boot", "neuro")
```
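Because dataset() returns an ordinary DataFrame, the usual DataFrames.jl functions work on the result. The sketch below is one way to take a first look at a loaded data set. It assumes DataFrames has also been added to the active environment (for example with Pkg.add("DataFrames")), since `using DataFrames` requires it as a direct dependency.

```julia
using RDatasets, DataFrames  # assumes DataFrames is a direct dependency of the environment

# Load the classic iris data set from R's core "datasets" package.
iris = dataset("datasets", "iris")

# The result is a DataFrame, so standard DataFrames.jl tools apply.
println(size(iris))           # (number of rows, number of columns)
println(names(iris))          # column names as strings
println(first(iris, 5))       # first five rows
println(describe(iris))       # per-column summary statistics
```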

Data Sets

The RDatasets.packages() function returns a table of the R packages represented in RDatasets:

| Package | Title |
|---|---|
| COUNT | Functions, data and code for count data. |
| Ecdat | Data sets for econometrics |
| HSAUR | A Handbook of Statistical Analyses Using R (1st Edition) |
| HistData | Data sets from the history of statistics and data visualization |
| ISLR | Data for An Introduction to Statistical Learning with Applications in R |
| KMsurv | Data sets from Klein and Moeschberger (1997), Survival Analysis |
| MASS | Support Functions and Datasets for Venables and Ripley's MASS |
| SASmixed | Data sets from "SAS System for Mixed Models" |
| Zelig | Everyone's Statistical Software |
| adehabitatLT | Analysis of Animal Movements |
| boot | Bootstrap Functions (Originally by Angelo Canty for S) |
| car | Companion to Applied Regression |
| cluster | Cluster Analysis Extended Rousseeuw et al. |
| datasets | The R Datasets Package |
| gamair | Datasets used in the book Generalized Additive Models: An Introduction with R |
| gap | Genetic analysis package |
| ggplot2 | An Implementation of the Grammar of Graphics |
| lattice | Lattice Graphics |
| lme4 | Linear mixed-effects models using Eigen and S4 |
| mgcv | Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation |
| mlmRev | Examples from Multilevel Modelling Software Review |
| nlreg | Higher Order Inference for Nonlinear Heteroscedastic Models |
| plm | Linear Models for Panel Data |
| plyr | Tools for splitting, applying and combining data |
| pscl | Political Science Computational Laboratory, Stanford University |
| psych | Procedures for Psychological, Psychometric, and Personality Research |
| quantreg | Quantile Regression |
| reshape2 | Flexibly Reshape Data: A Reboot of the Reshape Package. |
| robustbase | Basic Robust Statistics |
| rpart | Recursive Partitioning and Regression Trees |
| sandwich | Robust Covariance Matrix Estimators |
| sem | Structural Equation Models |
| survival | Survival Analysis |
| vcd | Visualizing Categorical Data |
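Since RDatasets.packages() returns its table as a DataFrame, it can be filtered like any other. This is a minimal sketch that assumes the columns are named Package and Title, matching the header of the table above; the search term "Survival" is only an illustration.

```julia
using RDatasets, DataFrames  # assumes DataFrames is a direct dependency of the environment

# One row per represented R package (assumed columns: Package, Title).
pkgs = RDatasets.packages()

# Keep only the packages whose title mentions survival analysis.
surv = filter(row -> occursin("Survival", row.Title), pkgs)
println(surv)
```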

The RDatasets.datasets() function returns a table describing the 700+ included data sets. You can also pass a package name, e.g. RDatasets.datasets("mlmRev"), to get a table restricted to that package:

| Package | Dataset | Title | Rows | Columns |
|---|---|---|---:|---:|
| mlmRev | Chem97 | Scores on A-level Chemistry in 1997 | 31022 | 8 |
| mlmRev | Contraception | Contraceptive use in Bangladesh | 1934 | 6 |
| mlmRev | Early | Early childhood intervention study | 309 | 4 |
| mlmRev | Exam | Exam scores from inner London | 4059 | 10 |
| mlmRev | Gcsemv | GCSE exam score | 1905 | 5 |
| mlmRev | Hsb82 | High School and Beyond - 1982 | 7185 | 8 |
| mlmRev | Mmmec | Malignant melanoma deaths in Europe | 354 | 6 |
| mlmRev | Oxboys | Heights of Boys in Oxford | 234 | 4 |
| mlmRev | ScotsSec | Scottish secondary school scores | 3435 | 6 |
| mlmRev | bdf | Language Scores of 8-Graders in The Netherlands | 2287 | 28 |
| mlmRev | egsingle | US Sustaining Effects study | 7230 | 12 |
| mlmRev | guImmun | Immunization in Guatemala | 2159 | 13 |
| mlmRev | guPrenat | Prenatal care in Guatemala | 2449 | 15 |
| mlmRev | star | Student Teacher Achievement Ratio (STAR) project data | 26796 | 18 |
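The per-package table can be combined with dataset() to find and then load a particular data set. The sketch assumes the columns are named Package, Dataset, Title, Rows and Columns, as in the table above; the choice of Oxboys is arbitrary.

```julia
using RDatasets, DataFrames  # assumes DataFrames is a direct dependency of the environment

# Table of the data sets that come from the mlmRev R package.
mlm = RDatasets.datasets("mlmRev")

# Sort by row count, largest first, and keep a few descriptive columns.
println(sort(mlm, :Rows, rev = true)[:, [:Dataset, :Rows, :Columns]])

# Load one of the listed data sets by (package, data set) name.
oxboys = dataset("mlmRev", "Oxboys")
println(size(oxboys))
```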

Licensing and Intellectual Property

Following Vincent's lead, we have assumed that all of the data sets in this repository can be made available under the GPL-3 license. If you know that one of the data sets released here should not be made public, or that a data set can only be released under a different license, please contact me so that I can remove it from this repository.
