  • Language
    Ruby
  • License
    BSD 3-Clause "New...
  • Created about 14 years ago
  • Updated over 1 year ago

Statsample

Homepage :: https://github.com/sciruby/statsample

DESCRIPTION

A suite for basic and advanced statistics on Ruby. Tested on Ruby 2.1.1p76 (June 2014), 1.8.7, 1.9.1, 1.9.2 (April, 2010), ruby-head(June, 2011) and JRuby 1.4 (Ruby 1.8.7 compatible).

Includes:

  • Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).
  • Imports and exports datasets from and to Excel, CSV and plain text files.
  • Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and gamma. Tetrachoric and polychoric correlations are provided by the +statsample-bivariate-extension+ gem.
  • Intra-class correlation
  • Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with contrasts for One-way ANOVA.
  • Tests: F, T, Levene, Mann-Whitney U.
  • Regression: Simple, Multiple (OLS), Probit and Logit
  • Factor Analysis: Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax), and Parallel Analysis and Velicer's MAP test for estimating the number of factors.
  • Reliability analysis for simple scales, plus a DSL to easily analyze multiple scales, optionally using factor analysis and correlations.
  • Basic time series support
  • Dominance Analysis, with multivariate dependent variables and bootstrap (Azen & Budescu)
  • Sample calculation related formulas
  • Structural Equation Modeling (SEM), using R libraries +sem+ and +OpenMx+
  • Creates reports in text, HTML and RTF, using the ReportBuilder gem
  • Graphics: Histogram, Boxplot and Scatterplot
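As a rough illustration of the descriptive statistics listed above, here is a minimal plain-Ruby sketch of the mean, median and standard error of the mean. It deliberately does not use the statsample API; the module name DescriptiveSketch and its methods are hypothetical and exist only for this example. The variance uses the n - 1 (sample) denominator, which is an assumption of this sketch.

```ruby
# Plain-Ruby sketch of a few descriptive statistics.
# DescriptiveSketch is a hypothetical name, not part of statsample.
module DescriptiveSketch
  module_function

  def mean(xs)
    xs.sum.to_f / xs.size
  end

  def median(xs)
    sorted = xs.sort
    mid = sorted.size / 2
    # Odd size: middle element. Even size: average of the two middle elements.
    sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
  end

  # Sample variance with the n - 1 denominator.
  def variance(xs)
    m = mean(xs)
    xs.sum { |x| (x - m)**2 } / (xs.size - 1).to_f
  end

  # Standard error of the mean: sqrt(variance / n).
  def standard_error(xs)
    Math.sqrt(variance(xs) / xs.size)
  end
end

data = [2, 4, 4, 4, 5, 5, 7, 9]
puts "mean:           #{DescriptiveSketch.mean(data)}"
puts "median:         #{DescriptiveSketch.median(data)}"
puts "standard error: #{DescriptiveSketch.standard_error(data)}"
```

In statsample itself these values are exposed as methods on vector objects; the sketch only shows the arithmetic behind them.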

Principles

  • Software Design:
    • One module/class for each type of analysis
    • Options can be set as a hash on initialize() or through setter methods
    • Clean API for interactive sessions
    • summary() returns all necessary information for interactive sessions
    • All statistical data available through methods on objects
    • All (important) methods should be tested. Better with random data.
  • Statistical Design
    • Results are tested against text results, SPSS and R outputs.
    • Go beyond Null Hypothesis Testing, using confidence intervals and effect sizes when possible
    • (When possible) All references for methods are documented, providing sensible information on documentation
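The software design principles above (options as a hash on initialize() or as setters, a summary() method, and results available through methods) can be sketched as follows. MeanReport, its options and its defaults are hypothetical names invented for this illustration; they are not statsample classes.

```ruby
# Hypothetical class illustrating the design principles; not part of statsample.
class MeanReport
  # Every option is also available as a setter method.
  attr_accessor :name, :digits

  DEFAULTS = { name: 'Mean report', digits: 2 }.freeze

  # Options can be passed as a hash on initialize...
  def initialize(data, opts = {})
    @data = data
    DEFAULTS.merge(opts).each { |key, value| send("#{key}=", value) }
  end

  # Statistical results are available through plain methods.
  def mean
    @data.sum.to_f / @data.size
  end

  # summary returns a human-readable report for interactive sessions.
  def summary
    formatted_mean = format("%.#{digits}f", mean)
    "#{name}: n=#{@data.size}, mean=#{formatted_mean}"
  end
end

report = MeanReport.new([1, 2, 3, 4], digits: 3)
report.name = 'Scores' # ...or changed later through a setter
puts report.summary
```

Accepting both styles keeps one-line construction convenient while still allowing step-by-step adjustment in an interactive session.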

Features

  • Classes for manipulation and storage of data:
    • Statsample::Vector: An extension of an array, with statistical methods like sum, mean and standard deviation
    • Statsample::Dataset: a group of Statsample::Vector, analogous to an Excel spreadsheet or a data frame in R. The base of almost all operations on statsample.
    • Statsample::Multiset: multiple datasets with same fields and type of vectors
  • Anova module provides generic Statsample::Anova::OneWay and vector-based Statsample::Anova::OneWayWithVectors. You can also create contrasts using Statsample::Anova::Contrast
  • Module Statsample::Bivariate provides covariance and Pearson, Spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Includes methods to create correlation and covariance matrices
  • Multiple types of regression.
    • Simple Regression : Statsample::Regression::Simple
    • Multiple Regression: Statsample::Regression::Multiple
    • Logit Regression: Statsample::Regression::Binomial::Logit
    • Probit Regression: Statsample::Regression::Binomial::Probit
  • Factor Analysis algorithms in the Statsample::Factor module.
    • Classes for Extraction of factors:
      • Statsample::Factor::PCA
      • Statsample::Factor::PrincipalAxis
    • Classes for Rotation of factors:
      • Statsample::Factor::Varimax
      • Statsample::Factor::Equimax
      • Statsample::Factor::Quartimax
    • Classes for calculation of factors to retain
      • Statsample::Factor::ParallelAnalysis performs Horn's 'parallel analysis' to a principal components analysis to adjust for sample bias in the retention of components.
      • Statsample::Factor::MAP performs Velicer's Minimum Average Partial (MAP) test, which retains components as long as the variance in the correlation matrix represents systematic variance.
  • Dominance Analysis. Based on Budescu and Azen papers, dominance analysis is a method to analyze the importance of one predictor relative to another in multiple regression
    • Statsample::DominanceAnalysis class can report dominance analysis for a sample, using uni or multivariate dependent variables
    • Statsample::DominanceAnalysis::Bootstrap can execute bootstrap analysis to determine dominance stability, as recommended by Azen & Budescu (2003) (http://psycnet.apa.org/journals/met/8/2/129/).
  • Module Statsample::Codification, to help to codify open questions
  • Converters to import and export data:
    • Statsample::Database : Can generate SQL to create tables, read and insert data
    • Statsample::CSV : Read and write CSV files
    • Statsample::Excel : Read and write Excel files
    • Statsample::Mx : Write Mx Files
    • Statsample::GGobi : Write Ggobi files
  • Module Statsample::Crosstab provides functions to create crosstabs for categorical data
  • Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
    • Class Statsample::Reliability::ScaleAnalysis provides scale statistics such as mean, standard deviation, Cronbach's alpha and standardized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted and Cronbach's alpha if deleted.
    • Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them.
    • Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulations.
  • Module Statsample::SRS (Simple Random Sampling) provides many functions to estimate standard error for several types of samples
  • Module Statsample::Test provides several methods and classes to perform inferential statistics
    • Statsample::Test::BartlettSphericity
    • Statsample::Test::ChiSquare
    • Statsample::Test::F
    • Statsample::Test::KolmogorovSmirnov (only D value)
    • Statsample::Test::Levene
    • Statsample::Test::UMannWhitney
    • Statsample::Test::T
    • Statsample::Test::WilcoxonSignedRank
  • Module Graph provides several classes to create beautiful graphs using rubyvis
    • Statsample::Graph::Boxplot
    • Statsample::Graph::Histogram
    • Statsample::Graph::Scatterplot
  • Gem bio-statsample-timeseries provides module Statsample::TimeSeries with support for time series, including ARIMA estimation using Kalman-Filter.
  • Gem statsample-sem provides a DSL to R libraries +sem+ and +OpenMx+
  • Gem statsample-glm provides a GLM method to work with Logistic, Poisson and Gaussian regression, using ML or IRWLS.
  • Close integration with gem reportbuilder, to easily create reports in text, HTML and RTF formats.
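To make the reliability statistics mentioned above more concrete, here is a minimal plain-Ruby sketch of Cronbach's alpha, computed as k / (k - 1) * (1 - sum of item variances / variance of total scores). It does not call Statsample::Reliability; the helper names sample_variance and cronbach_alpha are hypothetical, and the sketch assumes complete data (no missing values) and a total score with non-zero variance.

```ruby
# Plain-Ruby sketch of Cronbach's alpha; helper names are hypothetical,
# not part of statsample.

# Sample variance with the n - 1 denominator.
def sample_variance(xs)
  m = xs.sum.to_f / xs.size
  xs.sum { |x| (x - m)**2 } / (xs.size - 1)
end

# items: one array of scores per item, all with the same length
# (one entry per respondent, no missing values).
def cronbach_alpha(items)
  k = items.size
  # Total score of each respondent across all items.
  totals = items.transpose.map(&:sum)
  item_variance_sum = items.sum { |item| sample_variance(item) }
  (k / (k - 1.0)) * (1 - item_variance_sum / sample_variance(totals))
end

items = [
  [1, 2, 3, 4], # item 1, four respondents
  [2, 1, 4, 3]  # item 2, same four respondents
]
puts cronbach_alpha(items)
```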

Examples of use:

See the examples folder too.

Boxplot

require 'statsample'

ss_analysis(Statsample::Graph::Boxplot) do 
  n=30
  a=rnorm(n-1,50,10)
  b=rnorm(n, 30,5)
  c=rnorm(n,5,1)
  a.push(2)
  boxplot(:vectors=>[a,b,c], :width=>300, :height=>300, :groups=>%w{first first second}, :minimum=>0)
end
Statsample::Analysis.run # Opens the SVG file with the application defined on *nix

Correlation matrix

require 'statsample'
# Note R like generation of random gaussian variable
# and correlation matrix

ss_analysis("Statsample::Bivariate.correlation_matrix") do
  samples=1000
  ds=data_frame(
    'a'=>rnorm(samples), 
    'b'=>rnorm(samples),
    'c'=>rnorm(samples),
    'd'=>rnorm(samples))
  cm=cor(ds) 
  summary(cm)
end

Statsample::Analysis.run_batch # Echo output to console
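For readers who want to see what a Pearson correlation matrix contains, the following plain-Ruby sketch computes one from a hash of columns. It is independent of statsample; pearson and correlation_matrix are hypothetical helper names, and the sketch assumes equal-length columns with non-zero variance and no missing values.

```ruby
# Plain-Ruby sketch of a Pearson correlation matrix; helper names are
# hypothetical and do not belong to statsample.

# Pearson's r between two equal-length arrays.
def pearson(x, y)
  n = x.size
  mx = x.sum.to_f / n
  my = y.sum.to_f / n
  cross = x.zip(y).sum { |a, b| (a - mx) * (b - my) }
  sx = Math.sqrt(x.sum { |a| (a - mx)**2 })
  sy = Math.sqrt(y.sum { |b| (b - my)**2 })
  cross / (sx * sy)
end

# columns: hash of name => array of values.
# Returns a nested array in the order of columns.keys.
def correlation_matrix(columns)
  names = columns.keys
  names.map { |r| names.map { |c| pearson(columns[r], columns[c]) } }
end

columns = {
  'a' => [1, 2, 3, 4, 5],
  'b' => [2, 4, 6, 8, 10],
  'c' => [5, 4, 3, 2, 1]
}
cm = correlation_matrix(columns)
cm.each { |row| puts row.map { |r| format('%6.3f', r) }.join(' ') }
```

With this toy data, 'b' is a positive multiple of 'a' and 'c' is 'a' reversed, so the off-diagonal entries are close to 1 or -1; random Gaussian columns like those in the example above would instead give values near 0.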

Requirements

Optional:

  • Plotting: gnuplot and rbgnuplot, SVG::Graph
  • Factor analysis and polychoric correlation (joint estimate and polychoric series): gsl library and rb-gsl (https://rubygems.org/gems/rb-gsl/). Install it with gem install rb-gsl.

Note: Use gsl 1.12.109 or later.

Resources

Installation

$ sudo gem install statsample

On *nix, you should install statsample-optimization to retrieve the gems gsl and statistics2, plus a C extension that speeds up some methods.

Precompiled versions are available for Ruby 1.9 on x86, x86_64 and mingw32 architectures.

$ sudo gem install statsample-optimization

If you use Ruby 1.8, you should compile statsample-optimization, using the parameter --platform ruby

$ sudo gem install statsample-optimization --platform ruby

If you need to work on Structural Equation Modeling, see +statsample-sem+. You need R with the +sem+ or +OpenMx+ [http://openmx.psyc.virginia.edu/] libraries installed

$ sudo gem install statsample-sem

A setup.rb file is also available:

$ sudo ruby setup.rb

License

BSD-3 (See LICENSE.txt)

The license could change between versions without prior notice. If you want a specific license, just choose the version that you need.
