• Stars: 404
• Rank: 106,897 (Top 3 %)
• Language: R
• Created: almost 11 years ago
• Updated: 5 months ago

Repository Details

Professional data validation for the R environment

Easy data validation for the masses.

The validate R package makes it super-easy to check whether data lives up to the expectations you have based on domain knowledge. It lets you define data validation rules independently of the code or data set. You can then confront a data set, or several versions of it, with those rules. Results can be summarized, plotted, and so on. Below is a simple example.

> library(validate)
> check_that(iris, Sepal.Width < 0.5*Sepal.Length) |> summary()
  rule items passes fails nNA error warning                       expression
1   V1   150     79    71   0 FALSE   FALSE Sepal.Width < 0.5 * Sepal.Length
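
The same check can also be split into the two steps described above: define the rules first, then confront a data set with them. The following is a minimal sketch, assuming the validate package is installed; the rule names `positive_length` and `width_ratio` are chosen here for illustration only.

```r
library(validate)

# Define the rules once, independently of any data set.
rules <- validator(
  positive_length = Sepal.Length > 0,
  width_ratio     = Sepal.Width < 0.5 * Sepal.Length
)

# Confront a data set with the rules.
out <- confront(iris, rules)

# One row per rule, with counts of passes, fails and missing results.
s <- summary(out)
print(s)

# Record-by-record results, one column per rule.
head(values(out))
```

Because `width_ratio` is the rule from the example above applied to the same data, its row in the summary should show the same 79 passes and 71 fails.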

With validate, data validation rules are treated as first-class citizens. This means you can import, export, annotate, investigate and manipulate data validation rules in a meaningful way.
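
As a rough illustration of what treating rules as objects can look like, the sketch below annotates a rule set, inspects it, exports it to a data frame, and reads it back. It assumes the validate package is installed and that the data frame produced by `as.data.frame()` is accepted unchanged by `validator(.data = ...)`; the rule names and labels are made up for this example.

```r
library(validate)

rules <- validator(
  width_ratio   = Sepal.Width < 0.5 * Sepal.Length,
  known_species = Species %in% c("setosa", "versicolor", "virginica")
)

# Annotate: attach a human-readable label to each rule.
label(rules) <- c("Width below half of length", "Species is a known level")

# Investigate: which variables do the rules refer to?
vars <- variables(rules)
print(vars)

# Export: rules and their metadata as an ordinary data frame.
rules_df <- as.data.frame(rules)
print(rules_df[, c("name", "label", "rule")])

# Import: rebuild a rule set from that data frame.
rules_copy <- validator(.data = rules_df)

# Manipulate: rule sets can be subset like vectors.
first_rule <- rules[1]
```

Keeping rules in a data frame or file in this way means they can be versioned and reviewed separately from the code that applies them.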

To get started: see our data validation cookbook.

Citing

Please cite the JSS article

@article{van2021data,
  title={Data validation infrastructure for R},
  author={van der Loo, Mark PJ and de Jonge, Edwin},
  journal={Journal of Statistical Software},
  year={2021},
  volume ={97},
  issue = {10},
  pages = {1-33},
  doi={10.18637/jss.v097.i10},
  url = {https://www.jstatsoft.org/article/view/v097i10}
}

To cite the theory, please cite our Wiley StatsRef chapter.

@article{loo2020data,
  title = {Data Validation},
  year = {2020},
  journal = {Wiley StatsRef: Statistics Reference Online},
  author = {M.P.J. van der Loo and E. de Jonge},
  pages = {1--7},
  doi = {10.1002/9781118445112.stat08255},
  url = {https://onlinelibrary.wiley.com/doi/10.1002/9781118445112.stat08255}
}

Other Resources

Installation

The latest release can be installed from the R command line:

install.packages("validate")

The development version can be installed as follows.

git clone https://github.com/data-cleaning/validate
cd validate
make install

Note that the development version likely contains bugs (please report them!) and interfaces that may not be stable.

More Repositories

1. useR2019_tutorial (R, 36 stars): Tutorial for useR2019
2. validatedb (R, 32 stars): Validate on a table in a DB, using dbplyr
3. errorlocate (R, 21 stars): Find and replace erroneous fields in data using validation rules
4. editrules (R, 20 stars): R package for handling, checking and enforcing data rules
5. validatetools (R, 15 stars)
6. book (R, 13 stars): Resources for Statistical Data Cleaning with Applications in R
7. deductive (R, 11 stars): Methods for deductive data correction and imputation
8. uRos2018_tutorial (TeX, 10 stars): Data-Cleaning tutorial for Use of R in Official Statistics
9. dcmodify (R, 10 stars): Modify data records using separately defined modification rules
10. useR2021_tutorial (R, 8 stars): Materials for the useR!2021 tutorial on data validation
11. deducorrect (R, 7 stars): An R package for rule-based record correction and imputation
12. ISM2020_tutorial (R, 6 stars): Data Cleaning for Official Statistics
13. ValidatPoC (TeX, 6 stars): Rules and data for the PoC of the ESSnet on Validation
14. ValidatReport (TeX, 6 stars): Standard validation report structure for the ESS
15. dcmodifydb (R, 5 stars): Deterministic, documented correction rules on a database
16. validatesuggest (R, 5 stars): Generate validation rules from data
17. validatereport (R, 4 stars): Create attractive validation reports, export validation results to ESS json reporting standard
18. dcmodifydt (R, 3 stars): dcmodify on a data.table
19. dirtyharry (Makefile, 3 stars): Make your data dirty
20. IW2024 (R, 3 stars): Course material for the International Week
21. lintools (R, 3 stars): Tools for manipulating systems of linear (in)equalities
22. EESW2019_tutorial (R, 2 stars): Materials for the short course at the European Establishment Statistics Workshop 2019
23. Madrid2019 (TeX, 1 star): Slides and exercises for Mark's visit to the Complutense University of Madrid and INE
24. drat (1 star): Beta versions of data-cleaning packages
25. uRos2019_tutorial (TeX, 1 star): Tutorial materials for uRos 2019
26. validate.viz (R, 1 star): Visualisations for the R package validate