• This repository has been archived on 15/May/2018

Repository Details

A package to run unit tests on tabular data

testdat!

This package provides a test suite for checking that tabular data are correctly formatted. It checks that columns do not contain Unicode characters, that numeric columns do not contain non-numeric characters, and that columns of data can be tested for outliers. Used alongside unit tests for code, this suite helps ensure that data read into R do not contain errors.
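As a rough illustration of two of the checks described above (stray characters in numeric columns, and outliers), here is a minimal base R sketch. The helpers `non_numeric_entries()` and `is_outlier()` are hypothetical names made up for this example and are not testdat functions. The 1.5 * IQR rule is just one common outlier heuristic, assumed here for illustration.

```r
# Hypothetical helpers written in base R; these are NOT testdat functions.

# Return the entries of a column that cannot be parsed as numbers.
non_numeric_entries <- function(x) {
  x <- as.character(x)
  parsed <- suppressWarnings(as.numeric(x))
  x[!is.na(x) & is.na(parsed)]
}

# Flag values that fall more than k * IQR beyond the quartiles.
is_outlier <- function(x, k = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# A column that should be numeric but was read in as text.
readings <- c("1", "2", "3", "4", "5", "6", "7", "8", "999", "n/a")

bad <- non_numeric_entries(readings)
values <- suppressWarnings(as.numeric(readings))
flags <- is_outlier(values)

print(bad)            # entries that are not numbers
print(values[flags])  # values flagged as outliers
```

Whether a flagged value is really an error is a judgment call, so a sketch like this only reports candidates and leaves the data unchanged.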

Installation

# if you don't have devtools, run install.packages("devtools") first
library(devtools)
install_github("ropensci/testdat")

Use Cases

The testdat package has two types of functions: those that test for errors in data.frames, and those that correct these errors.

The testing functions should be used immediately after loading a data.frame into R. These functions are prefixed with test_. This suite has two goals. First, it lets you as a user immediately identify potential issues with the data. Second, it communicates to readers of your analysis that you investigated errors in your data. One possible use case, then, is to print the results of these tests in your analysis or documentation immediately after loading the data.

> data <- read.csv(system.file("extdata", "2012.csv", package="testdat"))
> test_utf8(data)
[1] FALSE
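One way to print several test results together right after loading the data is a small report table. The sketch below uses only base R. `run_checks()` and the two checks are hypothetical and not part of testdat. It assumes, as the fix_utf8 example in this README suggests, that a check returns TRUE when it finds a problem.

```r
# Hypothetical sketch in base R; run_checks() is not part of testdat.
# Each check takes a data.frame and returns TRUE when it finds a problem.
run_checks <- function(df, checks) {
  found <- vapply(checks, function(check) isTRUE(check(df)), logical(1))
  data.frame(check = names(checks), problem_found = unname(found),
             stringsAsFactors = FALSE)
}

checks <- list(
  # Any missing values anywhere in the data?
  missing_values = function(df) anyNA(df),
  # Any character value with leading or trailing whitespace?
  stray_whitespace = function(df) {
    chr <- df[vapply(df, is.character, logical(1))]
    any(vapply(chr, function(col) any(col != trimws(col), na.rm = TRUE),
               logical(1)))
  }
)

# Made-up data standing in for a freshly loaded file.
survey <- data.frame(id = c(1, 2, 3), site = c("a", " b", "c "),
                     stringsAsFactors = FALSE)

report <- run_checks(survey, checks)
print(report)
```

Printing `report` in an analysis document gives readers a record of which checks were run and which ones flagged something.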

The correcting functions should be used when the testing functions reveal issues with your data. These functions are prefixed with fix_. Not every testing function has a corresponding correction function: for example, a correction function for test_outliers.R would have serious statistical implications. However, for functions such as test_utf8.R, we have included a fix_utf8.R function as a quick fix for a failed test.

> filename <- system.file("extdata", "km1314-waypoints.csv", package="testdat")
> data <- read.csv(filename, header=FALSE)
> test_utf8(data)
[1] TRUE
> clean_data <- fix_utf8(data)
> test_utf8(clean_data)
[1] FALSE

(Note the above is just pseudocode until we get these functions working.)
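Since the example above is described as pseudocode, here is a runnable base R sketch of the same test-then-fix pattern. `has_non_ascii()` and `strip_non_ascii()` are hypothetical stand-ins, not the testdat implementations of test_utf8 and fix_utf8. They assume the text is UTF-8 encoded and that dropping non-ASCII characters is an acceptable fix.

```r
# Hypothetical base R stand-ins; NOT the testdat test_utf8()/fix_utf8().

# TRUE when any character column contains a non-ASCII character.
# iconv() returns NA for strings it cannot convert to ASCII.
has_non_ascii <- function(df) {
  chr <- df[vapply(df, is.character, logical(1))]
  any(vapply(chr, function(col) {
    any(is.na(iconv(col, from = "UTF-8", to = "ASCII")) & !is.na(col))
  }, logical(1)))
}

# Drop non-ASCII characters from every character column.
strip_non_ascii <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv,
                       from = "UTF-8", to = "ASCII", sub = "")
  df
}

# Made-up data with one accented character.
waypoints <- data.frame(name = c("caf\u00e9", "harbour"), km = c(1.5, 3),
                        stringsAsFactors = FALSE)

before <- has_non_ascii(waypoints)
cleaned <- strip_non_ascii(waypoints)
after <- has_non_ascii(cleaned)

print(before)
print(cleaned$name)
print(after)
```

Stripping characters loses information, so a real fix might transliterate them or report the affected rows instead.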

Using the testdat suite of functions lets you show, in a way that readers of your analysis can easily follow, that you have properly dealt with data quality issues. Presenting these tests and corrections in your documentation makes it reproducible how you identified and corrected errors, or how you verified that your data did not contain errors.

Examples

Testing for outliers

Testing/Fixing for continuous dates

Testing/Fixing whitespaces

Testing/Fixing utf8 characters

Testing/Fixing NAs

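library(testdat)

# Example data: the "n/a" entry forces the num column to be non-numeric,
# and the name column contains the NA-like strings "NULL" and "naa"
dat <- data.frame(
  date = rep(as.Date("2014-01-01"), 10),
  num = c(1:8, 999, "n/a"),
  name = c("NULL", "naa", rep("foo", 8))
)

dat
test_NA(dat)
class(dat$num)
class(dat$name)

# Pass "naa" as a custom NA string when fixing
clean_dat <- fix_NA(dat, custom_NAs = "naa")
clean_dat
class(clean_dat$num)
class(clean_dat$name)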
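For readers without the archived package installed, the idea behind an NA fix can be sketched in base R. `replace_na_strings()` is a hypothetical helper, not the testdat fix_NA, and its list of NA-like strings is an assumption chosen for this example. It replaces NA-like strings with real NA values and then lets `utils::type.convert()` re-detect each column's type.

```r
# Hypothetical base R sketch; replace_na_strings() is NOT testdat's fix_NA().
replace_na_strings <- function(df, na_strings = c("NA", "N/A", "n/a", "NULL", "")) {
  df[] <- lapply(df, function(col) {
    # Leave columns that are already typed (numbers, dates) untouched.
    if (!is.character(col) && !is.factor(col)) return(col)
    col <- as.character(col)
    col[col %in% na_strings] <- NA_character_
    # Re-detect the column type now that the NA-like strings are gone.
    utils::type.convert(col, as.is = TRUE)
  })
  df
}

dat <- data.frame(
  num = c(1:8, 999, "n/a"),
  name = c("NULL", "naa", rep("foo", 8)),
  stringsAsFactors = FALSE
)

clean <- replace_na_strings(dat, na_strings = c("n/a", "NULL", "naa"))

print(class(dat$num))
print(class(clean$num))
print(sum(is.na(clean$name)))
```

Note that 999 stays in the data: deciding whether it is a sentinel value or a real measurement is a separate outlier question.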
