  • Stars: 169
  • Rank: 224,453 (Top 5 %)
  • Language: R
  • License: Other
  • Created about 9 years ago
  • Updated about 3 years ago


✂️ Extract Tables from Microsoft Word Documents with R


docxtractr

Extract Data Tables and Comments from ‘Microsoft’ ‘Word’ Documents

Description

An R package for extracting tables & comments out of Word documents (docx). Development versions are available on GitHub (hrbrmstr/docxtractr) and production versions are on CRAN.

Microsoft Word docx files provide an XML structure that is fairly straightforward to navigate, especially for Word tables. The docxtractr package provides tools to count the tables in a Microsoft Word docx document, describe their structure, and extract them.
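Because a .docx file is a ZIP archive whose body lives in word/document.xml, and each Word table there is a w:tbl element, the idea behind a table count can be sketched in base R. This is a minimal illustration under those assumptions, not docxtractr's implementation: count_tbl_tags() and count_docx_tables_sketch() are hypothetical names, and the regex stands in for a real XML parser, so it also counts nested tables and may miss unusual markup.

```r
# Hypothetical helper: count table start tags (<w:tbl> or <w:tbl ...>) in a
# WordprocessingML string. "<w:tblPr>"/"<w:tblGrid>" and closing "</w:tbl>"
# tags do not match. A regex is a rough stand-in for a real XML parser and
# also counts tables nested inside other tables.
count_tbl_tags <- function(xml_text) {
  hits <- gregexpr("<w:tbl[ >]", xml_text)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# Hypothetical helper: read word/document.xml straight out of the .docx
# (a ZIP archive) through a base-R unz() connection and count its tables.
count_docx_tables_sketch <- function(path) {
  con <- unz(path, "word/document.xml")
  on.exit(close(con))
  xml_text <- paste(readLines(con, warn = FALSE, encoding = "UTF-8"),
                    collapse = "\n")
  count_tbl_tags(xml_text)
}
```

With docxtractr installed, count_docx_tables_sketch(system.file("examples/data3.docx", package = "docxtractr")) can be compared against docx_tbl_count() on the same file.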

Many tables in Word documents are laid out in awkward formats, with labels or other oddities mixed in that make the underlying data hard to work with. docxtractr provides a function, assign_colnames, that makes it easy to designate a particular row in a scraped (or any) data.frame as the one holding the column names. That row becomes the column names and is removed, along with (optionally) all of the rows before it, since that is usually what needs to be done.
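To make the row-promotion idea concrete, here is a minimal base-R sketch of the steps such a helper performs. promote_row_to_names() is a hypothetical name used only for illustration; docxtractr::assign_colnames() is the real function and may differ in details such as argument names and how it treats missing header cells.

```r
# Hypothetical base-R illustration of promoting a data row to column names.
# Not docxtractr's code; use docxtractr::assign_colnames() in practice.
promote_row_to_names <- function(dat, row, remove_previous = TRUE) {
  stopifnot(is.data.frame(dat), row >= 1, row <= nrow(dat))
  # The chosen row's values become the new column names
  new_names <- as.character(unlist(dat[row, ], use.names = FALSE))
  # Always drop the header row itself; optionally drop every row above it too
  keep <- if (remove_previous) {
    seq_len(nrow(dat)) > row
  } else {
    seq_len(nrow(dat)) != row
  }
  out <- dat[keep, , drop = FALSE]
  names(out) <- new_names
  rownames(out) <- NULL
  out
}
```

On a scraped table shaped like tbls[[1]] in the Usage section, promote_row_to_names(tbls[[1]], 2) follows the same steps as assign_colnames(tbls[[1]], 2): the "Lesson 1: Step 1" row is dropped and the "Country, Birthrate, ..." row becomes the header.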

What’s in the tin?

The following functions are implemented:

  • read_docx: Read in a Word document for table extraction
  • docx_describe_tbls: Returns a description of all the tables in the Word document
  • docx_describe_cmnts: Returns a description of all the comments in the Word document
  • docx_extract_tbl: Extract a table from a Word document
  • docx_extract_all_cmnts: Extract comments from a Word document
  • docx_extract_all_tbls: Extract all tables from a Word document (docx_extract_all is now deprecated)
  • docx_tbl_count: Get number of tables in a Word document
  • docx_cmnt_count: Get number of comments in a Word document
  • assign_colnames: Make a specific row the column names for the specified data.frame
  • mcga: Make column names great again
  • set_libreoffice_path: Point to Local soffice.exe File
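mcga() is only described briefly above. Judging from its output in the Usage section (for example, Death Rate becomes death_rate and Social Factors 1 becomes social_factors_1), it converts column names to lower snake_case. The sketch below reproduces that convention in base R; clean_names_sketch() and mcga_sketch() are hypothetical helpers, and the real mcga() may apply additional rules.

```r
# Hypothetical snake_case cleanup in the spirit of docxtractr::mcga();
# the real function's rules may differ.
clean_names_sketch <- function(x) {
  nm <- tolower(x)
  nm <- gsub("[^a-z0-9]+", "_", nm)  # runs of spaces/punctuation (and non-ASCII letters) become "_"
  nm <- gsub("^_+|_+$", "", nm)      # trim leading/trailing underscores
  nm[nm == ""] <- "x"                # placeholder for names that were all punctuation
  make.unique(nm, sep = "_")         # de-duplicate: "a", "a_1", ...
}

# Apply the cleanup to a data frame's column names
mcga_sketch <- function(df) {
  names(df) <- clean_names_sketch(names(df))
  df
}
```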

The following data files are included:

  • system.file("examples/data.docx", package="docxtractr"): Word docx with 1 table
  • system.file("examples/data3.docx", package="docxtractr"): Word docx with 3 tables
  • system.file("examples/none.docx", package="docxtractr"): Word docx with 0 tables
  • system.file("examples/complex.docx", package="docxtractr"): Word docx with non-uniform tables
  • system.file("examples/comments.docx", package="docxtractr"): Word docx with comments
  • system.file("examples/realworld.docx", package="docxtractr"): A “real world” Word docx file with tables of all shapes and sizes
  • system.file("examples/trackchanges.docx", package="docxtractr"): Word docx with track changes in a table

Installation

# devtools::install_github("hrbrmstr/docxtractr")
# OR 
install.packages("docxtractr")

Usage

library(docxtractr)
library(tibble)
library(dplyr)

# current version
packageVersion("docxtractr")
#> [1] '0.6.0'
# one table
doc <- read_docx(system.file("examples/data.docx", package="docxtractr"))

docx_tbl_count(doc)
#> [1] 1

docx_describe_tbls(doc)
#> Word document [/Library/Frameworks/R.framework/Versions/3.5/Resources/library/docxtractr/examples/data.docx]
#> 
#> Table 1
#>   total cells: 16
#>   row count  : 4
#>   uniform    : likely!
#>   has header : likely! => possibly [This, Is, A, Column]

docx_extract_tbl(doc, 1)
#> # A tibble: 3 x 4
#>   This  Is      A     Column  
#>   <chr> <chr>   <chr> <chr>   
#> 1 1     Cat     3.4   Dog     
#> 2 3     Fish    100.3 Bird    
#> 3 5     Pelican -99   Kangaroo

docx_extract_tbl(doc)
#> # A tibble: 3 x 4
#>   This  Is      A     Column  
#>   <chr> <chr>   <chr> <chr>   
#> 1 1     Cat     3.4   Dog     
#> 2 3     Fish    100.3 Bird    
#> 3 5     Pelican -99   Kangaroo

docx_extract_tbl(doc, header=FALSE)
#> NOTE: header=FALSE but table has a marked header row in the Word document
#> # A tibble: 4 x 4
#>   V1    V2      V3    V4      
#>   <chr> <chr>   <chr> <chr>   
#> 1 This  Is      A     Column  
#> 2 1     Cat     3.4   Dog     
#> 3 3     Fish    100.3 Bird    
#> 4 5     Pelican -99   Kangaroo

# url 

budget <- read_docx("http://rud.is/dl/1.DOCX")

docx_tbl_count(budget)
#> [1] 2

docx_describe_tbls(budget)
#> Word document [http://rud.is/dl/1.DOCX]
#> 
#> Table 1
#>   total cells: 24
#>   row count  : 6
#>   uniform    : likely!
#>   has header : unlikely
#> 
#> Table 2
#>   total cells: 28
#>   row count  : 4
#>   uniform    : likely!
#>   has header : unlikely

docx_extract_tbl(budget, 1)
#> # A tibble: 5 x 4
#>   ``                                 `Short-term Portfolio` `Long-term Portfolio` `Total Portfolio Values`
#>   <chr>                              <chr>                  <chr>                 <chr>                   
#> 1 Portfolio Balance (Market Value) * $  123,651,911         $ 294,704,136         $ 418,356,047           
#> 2 Effective Yield                    0.16 %                 1.42 %                1.05 %                  
#> 3 Avg. Weighted Maturity             11 Days                2.4 Years             1.7 Years               
#> 4 Net Earnings                       $      18,470          $      350,554        $      369,024          
#> 5 Benchmark**                        0.02 %                 0.41 %                0.27 %

docx_extract_tbl(budget, 2) 
#> # A tibble: 3 x 7
#>   ``                   `Amount of Funds … Maturity  `Effective Yiel… `Interpolated Y… `Total Return  … `Total Return  …
#>   <chr>                <chr>              <chr>     <chr>            <chr>            <chr>            <chr>           
#> 1 Short-Term Portfolio $ 123,651,911      11 days   0.16 %           0.01 %           0.013            0.160           
#> 2 Long-Term Portfolio  $ 294,704,136      2.4 years 1.42 %           0.41 %           0.437            0.250           
#> 3 Total Portfolio      $ 418,356,047      1.7 years 1.05 %           0.27 %           0.298            0.222

# three tables
doc3 <- read_docx(system.file("examples/data3.docx", package="docxtractr"))

docx_tbl_count(doc3)
#> [1] 3

docx_describe_tbls(doc3)
#> Word document [/Library/Frameworks/R.framework/Versions/3.5/Resources/library/docxtractr/examples/data3.docx]
#> 
#> Table 1
#>   total cells: 16
#>   row count  : 4
#>   uniform    : likely!
#>   has header : likely! => possibly [This, Is, A, Column]
#> 
#> Table 2
#>   total cells: 12
#>   row count  : 4
#>   uniform    : likely!
#>   has header : likely! => possibly [Foo, Bar, Baz]
#> 
#> Table 3
#>   total cells: 14
#>   row count  : 7
#>   uniform    : likely!
#>   has header : likely! => possibly [Foo, Bar]

docx_extract_tbl(doc3, 3)
#> # A tibble: 6 x 2
#>   Foo   Bar  
#>   <chr> <chr>
#> 1 Aa    Bb   
#> 2 Dd    Ee   
#> 3 Gg    Hh   
#> 4 1     2    
#> 5 Zz    Jj   
#> 6 Tt    ii

# no tables
none <- read_docx(system.file("examples/none.docx", package="docxtractr"))

docx_tbl_count(none)
#> [1] 0

# wrapping in try() since these calls will return an error on a document with no tables
# use docx_tbl_count() before trying to extract in scripts/production
try(docx_describe_tbls(none))
#> No tables in document
try(docx_extract_tbl(none, 2))
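In scripts, the advice in the comments above (check docx_tbl_count() before extracting) can be wrapped in a small guard. safe_extract_tbl() is a hypothetical helper, not part of docxtractr; it returns NULL with a message instead of relying on try(). The counting and extraction functions are parameters that default to the real docxtractr functions, which also makes the guard testable without a .docx file.

```r
# Hypothetical guard (not part of docxtractr): check the table count first and
# return NULL with a message, rather than an error, when table `n` is absent.
# count_fun/extract_fun default to the real docxtractr functions; they are only
# looked up when the defaults are actually used.
safe_extract_tbl <- function(doc, n = 1,
                             count_fun = docxtractr::docx_tbl_count,
                             extract_fun = docxtractr::docx_extract_tbl,
                             ...) {
  n_tbls <- count_fun(doc)
  if (n < 1 || n > n_tbls) {
    message(sprintf("Table %d requested, but the document has %d table(s)",
                    as.integer(n), as.integer(n_tbls)))
    return(NULL)
  }
  extract_fun(doc, n, ...)  # extra arguments (e.g. header = FALSE) pass through
}
```

For the none.docx document loaded above, safe_extract_tbl(none, 2) would return NULL after printing a message.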

# 5 tables, with two in sketchy formats
complx <- read_docx(system.file("examples/complex.docx", package="docxtractr"))

docx_tbl_count(complx)
#> [1] 5

docx_describe_tbls(complx)
#> Word document [/Library/Frameworks/R.framework/Versions/3.5/Resources/library/docxtractr/examples/complex.docx]
#> 
#> Table 1
#>   total cells: 16
#>   row count  : 4
#>   uniform    : likely!
#>   has header : likely! => possibly [This, Is, A, Column]
#> 
#> Table 2
#>   total cells: 12
#>   row count  : 4
#>   uniform    : likely!
#>   has header : likely! => possibly [Foo, Bar, Baz]
#> 
#> Table 3
#>   total cells: 14
#>   row count  : 7
#>   uniform    : likely!
#>   has header : likely! => possibly [Foo, Bar]
#> 
#> Table 4
#>   total cells: 11
#>   row count  : 4
#>   uniform    : unlikely => found differing cell counts (3, 2) across some rows
#>   has header : likely! => possibly [Foo, Bar, Baz]
#> 
#> Table 5
#>   total cells: 21
#>   row count  : 7
#>   uniform    : likely!
#>   has header : unlikely
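docx_describe_tbls() reports a table as non-uniform when its rows hold different numbers of cells, as with Table 4 above. A rough base-R sketch of that kind of check counts the w:tc cells in each w:tr row of a table's XML and compares the counts. row_cell_counts() and uniformity_sketch() are hypothetical helpers; docxtractr's own logic and wording may differ, and the lazy regex does not handle tables nested inside cells.

```r
# Hypothetical helper: number of <w:tc> cells in each <w:tr> row of one
# table's XML. The lazy regex assumes no tables are nested inside cells.
row_cell_counts <- function(tbl_xml) {
  rows <- regmatches(tbl_xml,
                     gregexpr("(?s)<w:tr[ >].*?</w:tr>", tbl_xml, perl = TRUE))[[1]]
  vapply(rows, function(r) {
    hits <- gregexpr("<w:tc[ >]", r)[[1]]
    if (hits[1] == -1) 0L else length(hits)
  }, integer(1), USE.NAMES = FALSE)
}

# Hypothetical helper: "likely!" when every row has the same cell count,
# otherwise list the distinct counts (wording loosely modeled on the output above)
uniformity_sketch <- function(counts) {
  u <- unique(counts)
  if (length(u) <= 1) {
    "likely!"
  } else {
    sprintf("unlikely => differing cell counts (%s)", paste(u, collapse = ", "))
  }
}
```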

docx_extract_tbl(complx, 3, header=TRUE)
#> # A tibble: 6 x 2
#>   Foo   Bar  
#>   <chr> <chr>
#> 1 Aa    Bb   
#> 2 Dd    Ee   
#> 3 Gg    Hh   
#> 4 1     2    
#> 5 Zz    Jj   
#> 6 Tt    ii

docx_extract_tbl(complx, 4, header=TRUE)
#> # A tibble: 3 x 3
#>   Foo   Bar   Baz  
#>   <chr> <chr> <chr>
#> 1 Aa    BbCc  <NA> 
#> 2 Dd    Ee    Ff   
#> 3 Gg    Hh    ii

docx_extract_tbl(complx, 5, header=TRUE)
#> # A tibble: 6 x 3
#>   Foo   Bar   Baz  
#>   <chr> <chr> <chr>
#> 1 Aa    Bb    Cc   
#> 2 Dd    Ee    Ff   
#> 3 Gg    Hh    Ii   
#> 4 Jj88  Kk    Ll   
#> 5 ""    Uu    Ii   
#> 6 Hh    Ii    h

# a "real" Word doc
real_world <- read_docx(system.file("examples/realworld.docx", package="docxtractr"))

docx_tbl_count(real_world)
#> [1] 8

# get all the tables
tbls <- docx_extract_all_tbls(real_world)

# see table 1
tbls[[1]]
#> # A tibble: 9 x 9
#>   V1                V2        V3         V4                     V5                     V6        V7      V8     V9     
#>   <chr>             <chr>     <chr>      <chr>                  <chr>                  <chr>     <chr>   <chr>  <chr>  
#> 1 Lesson 1:  Step 1 <NA>      <NA>       <NA>                   <NA>                   <NA>      <NA>    <NA>   <NA>   
#> 2 Country           Birthrate Death Rate Population Growth 2005 Population Growth 2050 Relative… Social… Socia… Social…
#> 3 USA               2.06      0.51%      0.92%                  -0.06%                 Post- In… Female… Stabl… Good t…
#> 4 China             1.62      0.3%       0.6%                   -0.58%                 Post- In… Govern… Techn… Urbani…
#> 5 Egypt             2.83      0.41%      2.0%                   1.32%                  Mature I… Not ye… More … Slight…
#> 6 India             2.35      0.34%      1.56%                  0.76%                  Post Ind… Econom… Pover… Becomi…
#> 7 Italy             1.28      0.72%      0.35%                  -1.33%                 Late Pos… Stable… Peopl… Better…
#> 8 Mexico            2.43      0.25%      1.41%                  0.96%                  Mature I… Better… Emigr… Econom…
#> 9 Nigeria           4.78      0.26%      2.46%                  3.58%                  End of M… Disease Peopl… People…

# make table 1 better
assign_colnames(tbls[[1]], 2)
#> # A tibble: 7 x 9
#>   Country Birthrate `Death Rate` `Population Grow… `Population Grow… `Relative place… `Social Factors… `Social Factors…
#>   <chr>   <chr>     <chr>        <chr>             <chr>             <chr>            <chr>            <chr>           
#> 1 USA     2.06      0.51%        0.92%             -0.06%            Post- Industrial Female Independ… Stable Birth Ra…
#> 2 China   1.62      0.3%         0.6%              -0.58%            Post- Industrial Government inte… Technology      
#> 3 Egypt   2.83      0.41%        2.0%              1.32%             Mature Industri… Not yet industr… More children n…
#> 4 India   2.35      0.34%        1.56%             0.76%             Post Industrial  Economic growth  Poverty         
#> 5 Italy   1.28      0.72%        0.35%             -1.33%            Late Post indus… Stable birth ra… People marry la…
#> 6 Mexico  2.43      0.25%        1.41%             0.96%             Mature Industri… Better health c… Emigration      
#> 7 Nigeria 4.78      0.26%        2.46%             3.58%             End of Mechaniz… Disease          People marry ea…
#> # ... with 1 more variable: `Social Factors 3` <chr>

# make table 1's column names great again 
mcga(assign_colnames(tbls[[1]], 2))
#> # A tibble: 7 x 9
#>   country birthrate death_rate population_growt… population_growt… relative_place_in… social_factors_1 social_factors_2
#>   <chr>   <chr>     <chr>      <chr>             <chr>             <chr>              <chr>            <chr>           
#> 1 USA     2.06      0.51%      0.92%             -0.06%            Post- Industrial   Female Independ… Stable Birth Ra…
#> 2 China   1.62      0.3%       0.6%              -0.58%            Post- Industrial   Government inte… Technology      
#> 3 Egypt   2.83      0.41%      2.0%              1.32%             Mature Industrial  Not yet industr… More children n…
#> 4 India   2.35      0.34%      1.56%             0.76%             Post Industrial    Economic growth  Poverty         
#> 5 Italy   1.28      0.72%      0.35%             -1.33%            Late Post industr… Stable birth ra… People marry la…
#> 6 Mexico  2.43      0.25%      1.41%             0.96%             Mature Industrial  Better health c… Emigration      
#> 7 Nigeria 4.78      0.26%      2.46%             3.58%             End of Mechanizat… Disease          People marry ea…
#> # ... with 1 more variable: social_factors_3 <chr>

# see table 5
tbls[[5]]
#> # A tibble: 5 x 6
#>   V1                V2      V3            V4        V5        V6      
#>   <chr>             <chr>   <chr>         <chr>     <chr>     <chr>   
#> 1 Lesson 2:  Step 1 <NA>    <NA>          <NA>      <NA>      <NA>    
#> 2 Nigeria           Default Prediction    + 5 years +15 years -5 years
#> 3 Birth rate        4.78    Goes Down     4.76      4.72      4.79    
#> 4 Death rate        0.36%   Stay the Same 0.42%     0.52%     0.3%    
#> 5 Population growth 3.58%   Goes Down     3.02%     2.32%     4.38%

# make table 5 better
assign_colnames(tbls[[5]], 2)
#> # A tibble: 3 x 6
#>   Nigeria           Default Prediction    `+ 5 years` `+15 years` `-5 years`
#>   <chr>             <chr>   <chr>         <chr>       <chr>       <chr>     
#> 1 Birth rate        4.78    Goes Down     4.76        4.72        4.79      
#> 2 Death rate        0.36%   Stay the Same 0.42%       0.52%       0.3%      
#> 3 Population growth 3.58%   Goes Down     3.02%       2.32%       4.38%

# preserve intra-cell line breaks (preserve=TRUE)
intracell_whitespace <- read_docx(system.file("examples/preserve.docx", package="docxtractr"))
docx_extract_all_tbls(intracell_whitespace, preserve=TRUE)
#> [[1]]
#> # A tibble: 6 x 2
#>   `Test1:` Apple                                  
#>   <chr>    <chr>                                  
#> 1 Test2:   Banana                                 
#> 2 Test3:   "Cranberry\nDark"                      
#> 3 Test4:   "Elephant, Farm\nGrandpa"              
#> 4 Test5:   "Hat\nIgloo\nJackrabbit"               
#> 5 Test6:   " \nQuestion1\n[ ] Underwear\n[ ] VM\n"
#> 6 Test7:   Warm                                   
#> 
#> [[2]]
#> # A tibble: 2 x 4
#>   ``    Kite  Lemur      Madagascar
#>   <chr> <chr> <chr>      <chr>     
#> 1 Nanny Open  Port       Quarter   
#> 2 Rain  Sand  Television Unicorn   
#> 
#> [[3]]
#> # A tibble: 2 x 2
#>   `Test8:` `Xylophone\nYew`             
#>   <chr>    <chr>                        
#> 1 Test9:   Zebra                        
#> 2 Test10:  "Apple2\nBanana2\nCranberry2"

docx_extract_all_tbls(intracell_whitespace)
#> [[1]]
#> # A tibble: 6 x 2
#>   `Test1:` Apple                                                                                        
#>   <chr>    <chr>                                                                                        
#> 1 Test2:   Banana                                                                                       
#> 2 Test3:   CranberryDark                                                                                
#> 3 Test4:   Elephant, FarmGrandpa                                                                        
#> 4 Test5:   HatIglooJackrabbit                                                                           
#> 5 Test6:   KiteLemurMadagascarNannyOpenPortQuarterRainSandTelevisionUnicorn Question1[ ] Underwear[ ] VM
#> 6 Test7:   Warm                                                                                         
#> 
#> [[2]]
#> # A tibble: 2 x 4
#>   ``    Kite  Lemur      Madagascar
#>   <chr> <chr> <chr>      <chr>     
#> 1 Nanny Open  Port       Quarter   
#> 2 Rain  Sand  Television Unicorn   
#> 
#> [[3]]
#> # A tibble: 2 x 2
#>   `Test8:` XylophoneYew           
#>   <chr>    <chr>                  
#> 1 Test9:   Zebra                  
#> 2 Test10:  Apple2Banana2Cranberry2
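The two outputs appear to differ in how separate lines inside one cell (typically separate w:p paragraphs in the XML) are joined: with preserve=TRUE they are separated by newlines, while the default concatenates them, so Cranberry and Dark become CranberryDark. In the default output, the Test6 cell also picks up the text of the nested table that is returned separately as [[2]]. The sketch below illustrates only the joining step; cell_text_sketch() is hypothetical, pulls text from w:t runs with regular expressions, and does not decode XML entities or handle tabs, breaks, or nested tables.

```r
# Hypothetical helper: join the text of each <w:p> paragraph in a cell's XML,
# with "\n" between paragraphs when preserve = TRUE and nothing otherwise.
# Regex-based simplification: only <w:t> runs are read; entities such as
# &amp; are left as-is, and tabs, <w:br/> breaks and nested tables are ignored.
cell_text_sketch <- function(cell_xml, preserve = FALSE) {
  paras <- regmatches(cell_xml,
                      gregexpr("(?s)<w:p[ >].*?</w:p>", cell_xml, perl = TRUE))[[1]]
  para_text <- vapply(paras, function(p) {
    runs <- regmatches(p, gregexpr("<w:t(?: [^>]*)?>[^<]*</w:t>", p, perl = TRUE))[[1]]
    paste(gsub("<[^>]+>", "", runs), collapse = "")  # strip tags, keep run text
  }, character(1), USE.NAMES = FALSE)
  paste(para_text, collapse = if (preserve) "\n" else "")
}
```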

# comments
cmnts <- read_docx(system.file("examples/comments.docx", package="docxtractr"))

print(cmnts)
#> No tables in document
#> Word document [/Library/Frameworks/R.framework/Versions/3.5/Resources/library/docxtractr/examples/comments.docx]
#> 
#> Found 3 comments.
#> # A tibble: 1 x 2
#>   author    `# Comments`
#>   <chr>            <int>
#> 1 boB Rudis            3

glimpse(docx_extract_all_cmnts(cmnts))
#> Observations: 3
#> Variables: 5
#> $ id           <chr> "0", "1", "2"
#> $ author       <chr> "boB Rudis", "boB Rudis", "boB Rudis"
#> $ date         <chr> "2016-07-01T21:09:00Z", "2016-07-01T21:09:00Z", "2016-07-01T21:09:00Z"
#> $ initials     <chr> "bR", "bR", "bR"
#> $ comment_text <chr> "This is the first comment", "This is the second comment", "This is a reply to the second comm...

Track Changes (depends on pandoc being available)

# original
read_docx(
  system.file("examples/trackchanges.docx", package="docxtractr")
) %>% 
  docx_extract_all_tbls(guess_header = FALSE)
#> NOTE: header=FALSE but table has a marked header row in the Word document
#> [[1]]
#> # A tibble: 1 x 1
#>   V1   
#>   <chr>
#> 1 21

# accept
read_docx(
  system.file("examples/trackchanges.docx", package="docxtractr"),
  track_changes = "accept"
) %>% 
  docx_extract_all_tbls(guess_header = FALSE)
#> [[1]]
#> # A tibble: 1 x 1
#>   V1   
#>   <chr>
#> 1 2

# reject
read_docx(
  system.file("examples/trackchanges.docx", package="docxtractr"),
  track_changes = "reject"
) %>% 
  docx_extract_all_tbls(guess_header = FALSE)
#> [[1]]
#> # A tibble: 1 x 1
#>   V1   
#>   <chr>
#> 1 1
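Since this feature depends on pandoc, a rough equivalent outside docxtractr is to round-trip the document through pandoc's docx reader and writer with its --track-changes option. The sketch below assumes a pandoc executable on the PATH; pandoc_track_changes_args() and resolve_track_changes() are hypothetical helpers, not docxtractr's internal code, and a pandoc round trip can drop formatting that pandoc does not model.

```r
# Hypothetical helper: pandoc arguments that read a .docx, accept or reject
# its tracked changes, and write a new .docx. Paths are shell-quoted because
# system2() does not quote arguments itself.
pandoc_track_changes_args <- function(input, output, mode = c("accept", "reject")) {
  mode <- match.arg(mode)
  c("-f", "docx", "-t", "docx",
    paste0("--track-changes=", mode),
    "-o", shQuote(output), shQuote(input))
}

# Hypothetical helper: run pandoc (assumed to be on the PATH) and return the
# path of the resolved temporary .docx.
resolve_track_changes <- function(input, mode = c("accept", "reject")) {
  mode <- match.arg(mode)
  output <- tempfile(fileext = ".docx")
  status <- system2("pandoc", pandoc_track_changes_args(input, output, mode))
  if (status != 0) stop("pandoc exited with status ", status)
  output
}
```

The returned temporary file could then be passed to read_docx() as usual.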

Test Results

library(docxtractr)
library(testthat)
#> 
#> Attaching package: 'testthat'
#> The following object is masked from 'package:dplyr':
#> 
#>     matches

date()
#> [1] "Tue Oct 23 08:10:10 2018"

test_dir("tests/")
#> ✔ | OK F W S | Context
#> ══ testthat results  ═════════════════════════════════════════════════
#> OK: 16 SKIPPED: 0 FAILED: 0
#> 
#> ══ Results ═══════════════════════════════════════════════════════════
#> Duration: 0.2 s
#> 
#> OK:       0
#> Failed:   0
#> Warnings: 0
#> Skipped:  0

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
