Join tables based on events occurring in sequence in a funnel.

funneljoin

The goal of funneljoin is to make it easy to analyze behavior funnels. For example, maybe you’re interested in finding the people who visit a page and then register. Or you want all the times people click on an item and add it to their cart within 2 days. These can all be answered quickly with funneljoin’s after_join() or funnel_start() and funnel_step(). As funneljoin uses dplyr, it can also work with remote tables, but has only been tested on PostgreSQL.

For more examples of how to use funneljoin, check out the vignette, which shows different types of joins and the optional arguments, or this blog post, which showcases how to use funneljoin to analyze questions and answers on Stack Overflow.

Installation

You can install this package from GitHub with remotes:

library(remotes)
install_github("datacamp/funneljoin")

after_join()

library(dplyr)
library(funneljoin)

We’ll take a look at two tables that come with the package, landed and registered. Each has a column user_id and timestamp.

Let’s say we wanted to get the first time people landed and the first time afterward they registered. We would use after_inner_join() with a first-firstafter type:

landed %>%
  after_inner_join(registered, 
                   by_user = "user_id",
                   by_time = "timestamp",
                   type = "first-firstafter",
                   suffix = c("_landed", "_registered"))
#> # A tibble: 5 x 3
#>   user_id timestamp_landed timestamp_registered
#>     <dbl> <date>           <date>              
#> 1       1 2018-07-01       2018-07-02          
#> 2       4 2018-07-01       2018-07-02          
#> 3       3 2018-07-02       2018-07-02          
#> 4       6 2018-07-07       2018-07-10          
#> 5       5 2018-07-10       2018-07-11

The first two arguments are the tables we’re joining, with the first table being the events that happen first. We then specify:

  • by_time: the time columns in each table. This would typically be a datetime or a date column. These columns are used to filter for time y being after or the same as time x.
  • by_user: the user or identity columns in each table. These must be identical for a pair of rows to match.
  • type: the type of funnel used to distinguish between event pairs, such as “first-first”, “last-first”, “any-firstafter”.
  • suffix (optional): just like dplyr’s join functions, this specifies what should be appended to the names of columns that are in both tables.

type can be any combination of first, last, any, and lastbefore with first, last, any, and firstafter. Some common ones you may use include:

  • first-first: Take the earliest x and y for each user before joining. For example, you want the first time someone entered an experiment, followed by the first time someone ever registered. If they registered, entered the experiment, and registered again, you do not want to include that person.
  • first-firstafter: Take the first x, then the first y after that. For example, you want when someone first entered an experiment and the first course they started afterwards. You don’t care if they started courses before entering the experiment.
  • lastbefore-firstafter: Take each x that is followed by a y before the next x, paired with that y. In other words, the last x before a y and the first y after it. For example, in last click paid ad attribution, you want the last ad someone clicked before the first subscription they did afterward.
  • any-firstafter: Take every x, each paired with the first y after it. For example, you want all the times someone visited a homepage and their first product page they visited afterwards.
  • any-any: Take every x, each paired with every y after it. For example, you want all the times someone visited a homepage and all the product pages they saw afterward.
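To make the difference between first-first and first-firstafter concrete, here is a minimal sketch that builds both by hand with plain dplyr. The toy tables x and y are hypothetical, and this only illustrates the definitions above; it is not necessarily how funneljoin implements them internally.

```r
library(dplyr)

# Hypothetical toy data (not the tables shipped with the package)
x <- tibble(user_id = c(1, 1, 2),
            timestamp = as.Date(c("2018-07-01", "2018-07-05", "2018-07-03")))
y <- tibble(user_id = c(1, 1, 2),
            timestamp = as.Date(c("2018-06-30", "2018-07-04", "2018-07-02")))

# "first" x: the earliest x event per user
first_x <- x %>%
  group_by(user_id) %>%
  summarise(timestamp_x = min(timestamp), .groups = "drop")

# first-firstafter: keep only y events at or after the first x,
# then take the earliest of those
first_firstafter <- first_x %>%
  inner_join(rename(y, timestamp_y = timestamp), by = "user_id") %>%
  filter(timestamp_y >= timestamp_x) %>%
  group_by(user_id, timestamp_x) %>%
  summarise(timestamp_y = min(timestamp_y), .groups = "drop")

# first-first: take the earliest y per user first,
# then keep the pair only if that y is at or after the first x
first_first <- first_x %>%
  inner_join(
    y %>%
      group_by(user_id) %>%
      summarise(timestamp_y = min(timestamp), .groups = "drop"),
    by = "user_id"
  ) %>%
  filter(timestamp_y >= timestamp_x)
```

In this toy data, user 1 has a y event before their first x. first-firstafter skips that early y and pairs the first x with the later one, while first-first drops the user entirely because their earliest y precedes their earliest x.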

If your time and user columns have different names, you can work with that too:

landed <- landed %>%
  rename(landed_at = timestamp,
         user_id_x = user_id)

registered <- registered %>%
  rename(registered_at = timestamp,
         user_id_y = user_id)

landed %>%
  after_inner_join(registered, 
                   by_user = c("user_id_x" = "user_id_y"),
                   by_time = c("landed_at" = "registered_at"),
                   type = "first-first")
#> # A tibble: 4 x 3
#>   user_id_x landed_at  registered_at
#>       <dbl> <date>     <date>       
#> 1         1 2018-07-01 2018-07-02   
#> 2         3 2018-07-02 2018-07-02   
#> 3         6 2018-07-07 2018-07-10   
#> 4         5 2018-07-10 2018-07-11

funnel_start() and funnel_step()

Sometimes you have all the data you need in one table. For example, let’s look at this table of user activity on a website.

activity <- tibble::tribble(
  ~ "user_id", ~ "event", ~ "timestamp",
  1, "landing", "2019-07-01",
  1, "registration", "2019-07-02",
  1, "purchase", "2019-07-07",
  1, "purchase", "2019-07-10",
  2, "landing", "2019-08-01",
  2, "registration", "2019-08-15",
  3, "landing", "2019-05-01",
  3, "registration", "2019-06-01",
  3, "purchase", "2019-06-04",
  4, "landing", "2019-06-13"
)

We can use funnel_start() and funnel_step() to make an activity funnel. funnel_start() takes five arguments:

  • tbl: The table of events.
  • moment_type: The first moment, or event, in the funnel.
  • moment: The name of the column that indicates the moment_type.
  • tstamp: The name of the column with the timestamps of the moment.
  • user: The name of the column indicating the user who did the moment.

activity %>%
  funnel_start(moment_type = "landing", 
               moment = "event", 
               tstamp = "timestamp", 
               user = "user_id")
#> # A tibble: 4 x 2
#>   user_id timestamp_landing
#>     <dbl> <chr>            
#> 1       1 2019-07-01       
#> 2       2 2019-08-01       
#> 3       3 2019-05-01       
#> 4       4 2019-06-13

funnel_start() returns a table with the user_ids and a timestamp column whose name is built from the name of your timestamp column, an underscore, and the moment type (here, timestamp_landing). This table also includes metadata.
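As a rough illustration of that naming convention, the sketch below reproduces the shape of the table with plain dplyr. The shortened activity_small table and the helper variable moment_type are made up for this example, and the sketch does not reproduce the metadata that funnel_start() attaches.

```r
library(dplyr)

# A shortened, hypothetical copy of the activity table
activity_small <- tibble::tribble(
  ~user_id, ~event,         ~timestamp,
  1,        "landing",      "2019-07-01",
  1,        "registration", "2019-07-02",
  2,        "landing",      "2019-08-01"
)

moment_type <- "landing"

# Keep only the rows for the chosen moment, then rename the timestamp
# column to <timestamp column>_<moment type>
landing_tbl <- activity_small %>%
  filter(event == moment_type) %>%
  select(user_id, timestamp) %>%
  rename_with(~ paste0(.x, "_", moment_type), .cols = timestamp)

names(landing_tbl)
```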

To add more moments to the funnel, you use funnel_step(). Since you’ve indicated in funnel_start() what columns to use for each part, you now only need to supply the moment_type and the type of after_join() (e.g. “first-first”, “first-any”).

activity %>%
  funnel_start(moment_type = "landing", 
               moment = "event", 
               tstamp = "timestamp", 
               user = "user_id") %>%
  funnel_step(moment_type = "registration",
              type = "first-firstafter")
#> # A tibble: 4 x 3
#>   user_id timestamp_landing timestamp_registration
#>     <dbl> <chr>             <chr>                 
#> 1       3 2019-05-01        2019-06-01            
#> 2       4 2019-06-13        <NA>                  
#> 3       1 2019-07-01        2019-07-02            
#> 4       2 2019-08-01        2019-08-15

You can continue stacking on funnel_step() with more moments.

activity %>%
  funnel_start(moment_type = "landing", 
               moment = "event", 
               tstamp = "timestamp", 
               user = "user_id") %>%
  funnel_step(moment_type = "registration",
              type = "first-firstafter") %>%
  funnel_step(moment_type = "purchase",
              type = "first-any")
#> # A tibble: 5 x 4
#>   user_id timestamp_landing timestamp_registration timestamp_purchase
#>     <dbl> <chr>             <chr>                  <chr>             
#> 1       3 2019-05-01        2019-06-01             2019-06-04        
#> 2       1 2019-07-01        2019-07-02             2019-07-07        
#> 3       1 2019-07-01        2019-07-02             2019-07-10        
#> 4       2 2019-08-01        2019-08-15             <NA>              
#> 5       4 2019-06-13        <NA>                   <NA>

If you use a type that allows multiple moments of one type for a user, like “first-any”, you will get more rows per user rather than more columns. For example, user 1 had two purchases, so she now has two rows. The timestamp_landing and timestamp_registration are the same for both rows, but each row has a different timestamp_purchase.
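One practical consequence is that the row count no longer equals the user count. The sketch below copies the rows from the “first-any” output above into a tibble by hand (the name funnel_tbl is made up for this example) and counts purchasers with n_distinct() rather than n().

```r
library(dplyr)

# Rows copied by hand from the "first-any" output above
funnel_tbl <- tibble::tribble(
  ~user_id, ~timestamp_landing, ~timestamp_registration, ~timestamp_purchase,
  3, "2019-05-01", "2019-06-01",  "2019-06-04",
  1, "2019-07-01", "2019-07-02",  "2019-07-07",
  1, "2019-07-01", "2019-07-02",  "2019-07-10",
  2, "2019-08-01", "2019-08-15",  NA_character_,
  4, "2019-06-13", NA_character_, NA_character_
)

# n() counts purchase rows; n_distinct(user_id) counts people who purchased
purchase_summary <- funnel_tbl %>%
  filter(!is.na(timestamp_purchase)) %>%
  summarise(purchase_rows = n(),
            purchasers = n_distinct(user_id))
```

Here purchase_rows is 3 but purchasers is 2, because user 1 appears twice.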

Finally, you can use summarize_funnel() to see how many people, and what percentage of them, make it through to each step of the funnel. We can also switch to funnel_steps() to shorten our code a bit: we give it a character vector of moment_types in order and the type to use for each step.

activity %>%
  funnel_start(moment_type = "landing", 
               moment = "event", 
               tstamp = "timestamp", 
               user = "user_id") %>%
  funnel_steps(moment_types = c("registration", "purchase"),
               type = "first-firstafter") %>%
  summarize_funnel()
#> # A tibble: 3 x 4
#>   moment_type  nb_step pct_cumulative pct_step
#>   <fct>          <int>          <dbl>    <dbl>
#> 1 landing            4           1      NA    
#> 2 registration       3           0.75    0.75 
#> 3 purchase           2           0.5     0.667

nb_step is how many users made it to each step, pct_cumulative is the share of users from the first step who made it that far, and pct_step is the share of users from the previous step who made it that far. So in our case, 2 people had a purchase, which is 50% of the people who landed but about 67% of those who registered.
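The two percentages can be checked by hand from the step counts. The sketch below reworks the arithmetic with plain dplyr, using the nb_step values from the output above; it illustrates the definitions and is not necessarily how summarize_funnel() computes them internally.

```r
library(dplyr)

# Step counts copied from the summarize_funnel() output above
funnel_counts <- tibble(
  moment_type = c("landing", "registration", "purchase"),
  nb_step = c(4L, 3L, 2L)
)

funnel_pcts <- funnel_counts %>%
  mutate(
    # share of the users who reached the first step
    pct_cumulative = nb_step / first(nb_step),
    # share of the users who reached the previous step (NA for the first step)
    pct_step = nb_step / lag(nb_step)
  )
```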

Reporting bugs and adding features

If you find any bugs or have a feature request or question, please create an issue. If you’d like to add a feature, tests, or other functionality, please also make an issue first and let’s discuss!

funneljoin was developed at DataCamp by Anthony Baker, David Robinson, and Emily Robinson. It is now maintained by the DataCamp engineering team.
