  • Stars: 121
  • Rank: 293,924 (Top 6 %)
  • Language: C++
  • Created almost 7 years ago
  • Updated over 6 years ago

Repository Details

FTRL-Proximal with python bindings

FTRL-Proximal

This is an implementation of the FTRL-Proximal algorithm in C with Python bindings. FTRL-Proximal is an online learning algorithm that works well on sparse problems. The implementation is based on the algorithm described in the paper "Ad Click Prediction: a View from the Trenches".
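For readers unfamiliar with the algorithm, here is a minimal pure-Python sketch of the per-coordinate FTRL-Proximal update for logistic loss, following the pseudocode in the paper. It is not the code of this library: the class name FtrlProximalSketch, its methods, and the (index, value) row format are hypothetical and chosen only for illustration.

```python
import math


class FtrlProximalSketch:
    """Illustrative per-coordinate FTRL-Proximal for logistic loss.

    Hypothetical helper, not part of the ftrl package.
    Assumes beta > 0 or l2 > 0 so the denominator is never zero.
    """

    def __init__(self, n_features, alpha=1.0, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * n_features  # accumulated (adjusted) gradients
        self.n = [0.0] * n_features  # accumulated squared gradients

    def _weight(self, i):
        # Weights are derived lazily from z and n; the L1 term
        # sets a weight to exactly zero when |z_i| <= l1.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def _sigmoid(self, score):
        # Clamp the score to avoid overflow in math.exp
        score = max(-35.0, min(35.0, score))
        return 1.0 / (1.0 + math.exp(-score))

    def predict_one(self, x):
        # x is a sparse row: a list of (feature_index, value) pairs
        return self._sigmoid(sum(self._weight(i) * v for i, v in x))

    def fit_one(self, x, y):
        # Compute the weights of the active features once, then update them
        w = {i: self._weight(i) for i, _ in x}
        p = self._sigmoid(sum(w[i] * v for i, v in x))
        for i, v in x:
            g = (p - y) * v  # gradient of the log loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g


# Toy usage: two sparse rows with disjoint features
rows = [[(0, 1.0), (1, 1.0)], [(3, 1.0), (4, 1.0)]]
labels = [1.0, 0.0]

sketch = FtrlProximalSketch(n_features=5, alpha=1.0, beta=1.0, l1=0.0, l2=0.0)
for _ in range(10):
    for x, y in zip(rows, labels):
        sketch.fit_one(x, y)

print([round(sketch.predict_one(x), 3) for x in rows])
```

Only the coordinates present in a row are touched by an update, which is why the method suits sparse data. A sufficiently large l1 keeps weights at exactly zero until enough gradient has accumulated.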

Some of the features:

  • Uses OpenMP to parallelize training, which makes it fast
  • The Python code can operate directly on SciPy CSR matrices
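As background on the CSR (compressed sparse row) layout mentioned above, the sketch below builds the three CSR arrays by hand from a small dense matrix. The helper to_csr is hypothetical and written in plain Python for illustration only; a scipy.sparse.csr_matrix exposes the same three arrays as its data, indices and indptr attributes. How this library reads them internally is not shown here.

```python
def to_csr(dense_rows):
    """Convert a list of dense rows into CSR arrays (illustrative helper)."""
    data, indices, indptr = [], [], [0]
    for row in dense_rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero values, row by row
                indices.append(col)   # column index of each stored value
        indptr.append(len(data))      # where the next row starts in data
    return data, indices, indptr


dense = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
]

data, indices, indptr = to_csr(dense)

# Row r occupies the slice indptr[r]:indptr[r + 1] of data and indices
for r in range(len(dense)):
    start, end = indptr[r], indptr[r + 1]
    print(r, list(zip(indices[start:end], data[start:end])))
```

Because each row is a contiguous slice, a row-wise pass over the data needs no format conversion, which is convenient for online learners that consume one example at a time.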

Prerequisites

Dependencies:

  • Requires numpy, scipy and OpenMP
  • Anaconda already ships with numpy and scipy
  • To install GOMP_4.0 for Anaconda, use conda install libgcc

Building

cmake . && make
mv libftrl.so ftrl/
python setup.py install

If you don't have cmake, it's easy to install:

mkdir cmake && cd cmake
wget https://cmake.org/files/v3.10/cmake-3.10.0-Linux-x86_64.sh
bash cmake-3.10.0-Linux-x86_64.sh --skip-license
export CMAKE_HOME=`pwd`
export PATH=$PATH:$CMAKE_HOME/bin

Example

import numpy as np
import scipy.sparse as sp

from sklearn.metrics import roc_auc_score

import ftrl

X = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]

X = sp.csr_matrix(X)
y = np.array([1, 1, 1, 0, 0], dtype='float32')

model = ftrl.FtrlProximal(alpha=1, beta=1, l1=10, l2=0)

# make 10 passes over the data
for i in range(10):
    model.fit(X, y)
    y_pred = model.predict(X)
    auc = roc_auc_score(y, y_pred)
    print('%02d: %.5f' % (i + 1, auc))

The same class can also be used for regression:

from sklearn.metrics import mean_squared_error

y = np.array([1, 2, 3, 4, 5], dtype='float32')

model = ftrl.FtrlProximal(alpha=0.5, beta=1, l1=0, l2=0, model_type='regression')

# make 10 passes over the data
for i in range(10):
    model.fit(X, y)
    y_pred = model.predict(X)
    mse = mean_squared_error(y, y_pred)
    print('%02d: %.5f' % (i + 1, mse))

Use case

This library was used for the Criteo Ad Placement Challenge and showed very competitive performance. You can have a look at the solution here: https://github.com/alexeygrigorev/nips-ad-placement-challenge

In particular, it trained much faster than sklearn's Logistic Regression (a wrapper for LIBLINEAR) while reaching the same AUC:

  • sklearn: 1.2 hours to train, AUC = 0.734
  • libftrl-python: 2 minutes to train, AUC = 0.734
