  • Stars: 172
  • Rank: 219,862 (Top 5%)
  • Language: Python
  • License: MIT License
  • Created over 7 years ago
  • Updated over 7 years ago

Repository Details

Deep learning models to identify clickbaits taking content into consideration

Clickbaits Revisited

This repository provides the code used in: https://www.linkedin.com/pulse/clickbaits-revisited-deep-learning-title-content-features-thakur

Data Collection

To run the code you must first collect the data:

Data Pre-Processing

After the data has been collected, you need to run the following files to obtain training and test data. The order is important!

- $ cd data_processing
- $ python create_data.py
- $ python html_scraper.py
- $ python feature_generation.py
- $ python merge_data.py
- $ python data_cleaning.py

After the steps above, you will end up with train.csv and test.csv in data/.
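The ordered steps above can also be wrapped in a small driver script. This is a minimal sketch, not part of the repository: the script names come from the list above, while `run_pipeline`, its `runner` parameter, and the choice to run each script from inside data_processing/ are assumptions made for illustration.

```python
import subprocess
import sys

# Script names taken from the steps above; the order matters.
PIPELINE = [
    "create_data.py",
    "html_scraper.py",
    "feature_generation.py",
    "merge_data.py",
    "data_cleaning.py",
]


def run_pipeline(scripts=PIPELINE, workdir="data_processing", runner=subprocess.run):
    """Run each script in order and stop at the first failure.

    `runner` is a hypothetical hook (it defaults to subprocess.run) so the
    sequence can be dry-run without executing anything.
    """
    for script in scripts:
        cmd = [sys.executable, script]
        # check=True raises CalledProcessError when a step exits non-zero,
        # so later steps never run on incomplete data.
        runner(cmd, cwd=workdir, check=True)
```

Calling `run_pipeline()` from the repository root would then execute the five steps in sequence. Stopping at the first failure is a deliberate choice here, since each step appears to depend on the output of the previous one.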

Note that the steps above require a lot of memory. If you have less than 64GB of RAM, modify the code to fit your needs.

GloVe embeddings

Obtain GloVe embeddings from the following URL:

http://nlp.stanford.edu/data/glove.840B.300d.zip

Extract the zip and place the extracted embeddings file (glove.840B.300d.txt, a space-delimited text file) in data/.
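For reference, a GloVe file of this kind can be read with a few lines of Python. The sketch below is an illustration under assumptions, not the loader used in this repository: `load_glove` and its `vocab` parameter are hypothetical names. Each line of the file holds a token followed by its vector components, separated by spaces. The vector is taken from the last `dim` fields so that a token containing spaces does not break parsing, and the optional `vocab` filter keeps only the tokens you need, which helps with memory.

```python
def load_glove(path, dim=300, vocab=None):
    """Read a GloVe text file into a {token: [float, ...]} dict.

    `vocab` (hypothetical parameter) is an optional set of tokens to keep;
    all other lines are skipped to save memory.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split(" ")
            if len(parts) <= dim:
                continue  # skip blank or malformed lines
            # The last `dim` fields are the vector; everything before them
            # is the token, which may itself contain spaces.
            token = " ".join(parts[:-dim])
            if vocab is not None and token not in vocab:
                continue
            embeddings[token] = [float(x) for x in parts[-dim:]]
    return embeddings
```

A call such as `load_glove("data/glove.840B.300d.txt", vocab=my_tokens)` would return vectors only for the tokens in `my_tokens` (a set you build from your own training text).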

Deepnets

After all the steps above, you are ready to experiment with the deep neural networks that classify clickbaits.

Change directory to deepnets/:

- $ cd deepnets/

The deepnets are as follows:

- LSTM_Title.py: LSTM on title text without GloVe embeddings
- LSTM_Title_Content.py: LSTM on title text and content text without GloVe embeddings
- LSTM_Title_Content_with_GloVe.py: LSTM on title and content text with GloVe embeddings
- TDD_Title_Content_with_Glove.py: Time distributed dense on title and content text with GloVe embeddings
- LSTM_Title_Content_Numerical_with_GloVe.py: LSTM on title + content text with GloVe embeddings & dense net for numerical features
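To make the last variant concrete, here is a minimal sketch of a network that combines LSTM branches for title and content with a dense branch for numerical features. It is not the code in LSTM_Title_Content_Numerical_with_GloVe.py: the layer sizes, the shared embedding layer, the function name `build_model`, and the use of `tensorflow.keras` are all assumptions chosen for illustration. In the GloVe variants the embedding layer would presumably be initialized from the pretrained vectors; that step is omitted here.

```python
from tensorflow.keras import Model, layers


def build_model(vocab_size, title_len, content_len, num_features, embed_dim=300):
    """Two LSTM text branches plus a dense branch for numerical features."""
    # Inputs: padded token-id sequences for the text, a flat vector for numbers.
    title_in = layers.Input(shape=(title_len,), name="title")
    content_in = layers.Input(shape=(content_len,), name="content")
    numeric_in = layers.Input(shape=(num_features,), name="numeric")

    # One embedding layer shared by both text branches (an assumption).
    embed = layers.Embedding(vocab_size, embed_dim)
    title_vec = layers.LSTM(64)(embed(title_in))
    content_vec = layers.LSTM(64)(embed(content_in))
    numeric_vec = layers.Dense(32, activation="relu")(numeric_in)

    # Concatenate the three branches and predict clickbait vs. not.
    merged = layers.concatenate([title_vec, content_vec, numeric_vec])
    hidden = layers.Dense(64, activation="relu")(merged)
    output = layers.Dense(1, activation="sigmoid")(hidden)

    model = Model([title_in, content_in, numeric_in], output)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

The simpler variants in the list can be read as subsets of this layout: dropping the numerical branch gives a title + content model, and dropping the content branch as well leaves a title-only model.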

Performance

The LSTM network on title and content text with GloVe embeddings, combined with the dense net for numerical features, achieves an accuracy of 0.996 during validation and 0.992 on the test set.

All models were trained on an Ubuntu 16.04 system with an NVIDIA TitanX GPU and 64GB of memory.
