Clickbait Detector

Detects clickbait headlines using deep learning.

Find the Chrome extension here (built by rahulkapoor90).

The DOI for this project is https://doi.org/10.17605/OSF.IO/T3UJ9

Requirements

  • Python 2.7.12
  • Keras 1.2.1
  • TensorFlow 0.12.1
  • NumPy 1.11.1
  • NLTK 3.2.1

Getting Started

  1. Create a virtualenv in the project directory

    virtualenv venv
    
  2. Activate the virtualenv

    • On Windows:

      cd venv/Scripts
      activate
      
    • On Linux:

      source venv/bin/activate
      
  3. Install the requirements

     pip install -r requirements.txt
    
  4. Try it out by running one of the examples below.

Accuracy

Training Accuracy after 25 epochs = 93.8 % (loss = 0.1484)

Validation Accuracy after 25 epochs = 90.15 % (loss = 0.2670)

Examples

$ python src/detect.py "Novak Djokovic stunned as Australian Open title defence ends against Denis Istomin"
Using TensorFlow backend.
headline is 0.33 % clickbaity
$ python src/detect.py "Just 22 Cute Animal Pictures You Need Right Now"
Using TensorFlow backend.
headline is 85.38 % clickbaity
$ python src/detect.py " 15 Beautifully Created Doors You Need To See Before You Die. The One In Soho Blew Me Away"
Using TensorFlow backend.
headline is 52.29 % clickbaity
$ python src/detect.py "French presidential candidate Emmanuel Macrons anti-system angle is a sham | Philippe Marlire"
Using TensorFlow backend.
headline is 0.05 % clickbaity

Model Summary

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
embedding_1 (Embedding)          (None, 20, 30)        195000      embedding_input_1[0][0]          
____________________________________________________________________________________________________
convolution1d_1 (Convolution1D)  (None, 19, 32)        1952        embedding_1[0][0]                
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 19, 32)        128         convolution1d_1[0][0]            
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 19, 32)        0           batchnormalization_1[0][0]       
____________________________________________________________________________________________________
convolution1d_2 (Convolution1D)  (None, 18, 32)        2080        activation_1[0][0]               
____________________________________________________________________________________________________
batchnormalization_2 (BatchNorma (None, 18, 32)        128         convolution1d_2[0][0]            
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 18, 32)        0           batchnormalization_2[0][0]       
____________________________________________________________________________________________________
convolution1d_3 (Convolution1D)  (None, 17, 32)        2080        activation_2[0][0]               
____________________________________________________________________________________________________
batchnormalization_3 (BatchNorma (None, 17, 32)        128         convolution1d_3[0][0]            
____________________________________________________________________________________________________
activation_3 (Activation)        (None, 17, 32)        0           batchnormalization_3[0][0]       
____________________________________________________________________________________________________
maxpooling1d_1 (MaxPooling1D)    (None, 1, 32)         0           activation_3[0][0]               
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 32)            0           maxpooling1d_1[0][0]             
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 1)             33          flatten_1[0][0]                  
____________________________________________________________________________________________________
batchnormalization_4 (BatchNorma (None, 1)             4           dense_1[0][0]                    
____________________________________________________________________________________________________
activation_4 (Activation)        (None, 1)             0           batchnormalization_4[0][0]       
====================================================================================================
Total params: 201,533
Trainable params: 201,339
Non-trainable params: 194
____________________________________________________________________________________________________
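
The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer shapes. The vocabulary size (6500) and the convolution kernel width (2) are not stated in this README; they are assumptions inferred from the printed counts and output shapes, and the helper functions are illustrative rather than part of the project.

```python
# Recompute the parameter counts from the layer shapes in the summary.
# vocab_size and kernel_width are inferred, not taken from the project's code.
vocab_size = 6500      # inferred: 195000 embedding params / 30 dimensions
embedding_dim = 30
kernel_width = 2       # inferred: each convolution shortens the sequence by 1
filters = 32


def conv1d_params(in_channels, out_channels, width):
    # One weight per (kernel position, input channel, filter), plus one bias per filter.
    return width * in_channels * out_channels + out_channels


def batchnorm_params(channels):
    # gamma and beta are trainable; the moving mean and variance are not.
    return 4 * channels


def dense_params(inputs, units):
    # One weight per (input, unit) pair, plus one bias per unit.
    return inputs * units + units


layer_params = {
    "embedding_1": vocab_size * embedding_dim,
    "convolution1d_1": conv1d_params(embedding_dim, filters, kernel_width),
    "batchnormalization_1": batchnorm_params(filters),
    "convolution1d_2": conv1d_params(filters, filters, kernel_width),
    "batchnormalization_2": batchnorm_params(filters),
    "convolution1d_3": conv1d_params(filters, filters, kernel_width),
    "batchnormalization_3": batchnorm_params(filters),
    "dense_1": dense_params(filters, 1),
    "batchnormalization_4": batchnorm_params(1),
}

total = sum(layer_params.values())
# Half of each batch-normalization layer's parameters are moving statistics.
non_trainable = sum(
    count // 2
    for name, count in layer_params.items()
    if name.startswith("batchnormalization")
)
trainable = total - non_trainable

print(total, trainable, non_trainable)
```

Under these assumptions the totals agree with the summary: 201,533 parameters, of which 194 are non-trainable batch-normalization statistics.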

Data

The dataset consists of about 12,000 headlines, half of which are clickbait. The clickbait headlines were fetched from BuzzFeed, Newsweek, The Times of India, and The Huffington Post. The genuine (non-clickbait) headlines were fetched from The Hindu, The Guardian, The Economist, TechCrunch, The Wall Street Journal, National Geographic, and The Indian Express.

Some of the data came from peterldowns's clickbait-classifier repository.

Pretrained Embeddings

I used Stanford's pretrained GloVe embeddings, reduced to 30 dimensions with PCA. The smaller embeddings sped up training.
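
As a rough illustration of the reduction step, here is a minimal PCA sketch using NumPy's SVD. It is not the code in src/preprocess_embeddings.py: the function name `pca_reduce`, the 50-dimensional input, and the random stand-in matrix are all assumptions made for the example.

```python
import numpy as np


def pca_reduce(embeddings, n_components):
    """Project row vectors onto their top principal components."""
    # PCA operates on mean-centered data.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered.dot(vt[:n_components].T)


# Random stand-in for a (vocabulary size, 50) matrix of GloVe vectors.
rng = np.random.RandomState(0)
glove_like = rng.normal(size=(1000, 50))

reduced = pca_reduce(glove_like, n_components=30)
print(reduced.shape)
```

Each row of `reduced` is a 30-dimensional vector that could initialize the corresponding row of the embedding layer.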

Improving accuracy

To improve accuracy:

  • Increase the embedding layer dimension (currently 30) - src/preprocess_embeddings.py
  • Use more data
  • Increase the vocabulary size - src/preprocess_text.py
  • Increase the maximum sequence length - src/train.py
  • Clean the data more thoroughly
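
To make the vocabulary size and maximum sequence length knobs concrete, the sketch below shows one common way such preprocessing works. It is not taken from src/preprocess_text.py or src/train.py: the function names, the reserved padding and unknown-word indices, and the left-padding choice are all hypothetical. Only the sequence length of 20 comes from the model summary.

```python
from collections import Counter

PAD, UNKNOWN = 0, 1  # reserved indices (hypothetical convention)


def build_vocabulary(headlines, vocabulary_size):
    """Map the most frequent words to indices 2..vocabulary_size-1."""
    counts = Counter(word for h in headlines for word in h.lower().split())
    most_common = counts.most_common(vocabulary_size - 2)
    return {word: index + 2 for index, (word, _) in enumerate(most_common)}


def encode(headline, vocabulary, max_length):
    """Turn a headline into exactly max_length indices, truncating or left-padding."""
    indices = [vocabulary.get(word, UNKNOWN) for word in headline.lower().split()]
    indices = indices[:max_length]
    return [PAD] * (max_length - len(indices)) + indices


headlines = [
    "Just 22 Cute Animal Pictures You Need Right Now",
    "15 Beautifully Created Doors You Need To See Before You Die",
]
vocabulary = build_vocabulary(headlines, vocabulary_size=6)
encoded = encode("You Need Cute Doors", vocabulary, max_length=20)
print(encoded)
```

A larger `vocabulary_size` leaves fewer words mapped to the unknown index, and a larger `max_length` truncates fewer long headlines. Both increase the size of the model's input and embedding table.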
