Spam filtering made easy for you

spammy


Author: Tasdik Rahman
Latest version: 1.0.3

1   Overview

spammy: Spam filtering at your service

spammy powers the web app https://plino.herokuapp.com

2   Features

  • Train the classifier on your own dataset to classify your emails as spam or ham
  • Dead simple to use. See Example
  • Blazingly fast once the classifier is trained (see Benchmarks)
  • Custom exceptions: when you miss something, spammy gracefully tells you where you went wrong
  • Written in uncomplicated Python
  • Built on the giant shoulders of NLTK
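The README does not list spammy's actual exception classes. As a hedged illustration of the custom-exceptions idea, here is a minimal sketch that validates a corpus directory before training and fails with a descriptive message. The names `CorpusError`, `MissingLabelDirError` and `check_corpus_dir` are hypothetical and are not part of spammy's API:

```python
import os


class CorpusError(Exception):
    """Hypothetical base class for problems with a training corpus."""


class MissingLabelDirError(CorpusError):
    """Hypothetical error: a required label sub-directory is absent."""

    def __init__(self, directory, label):
        self.directory = directory
        self.label = label
        # super() with explicit arguments works on both Python 2 and 3.
        super(MissingLabelDirError, self).__init__(
            "expected a '{}' sub-directory inside {!r}".format(label, directory)
        )


def check_corpus_dir(directory, labels=("ham", "spam")):
    """Raise a descriptive error now instead of failing later during training."""
    if not os.path.isdir(directory):
        raise CorpusError("{!r} is not a directory".format(directory))
    for label in labels:
        if not os.path.isdir(os.path.join(directory, label)):
            raise MissingLabelDirError(directory, label)
```

Because both errors share the `CorpusError` base class, a caller can handle every corpus problem with a single `except CorpusError:` clause while still inspecting `exc.label` when a label folder is missing.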

3   Example


  • Your data directory should be structured like this, with one sub-directory per label:
$ tree /home/tasdik/Dropbox/projects/spammy/examples/test_dataset
/home/tasdik/Dropbox/projects/spammy/examples/test_dataset
├── ham
│   ├── 5458.2001-04-25.kaminski.ham.txt
│   ├── 5459.2001-04-25.kaminski.ham.txt
│   ...
│   ...
│   └── 5851.2001-05-22.kaminski.ham.txt
└── spam
    ├── 4136.2005-07-05.SA_and_HP.spam.txt
    ├── 4137.2005-07-05.SA_and_HP.spam.txt
    ...
    ...
    └── 5269.2005-07-19.SA_and_HP.spam.txt
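A directory laid out like this maps naturally onto labelled training examples. The following is a minimal sketch of reading it into `(text, label)` pairs with a per-label `limit`, modelled on the `limit` argument passed to `Spammy(...)`. The function name `load_labelled_corpus`, the sorted file order and the `latin-1` decoding are assumptions for illustration, not spammy's documented behaviour:

```python
import io
import os


def load_labelled_corpus(directory, labels=("ham", "spam"), limit=None):
    """Return a list of (text, label) pairs read from directory/<label>/*.

    `limit` caps how many files are read per label (hypothetical semantics).
    Files at the top level of `directory`, such as a Summary.txt, are ignored.
    """
    pairs = []
    for label in labels:
        label_dir = os.path.join(directory, label)
        # Keep only regular files, sorted so a limited subset is reproducible.
        names = sorted(
            name for name in os.listdir(label_dir)
            if os.path.isfile(os.path.join(label_dir, name))
        )
        if limit is not None:
            names = names[:limit]
        for name in names:
            # latin-1 maps every byte to a character, so messy email dumps never fail to decode.
            with io.open(os.path.join(label_dir, name), encoding="latin-1") as handle:
                pairs.append((handle.read(), label))
    return pairs
```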

Example

>>> import os
>>> from spammy import Spammy
>>>
>>> directory = '/home/tasdik/Dropbox/projects/spamfilter/data/corpus3'
>>>
>>> # directory structure
>>> os.listdir(directory)
['spam', 'Summary.txt', 'ham']
>>> os.listdir(os.path.join(directory, 'spam'))[:3]
['4257.2005-04-06.BG.spam.txt', '0724.2004-09-21.BG.spam.txt', '2835.2005-01-19.BG.spam.txt']
>>>
>>> # Spammy object created
>>> cl = Spammy(directory, limit=100)
>>> cl.train()
>>>
>>> SPAM_TEXT = \
... """
... My Dear Friend,
...
... How are you and your family? I hope you all are fine.
...
... My dear I know that this mail will come to you as a surprise, but it's for my
... urgent need for a foreign partner that made me to contact you for your sincere
... genuine assistance My name is Mr.Herman Hirdiramani, I am a banker by
... profession currently holding the post of Director Auditing Department in
... the Islamic Development Bank(IsDB)here in Ouagadougou, Burkina Faso.
...
... I got your email information through the Burkina's Chamber of Commerce
... and industry on foreign business relations here in Ouagadougou Burkina Faso
... I haven'disclose this deal to any body I hope that you will not expose or
... betray this trust and confident that I am about to repose on you for the
... mutual benefit of our both families.
...
... I need your urgent assistance in transferring the sum of Eight Million,
... Four Hundred and Fifty Thousand United States Dollars ($8,450,000:00) into
... your account within 14 working banking days This money has been dormant for
... years in our bank without claim due to the owner of this fund died along with
... his entire family and his supposed next of kin in an underground train crash
... since years ago. For your further informations please visit
... (http://news.bbc.co.uk/2/hi/5141542.stm)
... """
>>> cl.classify(SPAM_TEXT)
'spam'
>>>
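The README says spammy is built on NLTK but does not spell out the model behind `train()` and `classify()`. As a hedged, dependency-free illustration of what training a text classifier and then classifying a message conceptually involve, here is a multinomial Naive Bayes sketch with add-one smoothing. `TinyNaiveBayes`, `tokenize` and the toy training data are hypothetical; this is not spammy's implementation and it does not use NLTK:

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class TinyNaiveBayes(object):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def train(self, pairs):
        """Count documents per label and words per label from (text, label) pairs."""
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for text, label in pairs:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def _log_score(self, tokens, label):
        """log P(label) plus the sum of smoothed log P(token | label)."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / float(self.total_docs))
        for tok in tokens:
            score += math.log((counts[tok] + 1) / float(denom))
        return score

    def classify(self, text):
        """Return the label with the highest log-score."""
        # Words never seen in training carry no evidence, so drop them.
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        return max(self.doc_counts, key=lambda label: self._log_score(tokens, label))


clf = TinyNaiveBayes().train([
    ("urgent transfer of million dollars to your account", "spam"),
    ("claim your dormant fund now", "spam"),
    ("meeting notes for the quarterly report", "ham"),
    ("lunch with the team on friday", "ham"),
])
print(clf.classify("please transfer the dormant fund to my account"))  # spam
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied, and add-one smoothing keeps a single unseen word-label pair from zeroing out a label's score.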

3.1   Accuracy of the classifier

>>> from spammy import Spammy
>>> directory = '/home/tasdik/Dropbox/projects/spammy/examples/training_dataset'
>>> cl = Spammy(directory, limit=300)  # training on only 300 spam and ham files
>>> cl.train()
>>> data_dir = '/home/tasdik/Dropbox/projects/spammy/examples/test_dataset'
>>>
>>> cl.accuracy(directory=data_dir, label='spam', limit=300)
0.9554794520547946
>>> cl.accuracy(directory=data_dir, label='ham', limit=300)
0.9033333333333333
>>>
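Judging from the call signature shown above, `accuracy()` appears to score files that all carry one known `label` and return the fraction classified correctly; that reading is an assumption, not something this README states. A minimal sketch of that computation, where `classify` is any callable mapping text to a label and `accuracy_for_label` and `keyword_classifier` are hypothetical names:

```python
def accuracy_for_label(classify, texts, label):
    """Fraction of `texts` (all known to carry `label`) that `classify` labels correctly."""
    texts = list(texts)
    if not texts:
        raise ValueError("need at least one text to score")
    correct = sum(1 for text in texts if classify(text) == label)
    return correct / float(len(texts))


# Toy stand-in for a trained classifier (hypothetical, for illustration only).
def keyword_classifier(text):
    return "spam" if "money" in text.lower() else "ham"


spam_texts = ["Free money!", "Win MONEY now", "Hello there"]
print(accuracy_for_label(keyword_classifier, spam_texts, "spam"))  # 2 of 3 correct
```

Scoring spam and ham separately, as the session above does, shows whether errors are skewed toward missed spam or toward false alarms on legitimate mail.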

NOTE:

4   Installation


NOTE: spammy currently supports only Python 2.

Install the dependencies first:

$ pip install nltk==3.2.1 beautifulsoup4==4.4.1

To install with pip:

$ pip install spammy

or, if you don't have pip, use easy_install:

$ easy_install spammy

Or build it yourself (only if you must):

$ git clone https://github.com/tasdikrahman/spammy.git
$ cd spammy
$ python setup.py install

4.1   Upgrading

To upgrade the package:

$ pip install -U spammy

4.2   Installation behind a proxy

If you are behind a proxy, this should work:

$ pip --proxy [username:password@]domain_name:port install spammy

5   Benchmarks


Spammy is blazingly fast once trained: in the run below, each batch of five classifications takes roughly 0.16 to 0.18 seconds.

Don't believe me? Have a look:

>>> import timeit
>>> from spammy import Spammy
>>>
>>> directory = '/home/tasdik/Dropbox/projects/spamfilter/data/corpus3'
>>> cl = Spammy(directory, limit=100)
>>> cl.train()
>>> SPAM_TEXT_2 = \
... """
... INTERNATIONAL MONETARY FUND (IMF)
... DEPT: WORLD DEBT RECONCILIATION AGENCIES.
... ADVISE: YOUR OUTSTANDING PAYMENT NOTIFICATION
...
... Attention
... A power of attorney was forwarded to our office this morning by two gentle men,
... one of them is an American national and he is MR DAVID DEANE by name while the
... other person is MR... JACK MORGAN by name a CANADIAN national.
... This gentleman claimed to be your representative, and this power of attorney
... stated that you are dead; they brought an account to replace your information
... in other to claim your fund of (US$9.7M) which is now lying DORMANT and UNCLAIMED,
...  below is the new account they have submitted:
...                     BANK.-HSBC CANADA
...                     Vancouver, CANADA
...                     ACCOUNT NO. 2984-0008-66
...
... Be further informed that this power of attorney also stated that you suffered.
... """
>>>
>>> def classify_timeit():
...    result = cl.classify(SPAM_TEXT_2)
...
>>> timeit.repeat(classify_timeit, repeat=3, number=5)  # 3 runs, each timing 5 calls
[0.1810469627380371, 0.16121697425842285, 0.16121196746826172]
>>>
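Each number returned by `timeit.repeat` is the total time for `number` calls, so the figures above cover five classifications each, or roughly 0.03 to 0.04 seconds per call. A small sketch that turns such output into a per-call estimate, taking the fastest run as the `timeit` documentation recommends; the helper names `per_call_seconds` and `benchmark` are hypothetical:

```python
import timeit


def per_call_seconds(timings, number):
    """Best-case seconds per call from timeit.repeat(..., number=number) output."""
    if number <= 0:
        raise ValueError("number must be positive")
    # Slower runs mostly reflect interference from other processes, so use the minimum.
    return min(timings) / float(number)


def benchmark(func, repeat=3, number=5):
    """Time a zero-argument callable with timeit.repeat and return seconds per call."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return per_call_seconds(timings, number)


# Using the timings reported above:
print(per_call_seconds([0.1810469627380371, 0.16121697425842285, 0.16121196746826172], 5))
```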

6   Contributing


Refer to the CONTRIBUTING page for details.

6.1   Roadmap

  • Include more algorithms for increased accuracy
  • Python 3 support

7   Licensing


Spammy is built by Tasdik Rahman and licensed under GPLv3.

spammy Copyright (C) 2016 Tasdik Rahman ([email protected])

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

You can find a full copy of the LICENSE file here

8   Credits


If you'd like to give me credit somewhere on your blog or tweet a shout-out to @tasdikrahman, well hey, I'll take it.

9   Donation

If you have found my little bits of software of any use to you, you can help me pay my internet bills :)

  • PayPal
  • Instamojo
  • Gratipay
  • Patreon
