• Stars
    129
  • Rank 277,712 (Top 6 %)
  • Language
    Python
  • License
    Other
  • Created over 13 years ago
  • Updated about 12 years ago

Repository Details

Social sentiment flagger intended to judge given text as: positive, neutral or negative.

Synt

http://github.com/Tawlk/synt

About

Synt (pronounced: "cent") is a Python library for sentiment classification on social text.

The end goal is a simple library that "just works". It should have a low barrier to entry and be thoroughly documented.

Current Features

  • Can collect negative/positive tweets from Twitter and store them in a local database (can also fetch a pre-existing samples database)
  • Can train a classifier based on a samples database
  • Can classify text and output a score between -1 and 1 (where -1 is negative, +1 is positive, and anything close to 0 can be considered neutral)
  • Ability to collect, train, guess, and test (accuracy) from the CLI
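
The score range described above can be turned into a coarse label along these lines. This is a minimal sketch and not part of synt: the function name `label_for_score` and the 0.25 neutral band are assumptions chosen for illustration, so pick a cutoff that suits your own data.

```python
def label_for_score(score, neutral_band=0.25):
    """Map a sentiment score in [-1, 1] to a coarse label.

    neutral_band is a hypothetical cutoff, not a synt setting: scores whose
    absolute value falls below it are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if abs(score) < neutral_band:
        return "neutral"
    return "positive" if score > 0 else "negative"


if __name__ == "__main__":
    for s in (0.8, 0.1, -0.6):
        print(s, label_for_score(s))
```

A wider band labels more text as neutral; a narrower one forces more text into positive or negative.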

Requirements

Usage / Installation

Note: Many of these commands accept additional arguments. Use the -h flag on any command to get help and see more options.

  1. Grab the latest synt:
```bash
pip install -e git+https://github.com/Tawlk/synt/#egg=synt
```

  2. Grab the sample database to train on (or build one, as described below):

**Note: On the first run of any CLI command, a config file is copied to
~/.synt/config.py. It ships with sane defaults and its values should be
self-explanatory, but you should review it. This only happens on the first run of synt.**

```bash
synt fetch --db_name "mysamples.db"
```

By default (with no db_name) it will be stored as 'samples.db'.

If you'd prefer to build a fresh sample db and have the time, just run collect with
the desired number of samples.

**Note: In order to collect you will require [kral](http://github.com/tawlk/kral), which is our
"social data gatherer" built in Python.**

```bash
synt collect --max_collect 10000 --db_name 'awesome.db'
```

**Note: You can also collect incrementally by providing the same db_name.**

  3. Train the classifier

A basic example of training:

```bash
synt train 'samples.db' 20000
```

Train takes two required arguments: the name of a training database and the number of
samples to train on.

  4. Classifier accuracy

At this point you might want to see the classifier's accuracy on samples from
the training database.

```bash
synt accuracy
```

Accuracy takes a number of testing samples. By default, 25% of your training
sample count is used as the testing set. You can override this by
providing the --test_samples argument.

The testing samples are drawn from the same database that was used for
training. They are new samples, guaranteed not to have been seen by the
classifier already.
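
The default test-set size and the "unseen samples" guarantee can be pictured with a small sketch. It assumes samples are read in a fixed order and that testing simply continues where training stopped. The function names are hypothetical and this is not synt's actual implementation.

```python
def default_test_count(train_count, fraction=0.25):
    """Number of test samples when none is given: 25% of the training count."""
    return int(train_count * fraction)


def split_samples(samples, train_count, test_count=None):
    """Return (train, test) slices that never overlap.

    Training takes the first train_count samples; testing takes the samples
    that follow, so no test sample was seen during training.
    """
    if test_count is None:
        test_count = default_test_count(train_count)
    train = samples[:train_count]
    test = samples[train_count:train_count + test_count]
    return train, test


if __name__ == "__main__":
    rows = list(range(100))  # stand-in for rows in a samples database
    train, test = split_samples(rows, train_count=40)
    print(len(train), len(test))
    print(set(train) & set(test))
```

With 20000 training samples, as in the training example, this default works out to 5000 testing samples.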

  5. Guessing/classifying text

You should now have a trained classifier, and it's time to see
some classification of text.

```bash
synt guess
```

This will drop you into a synt prompt where you can write text and see
the score between -1 and 1.

Alternatively, you can classify text without dropping into
a prompt:

```bash
synt guess --text "i like ponies and rainbows"
```

Notes

  • We have achieved the best accuracy using stopword filtering with tweets collected on negwords.txt and poswords.txt (see downloads).

  • In the future we will also add the MaxEnt and Decision tree classifiers and the functionality to do classifier voting.

  • Note that this is optimized for classification on social text, as this is our primary use case. However, with a little tweaking it should be possible to get good results on other corpora.
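
For readers unfamiliar with the stopword filtering mentioned in the notes above, here is a minimal sketch of the idea. The stopword set and the `filter_stopwords` helper are assumptions made for illustration; synt's own filter and word list may differ.

```python
# A tiny illustrative stopword list; a real one would be much larger.
STOPWORDS = {"a", "an", "and", "i", "is", "the", "to"}


def filter_stopwords(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


if __name__ == "__main__":
    print(filter_stopwords("i like ponies and rainbows"))
    # ['like', 'ponies', 'rainbows']
```

Dropping very common words leaves the tokens that are more likely to carry sentiment as features for the classifier.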

This code is still in development; use at your own risk. You may be eaten by a grue.

Questions/Comments? Please check us out on IRC via irc://irc.freenode.net/#tawlk

More Repositories

1

youtube-dl

RIAA: Please go die in a fire.
Python
621
star
2

security-token-docs

Documentation of Security Tokens and their uses.
313
star
3

hyve

Social media crawling engine implemented totally browser side in JS
JavaScript
100
star
4

raspi-hd44780

Python library for controlling the HD44780 LCD using 6 GPIO pins on the Raspberry Pi.
Python
51
star
5

kral

Provides a unified way to collect and distribute data live from potentially any social network
Python
38
star
6

dotvim

My ultimate vim setup (of doom)
Vim Script
24
star
7

visodea

An open source Kickstarter clone, aimed for more project privacy and control.
JavaScript
14
star
8

osae

Open Sentiment Analysis Engine - Allows humans to flag sentiment for random Twitter samples. The data is saved for use with machine learning sentiment classification projects like Synt
Python
12
star
9

gest

Experimental sentiment classification library in Javascript
JavaScript
12
star
10

dotfiles

All my dotfiles that I port around from system to system with me
Python
11
star
11

django-barcode-auth

Allow django auth system login with 2d barcodes/scanner
Python
11
star
12

rant

CLI driven, blog aware static site generator in Python
Python
10
star
13

django-orbited-chat

Simple "Comet" chat service with log persistence using Django, Orbited, and STOMP
Python
10
star
14

grfety

Code for http://grfety.me - A full screen real-time multi-user drawing application using NodeJS, SockJS, and HTML5 canvas
JavaScript
9
star
15

ewrl

A set of functions for unrolling shortened URLs to their origin and fetching Titles/Descriptions
Python
7
star
16

tawlk-old

The old django codebase for tawlk: A flexible framework for searching, collecting, archiving, interpreting, and responding to public social media, with a scalable web interface.
JavaScript
6
star
17

github-forged-sigs

Demo of GitHub forging git signatures
5
star
18

pixy

Raspberry Pi driven platform for scriptable robotic photo/video capture using digital cameras
Python
5
star
19

influencer

Visually compare the social influence of 2 keywords, using Kral
JavaScript
4
star
20

docker-kms-decrypt

Minimal container to decrypt AWS KMS secrets from env vars to an env file for sourcing to another container
Shell
4
star
21

baren

Baren.js is a 1d/2d barcode generator in javascript forked from bwip-js. This fork is intended to continue project development with public git version control as well as to rewrite the code library to have more modular API suitable for selective importing and easy integration into modern JS frameworks.
JavaScript
4
star
22

diblog

Drop-in PHP Blog with all the main features you would expect from a much heavier blogging solution. Drop it on a server and enjoy.
PHP
3
star
23

stuffdb

A tool to make it trivial to use a mobile phone or web browser to digitally attach special instructions, considerations, safety concerns, notes, or tips about any object in a lab/shop/office environment accessible by generated QR code labels.
Python
3
star
24

scripts

Various random scripts I have written for fun, clients, and to make my life easier.
Shell
3
star
25

python-abx

A simple GTK application for performing ABX audio compression testing using python and gstreamer.
Python
3
star
26

euler-solutions

My personal Project Euler solutions. Spoilers, obviously. This will probably take a very long time to complete...
Python
3
star
27

njkt

Proxy a given web site and inject scripts into it. Also remote reload and remotely change url for complete browser remote control.
JavaScript
3
star
28

craigmailer

Scrapes ads from Craigslist topics then sends them to you as emails you can reply to.
PHP
2
star
29

password-store

My fork of 'pass' ( https://passwordstore.org ) with support for multiple git repos
Shell
2
star
30

infra

Terraform configuration for all my personal infrastructure
HCL
2
star
31

weechat-helm

Helm chart for deploying weechat with access via ssh and relay
Smarty
2
star
32

digallery

Drop-in gallery creator with optional fancy stuff. Stick this in a folder full of images and enjoy.
PHP
2
star
33

pologen

Generates random convincing Poloroid(tm) collages from folders full of images.
PHP
2
star
34

createaband

Created to connect bands to musicians. Ran out of funding before it was done so... heres the code :-)
PHP
2
star
35

lrvick

The repository for my personal "rant" driven website
CSS
2
star
36

nodesb

WIP tool to compile nodejs projects into portable static binaries
Ruby
2
star
37

jquery-simpleslideshow

Most jquery slideshows are way over complicated to understand/use. I raged then made this.
JavaScript
2
star
38

difonts

Drop-in single-file solution for having dynamic text render in any TTF font you want in all major browsers.
PHP
2
star
39

eggweblog

Indexes and allows for searching of a directory of eggdrop-style IRC logs.
PHP
2
star
40

zorkin

Play zork online :D
1
star
41

DarkGmailComplete

Chrome Extension for completing the official Dark Gmail theme.
JavaScript
1
star
42

ircyte

Website for viewing and searching logs, users and stats for given IRC channels.
Python
1
star
43

synhax

Website code for synhax: Creative code snippets only a hacker would appreciate
Python
1
star
44

todoost

Todoost is a Django/HTML5 universal mobile/web instant Todo list app.
JavaScript
1
star
45

cloudprint

Google cloudprint proxy
Python
1
star
46

grapheneos-build

Build repo for GrapheneOS using the AOSP-build build system
Makefile
1
star
47

murfie

A Python CLI tool and library for interacting with the undocumented murfie.com API
Python
1
star
48

cli.life

A curated directory of tools for a CLI driven lifestyle.
1
star
49

jquery-githubindex

jQuery plugin to render an date-sorted html index with stats for any combination of repos for given Github users or individual repos. Useful for portfolios.
JavaScript
1
star
50

jquery-githubnav

jQuery driven stack to render a tree-driven explorer of all your Github code via the Github v3 API
1
star
51

essid-ad

Disposable battery powered ESP8266 ESSID advertisement beacons
Makefile
1
star
52

nfc-switch

Control the state of latching relays via NFC tags.
Arduino
1
star
53

sokral

Distant PHP ancestor of tawlk. Social search engine that that will scale to 3 or 4 users!
JavaScript
1
star
54

provisor

Server that provisions new accounts on a Linux system. Can be configured to allow new users to set up accounts for themselves. Ideal for hackerspaces and community shell servers.
Python
1
star
55

python-boilerplate

Python boilerplate project for a library with a class and testing/linting/coverage/compatibility via a makefile
Makefile
1
star
56

QAV250-Parts

1
star
57

weechat-docker

Docker container that exposes a weechat/tmux session over ssh
Shell
1
star
58

lrvick-old

The old django tree from my website (lrvick.net) before I switched everything over to rant
Python
1
star
59

gibrsh

Showcasing social media stupidity in real-time
1
star
60

jquery-picasagallery

Create a simple browsable image gallery from a given public Picasa account.
JavaScript
1
star
61

hdwallet

A CLI based hierarchical deterministic cryptocurrency wallet generator based on a 24 word mnemonic seed.
1
star