  • Language: Python
  • License: GNU General Public License v3 (GPLv3)
  • Created over 7 years ago
  • Updated almost 2 years ago



A Twitter bot written in Python that stands in for you, searching for and publishing news about specific subjects on Twitter. pyTweetBot uses machine learning to filter the interesting articles and web pages it finds on the web.


A Twitter bot and library written in Python that stands in for you on Twitter: it searches for and publishes news about specific subjects and automates content publishing.


Join our community to create datasets and deep-learning models! Chat with us on Gitter and join the Google Group to collaborate with us.


This repository consists of:

  • pytweetbot.config : Configuration file management;
  • pytweetbot.db : MySQL database management;
  • pytweetbot.directmessages : Twitter direct message functions;
  • pytweetbot.docs : Documentation;
  • pytweetbot.executor : Functions and objects to execute actions;
  • pytweetbot.friends : Functions and objects to manage friends and followers;
  • pytweetbot.learning : Machine learning functions;
  • pytweetbot.mail : Mail functions;
  • pytweetbot.news : Manage news acquisition and sources;
  • pytweetbot.patterns : Python class patterns;
  • pytweetbot.retweet : Manage retweets and sources;
  • pytweetbot.stats : Statistics;
  • pytweetbot.templates : HTML templates for mail;
  • pytweetbot.tools : Tools;
  • pytweetbot.tweet : Manage tweets;
  • pytweetbot.twitter : Manage access to Twitter;

Getting started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

You need the following packages to install pyTweetBot:

  • nltk
  • argparse
  • logging
  • tweepy
  • sklearn
  • pygithub
  • brotli
  • httplib2
  • urlparse2
  • HTMLParser
  • bs4
  • simplejson
  • dnspython
  • dill
  • lxml
  • sqlalchemy
  • feedparser
  • textblob
  • numpy
  • scipy
  • mysql-python
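Several names above are PyPI package names rather than import names, and argparse, logging and HTMLParser come with the Python standard library (HTMLParser is the Python 2 module name; Python 3 calls it html.parser). The following minimal sketch reports which prerequisites cannot be imported. The package-to-module mapping is an assumption based on each project's usual import name; pyTweetBot itself does not define it.

```python
import importlib.util

# Assumed mapping from the package names listed above to their import names.
# argparse, logging and HTMLParser are standard-library modules and are omitted.
PREREQUISITES = {
    "nltk": "nltk",
    "tweepy": "tweepy",
    "sklearn": "sklearn",
    "pygithub": "github",
    "brotli": "brotli",
    "httplib2": "httplib2",
    "bs4": "bs4",
    "simplejson": "simplejson",
    "dnspython": "dns",
    "dill": "dill",
    "lxml": "lxml",
    "sqlalchemy": "sqlalchemy",
    "feedparser": "feedparser",
    "textblob": "textblob",
    "numpy": "numpy",
    "scipy": "scipy",
    "mysql-python": "MySQLdb",
}


def missing_modules(prerequisites):
    """Return the package names whose import name cannot be found."""
    return [
        package
        for package, module in prerequisites.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    for package in missing_modules(PREREQUISITES):
        print("missing:", package)
```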

Installation

pip install pyTweetBot

Authors

License

This project is licensed under the GPLv3 License - see the LICENSE file for details.

Configuration

Configuration file

pyTweetBot reads its configuration from a JSON file that looks as follows:

{
	"database" :
	{
		"host" : "",
		"username" : "",
		"password" : "",
		"database" : ""
	},
	"email" : "",
	"scheduler" :
	{
		"sleep": [6, 13]
	},
	"hashtags":
	[
	],
	"twitter" :
	{
		"auth_token1" : "",
		"auth_token2" : "",
		"access_token1" : "",
		"access_token2" : "",
		"user" : ""
	},
	"friends" :
	{
		"max_new_followers" : 40,
		"max_new_unfollow" : 40,
		"interval" : [15, 60],
		"unfollow_interval" : 604800
	},
	"forbidden_words" :
	[
	],
	"direct_message" : "",
	"tweet" : {
		"max_tweets" : 1800,
		"exclude" : [],
		"interval" : [4.0, 6.0],
		"intervals" : [
			{
				"day": 5,
				"start": 17,
				"end": 23,
				"interval" : [1.0, 3.0]
			}
		]
	},
	"news" :
	[
		{
			"keyword" : "",
			"countries" : ["us","fr"],
			"languages" : ["en","fr"],
			"hashtags" : []
		}
	],
	"rss" :
	[
		{"url" : "http://feeds.feedburner.com/TechCrunch/startups", "hashtags" : "#startups", "lang": ["en"]},
		{"url" : "http://feeds.feedburner.com/TechCrunch/fundings-exits", "hashtags" : "#fundings", "lang": ["en"]}
	],
	"retweet" :
	{
		"max_retweets" : 600,
		"max_likes" : 0,
		"keywords" : [],
		"nbpages" : 40,
		"retweet_prob" : 0.5,
		"limit_prob" : 1.0,
		"interval" : [2.0, 4.0]
	},
	"github" :
	{
		"login": "",
		"password": "",
		"exclude": [],
		"topics" : []
	}
}

There are two required sections:

  • Database : contains the information to connect to the MySQL database (host, username, password, database)
  • Twitter : contains the information for the Twitter API (auth and access tokens)
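A quick way to catch a malformed configuration before starting the bot is to check that both required sections and their keys are present. This is a hypothetical helper (check_config and load_config are not part of pyTweetBot); the key lists are copied from the sample configuration, and treating every one of them as required is an assumption.

```python
import json

# Required sections and keys, taken from the sample configuration file.
REQUIRED = {
    "database": ["host", "username", "password", "database"],
    "twitter": ["auth_token1", "auth_token2", "access_token1", "access_token2", "user"],
}


def check_config(config):
    """Return a list of problems; an empty list means the required parts exist."""
    problems = []
    for section, keys in REQUIRED.items():
        if section not in config:
            problems.append("missing section: " + section)
            continue
        for key in keys:
            if key not in config[section]:
                problems.append("missing key: {}.{}".format(section, key))
    return problems


def load_config(path):
    """Load a JSON configuration file and fail early if it is incomplete."""
    with open(path) as f:
        config = json.load(f)
    problems = check_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```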

Database configuration

The database section of the configuration file looks like this:

"database" :
{
    "host" : "",
    "username" : "",
    "password" : "",
    "database" : ""
}

This section is mandatory.
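Since sqlalchemy and mysql-python are both prerequisites, the four database fields map naturally onto a SQLAlchemy connection URL. The sketch below assumes that mapping (database_url is a hypothetical helper, not pyTweetBot's own code). The plain "mysql://" scheme selects SQLAlchemy's default MySQL driver, MySQLdb, which mysql-python provides. The username and password are URL-escaped so characters such as "@" do not break the URL.

```python
from urllib.parse import quote_plus


def database_url(db):
    """Build a SQLAlchemy MySQL URL from the "database" configuration section."""
    return "mysql://{user}:{password}@{host}/{name}".format(
        user=quote_plus(db["username"]),
        password=quote_plus(db["password"]),
        host=db["host"],
        name=db["database"],
    )
```

With SQLAlchemy installed, the result would be passed to sqlalchemy.create_engine(database_url(config["database"])).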

Update e-mail configuration

You can configure the bot to email you the number of new followers by setting the "email" field:

"email" : ""

Scheduler configuration

The scheduler executes the bot's actions; you can configure it to sleep during a specific period of the day.

"scheduler" :
{
    "sleep": [6, 13]
}

Here, the scheduler sleeps between 6h00 and 13h00.
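The README does not specify how the sleep window is evaluated. A minimal sketch, assuming the two numbers are hours of the local day and that the end hour is exclusive (both assumptions; is_sleeping is a hypothetical name). It also handles a window that wraps past midnight, such as [22, 6].

```python
from datetime import datetime


def is_sleeping(now, sleep):
    """Return True if `now` falls inside the [start, end) sleep window (in hours)."""
    start, end = sleep
    hour = now.hour
    if start <= end:
        return start <= hour < end
    # Window wraps past midnight, e.g. [22, 6].
    return hour >= start or hour < end
```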

Hashtags

In the "hashtags" section, you can list text that should be replaced by hashtags in your tweets:

"hashtags":
[
    {"from" : "My Hashtag", "to" : "#MyHashtag", "case_sensitive" : true}
]

Here, occurrences of "My Hashtag" will be replaced by #MyHashtag.
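One way to apply these rules is a regular-expression substitution per rule, using the case_sensitive flag to decide whether matching ignores case. This is an illustrative sketch, not pyTweetBot's implementation. It assumes plain substring matching (no word boundaries) and that a rule without case_sensitive defaults to case-sensitive.

```python
import re


def apply_hashtags(text, rules):
    """Replace each rule's "from" text with its "to" hashtag."""
    for rule in rules:
        # Assumed default: rules are case-sensitive unless stated otherwise.
        flags = 0 if rule.get("case_sensitive", True) else re.IGNORECASE
        pattern = re.compile(re.escape(rule["from"]), flags)
        # A function replacement stops re from interpreting backslashes in "to".
        text = pattern.sub(lambda match: rule["to"], text)
    return text
```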

Twitter

To access Twitter, pyTweetBot needs four tokens for the Twitter API and your username.

"twitter" :
{
    "auth_token1" : "",
    "auth_token2" : "",
    "access_token1" : "",
    "access_token2" : "",
    "user" : ""
}

TODO: tutorial to get the tokens
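The four tokens correspond to the OAuth 1.0a credentials of a Twitter application. The sketch below assumes auth_token1/auth_token2 are the consumer key and secret and access_token1/access_token2 are the access token and secret; this mapping is a guess from the field names. It then builds a client with tweepy (a listed prerequisite) via its OAuthHandler, set_access_token and API calls. The helper names are hypothetical, and whether the resulting client can still post depends on your current Twitter API access level.

```python
def twitter_credentials(config):
    """Map the "twitter" section to OAuth 1.0a names (assumed mapping)."""
    tw = config["twitter"]
    return {
        "consumer_key": tw["auth_token1"],
        "consumer_secret": tw["auth_token2"],
        "access_token": tw["access_token1"],
        "access_token_secret": tw["access_token2"],
    }


def make_api(config):
    """Create an authenticated tweepy API object (requires tweepy)."""
    import tweepy  # imported lazily so twitter_credentials works without tweepy

    creds = twitter_credentials(config)
    auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
    auth.set_access_token(creds["access_token"], creds["access_token_secret"])
    return tweepy.API(auth)
```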

Friends settings

The friends section has four parameters.

"friends" :
{
	"max_new_followers" : 40,
	"max_new_unfollow" : 40,
	"interval" : [15, 60],
	"unfollow_interval" : 604800
}
  • The max_new_followers parameter sets the maximum number of users that can be followed each day;
  • The max_new_unfollow parameter sets the maximum number of users that can be unfollowed each day;
  • The interval parameter sets the delay, in minutes, between two follow/unfollow actions, chosen randomly between the min and the max;
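The two rules described above, a random delay drawn from interval and a daily cap, can be sketched as follows. The function names are hypothetical, and how pyTweetBot counts "each day" is not described, so the caller is assumed to track followed_today.

```python
import random


def next_action_delay(interval, rng=random):
    """Pick a random delay, in seconds, from the [min, max] interval given in minutes."""
    low, high = interval
    return rng.uniform(low, high) * 60.0


def can_follow(followed_today, max_new_followers):
    """True while the daily follow quota has not been reached."""
    return followed_today < max_new_followers
```

With the sample values, next_action_delay([15, 60]) returns a delay between 900 and 3600 seconds.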

Create database

Next, create the database on your MySQL host:

python -m pyTweetBot tools
    --create-database : Create the database structure on the MySQL host
    --export-database : Export tweets, tweeted and followers/friends to a file
    --import-database : Import tweets, tweeted and followers/friends from a file
    --file : File to import / to export to

You can use the --create-database option for that (the example uses the shorter form --create):

python -m pyTweetBot tools --config /path/to/config.json --create

You can export the bot's data to a file with the --export-database option:

python -m pyTweetBot tools --config /path/to/config.json --export --file export_file.p

You can then import the bot's data back from that file with --import-database:

python -m pyTweetBot tools --config /path/to/config.json --import --file export_file.p
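The examples use --create, --export and --import although the help text lists --create-database, --export-database and --import-database. Python's argparse (a listed prerequisite) accepts any unambiguous prefix of a long option by default, which would explain why the shorter forms work. The parser below is a hypothetical mirror of the help text, not pyTweetBot's own parser.

```python
import argparse

# Hypothetical parser mirroring the "tools" help text; not pyTweetBot's code.
parser = argparse.ArgumentParser(prog="pyTweetBot tools")
parser.add_argument("--config", help="Path to the JSON configuration file")
parser.add_argument("--create-database", action="store_true")
parser.add_argument("--export-database", action="store_true")
parser.add_argument("--import-database", action="store_true")
parser.add_argument("--file", help="File to import / to export to")

# By default argparse accepts an unambiguous prefix of a long option,
# so "--create" is parsed as "--create-database".
args = parser.parse_args(["--config", "config.json", "--create"])
print(args.create_database)  # True
```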

Model training

Create a dataset

The first step to train a model is to create a dataset of positive and negative examples. This can be done with the train command and the "dataset" action.

python -m pyTweetBot train --dataset test.p --config ../nils-config/nilsbot.json --text-size 100 --action dataset --source news

The source argument can take the following values:

  • news : URLs from Google News and RSS streams;
  • tweets : Tweets found directly on Twitter;
  • friends : Description of Twitter users found directly on Twitter;
  • followers : Description of Twitter users found in your list of followers;
  • home : Tweets found on your home timeline;

Train a model

Once the dataset is created, we can train a model using the "train" action:

python -m pyTweetBot train --dataset test.p --config ../nils-config/nilsbot.json --model mymodel.p --action train --text-size 100 --classifier SVM
INFO:pyTweetBot:Finalizing training...
INFO:pyTweetBot:Training finished... Saving model to mymodel.p

The classifier parameter can take the following values:

  • NaiveBayes : Naive Bayes classifier;
  • DecisionTree : Simple decision tree;
  • RandomForest : Random forest;
  • SVM : Support Vector Machine;
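To illustrate what the NaiveBayes option does conceptually, here is a from-scratch multinomial Naive Bayes text classifier with Laplace smoothing. It is an illustration only. pyTweetBot presumably relies on its sklearn or nltk dependencies, and the whitespace tokenization and the 1/0 labels for positive/negative examples are assumptions.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum over words of log P(word | label)
            score = math.log(count / n_docs)
            n_words = sum(self.word_counts[label].values())
            for word in words:
                # Laplace (add-one) smoothing so unseen words do not zero the score.
                p = (self.word_counts[label][word] + 1) / (n_words + len(self.vocab))
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```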

Test a model

You can test your model's accuracy with the "test" action:

python -m pyTweetBot train --dataset test.p --config ../nils-config/nilsbot.json --model mymodel.p --action test --text-size 100
Success rate of 56.1108362197 on dataset
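The success rate looks like a percentage of correctly classified examples. Assuming that is what it measures (the README does not define it), it can be computed as follows; success_rate is a hypothetical helper.

```python
def success_rate(predictions, labels):
    """Percentage of predictions that match the expected labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return 100.0 * correct / len(labels)
```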

You can now use your model to classify tweets.

Command line

Launch executors

pyTweetBot launches an executor thread for each action type. You can start the executor daemon like this:

python -m pyTweetBot executor --config /etc/bots/bot.conf
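The one-thread-per-action-type design can be sketched with the standard threading module. Each thread runs its action, then waits a random delay drawn from a [min, max] interval, and a shared Event stops all threads. The function names, the actions mapping and the delay unit (minutes by default here) are assumptions, not pyTweetBot's actual executor code.

```python
import random
import threading


def run_executor(name, action, interval, stop_event, unit_seconds=60.0):
    """Run `action` repeatedly, pausing a random delay drawn from `interval`."""
    while not stop_event.is_set():
        action(name)
        delay = random.uniform(*interval) * unit_seconds
        # wait() returns early as soon as stop_event is set.
        stop_event.wait(delay)


def launch_executors(actions, stop_event, unit_seconds=60.0):
    """Start one daemon thread per action type and return the threads.

    `actions` maps an action type (e.g. "tweet", "retweet") to an
    (action_callable, [min, max]) pair.
    """
    threads = []
    for name, (action, interval) in actions.items():
        thread = threading.Thread(
            target=run_executor,
            args=(name, action, interval, stop_event, unit_seconds),
            name=name,
            daemon=True,
        )
        thread.start()
        threads.append(thread)
    return threads
```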

Find new tweets

python -m pyTweetBot find-tweets --config /etc/bots/bot.conf --model /etc/bots/models/find_tweets.p

Find new retweets

python -m pyTweetBot find-retweets --config /etc/bots/bot.conf --model /etc/bots/models/find_retweets.p

Automate execution with crontab

Development

Files
