  • Stars: 285
  • Rank 145,115 (Top 3 %)
  • Language: Python
  • Created over 11 years ago
  • Updated over 1 year ago

Repository Details

A Python script that parses post titles, self-texts, and comments on reddit and makes word clouds out of the word frequencies.

Reddit Analysis project

Please send all requests to make a Most-Used Words (MUW) cloud to http://www.reddit.com/r/MUWs/

Feel free to post the MUWs you've made there, too.

License

Copyright 2016 Randal S. Olson.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Dependencies

You must first install Python if you do not already have it. The Anaconda Python distribution is the preferred option for an easy install.

Next, you can install this package. Enter the following command into the terminal:

pip install redditanalysis

You may need to put sudo in front of the above command if your system requires root access.

If you want to install the latest development version from GitHub, first clone the package:

git clone https://github.com/rhiever/reddit-analysis.git

change into the reddit-analysis directory:

cd reddit-analysis

then run the setup script:

python setup.py install

Files in this repository

redditanalysis/words/common-words.txt is a data file containing a list of words that should be considered common. Note that this list is not final and is constantly changing.

redditanalysis/words/dict-words.txt is a data file containing a list of words from a dictionary. It is only recommended to use this file (with the -x option) if you want word_freqs to pick out very uncommon words.
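
The role of a common-words list can be illustrated with a minimal sketch. This is not the code word_freqs itself uses: the function name count_uncommon_words, the tokenizing pattern, and the inline sample data are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_uncommon_words(texts, common_words):
    """Count word frequencies across texts, skipping words in common_words."""
    common = {w.lower() for w in common_words}
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as words.
        for word in re.findall(r"[a-z']+", text.lower()):
            word = word.strip("'")
            if word and word not in common:
                counts[word] += 1
    return counts


# Hypothetical stand-ins for reddit comments and for the contents of
# common-words.txt; the real data file is much longer.
sample_texts = ["The cat sat on the mat.", "A cat is a cat."]
sample_common = ["the", "a", "on", "is"]

freqs = count_uncommon_words(sample_texts, sample_common)
print(freqs.most_common(2))
```

In practice the common-words set would be read from a file with one word per line, which is an assumption about the data file's layout rather than something stated here.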

Usage

Once installed, run the following on your command line to produce a usage message:

word_freqs --help

This command lists all of the command line options and arguments for word_freqs.

Make a MUW cloud for a subreddit or redditor

To count the most-used words for a subreddit over the last month, enter the following command:

word_freqs YOUR-USERNAME /r/SUBREDDIT

Similarly, for a reddit user:

word_freqs YOUR-USERNAME /u/REDDITOR

where YOUR-USERNAME is your reddit username and SUBREDDIT / REDDITOR is the subreddit / redditor you want to make the MUW cloud for. You must provide both arguments for the script to work properly.

Why is your username required? It is used as the user-agent when making the Reddit API request. Reddit asks its API users to use something unique as the user-agent and recommends using the user's username.
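
As a rough illustration of what "username as user-agent" means at the HTTP level, the sketch below builds a request object with a custom User-Agent header using only the standard library. It does not show how word_freqs or PRAW set the header: the helper name build_request, the user-agent wording, and the placeholder URL are assumptions.

```python
from urllib.request import Request


def build_request(url, reddit_username):
    """Build (but do not send) a request whose User-Agent names the caller."""
    # The exact wording of the user-agent string is an assumption.
    user_agent = "word_freqs run by /u/{}".format(reddit_username)
    return Request(url, headers={"User-Agent": user_agent})


# Placeholder values; nothing is sent over the network here.
req = build_request("https://www.reddit.com/", "YOUR-USERNAME")
print(req.get_header("User-agent"))
```

Note that urllib stores header names in capitalized form, which is why the lookup uses "User-agent".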

Once the script completes, it will create a file called subreddit-SUBREDDIT.csv (or user-REDDITOR.csv) in the directory you ran it in. This file contains all of the commonly-used words from the subreddit / redditor you specified, along with the frequencies at which they were used.

To make a MUW cloud out of the words, copy all of the words into http://www.wordle.net/compose and click the Go button. Ta-da, you're done!

Multiprocess

reddit-analysis supports multiprocess PRAW. This allows you to run multiple instances of reddit-analysis simultaneously and not risk getting banned for overusing the reddit API. To enable multiprocess PRAW in reddit-analysis, add the -u flag.

See the PRAW documentation for more information.

More Repositories

1. Data-Analysis-and-Machine-Learning-Projects (Jupyter Notebook, 6,107 stars): Repository of teaching materials, code, and data for my data analysis and machine learning projects.
2. TwitterFollowBot (Python, 1,309 stars): A Python bot that automates several actions on Twitter, such as following users and favoriting tweets.
3. datacleaner (Python, 1,054 stars): A Python tool that automatically cleans data sets and readies them for analysis.
4. optimal-roadtrip-usa (HTML, 230 stars): Contains maps for the article, "Computing the optimal road trip across the U.S." and similar articles
5. sklearn-benchmarks (Jupyter Notebook, 210 stars): A centralized repository to report scikit-learn model performance across a variety of parameter settings and data sets.
6. python-data-visualization-course (Jupyter Notebook, 169 stars): Course materials for teaching data visualization in Python.
7. reddit-twitter-bot (Python, 137 stars): Looks up posts from reddit and automatically posts them on Twitter.
8. name-age-calculator (HTML, 43 stars): Analyzes a name and guesses the age range of a person with that name.
9. redditviz (CSS, 38 stars): An interactive map of reddit: the "front page of the internet"
10. MarkovNetwork (Python, 36 stars): Python implementation of Markov Networks for neural computing.
11. ipython-notebook-workshop (19 stars): Beginner's IPython Notebook Tutorial
12. baby-name-explorer (HTML, 17 stars)
13. network-analysis-scripts (Python, 13 stars): A bunch of useful scripts for analyzing networks.
14. active-categorical-classifier (Jupyter Notebook, 12 stars): A tool that evolves small brains capable of scanning and classifying an image.
15. k-fold-cv-benchmark (Python, 9 stars)
16. optimized-us-capitol-road-trip (HTML, 9 stars)
17. crowd-machines (Jupyter Notebook, 8 stars)
18. xrff2csv (Python, 7 stars): A Python tool that converts XRFF files to CSV format.
19. edd (C++, 7 stars): A tool that evolves small brains capable of scanning and classifying an image.
20. rhiever.github.io (HTML, 5 stars): Dr. Randal Olson's personal website
21. Collective-Cognition-Increases-Accuracy (Python, 5 stars): Code for the model in the paper, "Accurate decisions in an uncertain world: collective cognition increases true positives while decreasing false positives."
22. rhiever-bot (Python, 4 stars): Bot that monitors /r/MUWs and runs the MUW script.
23. big-ten-twitter-network (JavaScript, 3 stars): Interactive visualization of the Big Ten football teams on Twitter
24. biped-hyperneat (PHP, 3 stars): ODE implementation of a walking biped robot with HyperNEAT evolving the neural controller
25. dissertation-topic-network (3 stars): Dissertation topic network
26. big-data-hw (2 stars)
27. Intro-to-Evolutionary-Modeling (2 stars): Material for teaching biologists to work with digital evolutionary models.
28. rmagic-tutorial (2 stars): A brief tutorial showing how Rmagic can be used in IPython Notebook.
29. marriage-divorce-stats (HTML, 1 star): 144 years of marriage and divorce in 1 chart
30. EvoRoboCodeGECCO2013 (1 star): Description of our EvoRoboCode competition submission to GECCO 2013.
31. drug-alcohol-mentions (1 star)
32. 2014-01-30-mit (Python, 1 star): Software Carpentry bootcamp at Massachusetts Institute of Technology on January 30-31, 2014
33. betting-game (C++, 1 star): Game Theory: betting game
34. temp-repo (HTML, 1 star)
35. eos-old (C++, 1 star): Evolution of Swarming Platform
36. ipython-example (Python, 1 star): Example notebook showing how to do statistics in IPython Notebook.
37. AMT-biped-analysis (1 star)
38. eos-active-perception (C++, 1 star): EOS with agents who have to actively perceive the environment with a fine-grained retina.