• Stars: 819
• Rank: 55,659 (Top 2 %)
• Language: Python
• License: Other
• Created about 10 years ago
• Updated 5 months ago

Repository Details

How do the different communities talk?

Programming language subreddits and their choice of words

While reading about various programming languages, I developed a hunch about how often different languages are mentioned by other communities and about the average conversational tone used by their respective members.

To examine whether it was just selective perception on my side, an unconscious confirmation of stereotypes, or a valid observation, I collected and analysed some data, i.e. all comments (about 300k) written on submissions (about 40k) in the respective programming language subreddits from 2013-08 to 2014-07, using PRAW and SQLite.
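As a rough illustration of the storage side, here is a minimal sketch of how such comments could be kept in SQLite. The table layout, the column names, and the helpers `open_db`, `store_comments` and `comments_per_subreddit` are assumptions made for this example and not necessarily what the original code does. Fetching the comments with PRAW is left out; the sketch simply expects ready-made tuples.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a database with a hypothetical `comments` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        "id TEXT PRIMARY KEY, "
        "subreddit TEXT NOT NULL, "
        "created_utc REAL NOT NULL, "
        "body TEXT NOT NULL)"
    )
    return conn


def store_comments(conn, rows):
    """Insert (id, subreddit, created_utc, body) tuples, skipping known ids."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO comments VALUES (?, ?, ?, ?)", rows
        )


def comments_per_subreddit(conn):
    """Return a dict mapping each subreddit to its number of stored comments."""
    cursor = conn.execute(
        "SELECT subreddit, COUNT(*) FROM comments GROUP BY subreddit"
    )
    return dict(cursor)
```

Using the comment id as primary key together with `INSERT OR IGNORE` means that a crawl can be repeated without storing the same comment twice.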

In this article I will present some selected results. (If you want, you can also download the code I wrote and used, as well as the raw data it generated.)

Mutual mentions

The following chord graph (click it for an interactive version) shows how often each programming language is mentioned in communities (subreddits) other than its own:

(mutual mentions)

(The size of a language is set by how often the others talk about it in total. Each connection represents the mutual mentions of two communities. The width at each end is determined by the relative frequency with which the mentioned language is referenced by the respective other community. So PHP talks more about SQL than SQL talks about PHP. The labels of some smaller communities might be missing in the graph due to some opaque d3.js behavior ¯\_(ツ)_/¯.)
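The counting behind such a graph can be sketched roughly as follows. This is only an illustration under assumptions: the `ALIASES` table, the function names, and the simple whole-word matching are made up for this example and may differ from the parser actually used.

```python
import re
from collections import Counter

# Hypothetical mapping from a subreddit to the names its language goes by.
ALIASES = {
    "php": ["php"],
    "sql": ["sql"],
    "haskell": ["haskell"],
}


def mentions(body, names):
    """Return True if any of the names occurs in body as a standalone word."""
    lowered = body.lower()
    return any(
        re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", lowered)
        for name in names
    )


def mention_matrix(comments, aliases=ALIASES):
    """Count, per subreddit, the comments mentioning each *other* language.

    comments: iterable of (subreddit, body) pairs.
    Returns a Counter keyed by (source_subreddit, mentioned_language).
    """
    counts = Counter()
    for subreddit, body in comments:
        for language, names in aliases.items():
            if language != subreddit and mentions(body, names):
                counts[(subreddit, language)] += 1
    return counts


def relative_mentions(counts, totals):
    """Divide each count by the total comment count of the source subreddit."""
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}
```

Dividing by the size of the source subreddit is what makes the two ends of a connection comparable, since a large community would otherwise dominate simply by writing more comments.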

The "big" languages are the ones most talked about, yawn.

Sure, measuring programming language popularity accurately is nearly impossible, but if we simply take some values from TIOBE anyway, it gets interesting, because one can see how much a language is talked about relative to how much it is supposedly used.

mentions relative to tiobe

This was the first time I said "Ha! I knew it!".

haskell tweet

(No Haskell bash intended. I love it and its little web cousin Elm, use them for projects, and also write articles about them.)

Word usage

If we now divide the number of comments in a subreddit containing a chosen word by the overall subreddit comment count (and multiply by 10000 to have a nice integer value), we get more ... well, diagrams. But most results like the obsession with abstract concepts by the Haskell people and the consideration of hardware issues by people using C and C++ are not that surprising.
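The calculation described above can be sketched like this. The function name `word_rate` and the whole-word, case-insensitive matching are assumptions made for this example; the original parser may match words differently.

```python
import re


def word_rate(comments, word, per=10_000):
    """Number of comments containing `word` per `per` comments, as an integer.

    comments: iterable of comment bodies from a single subreddit.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(word.lower()) + r"(?!\w)")
    total = 0
    hits = 0
    for body in comments:
        total += 1
        # A comment counts once, no matter how often the word appears in it.
        if pattern.search(body.lower()):
            hits += 1
    if total == 0:
        return 0
    return round(per * hits / total)
```

Because each comment is counted at most once, a single long rant repeating the same word does not skew the value for its subreddit.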

abstract concepts

hardware

Cursing

This part here is quite comforting, because a conjecture many of us probably have is confirmed.

cursing

Happiness

To finish with something positive: The lispy guys seem to be the most cheerful people.

happy

But what is up with the Visual Basic community? They are neither angry nor happy. They just ... are? :)

Other subjects

On editgym.com/subreddits-and-their-choice-of-words you can find more analyses of this kind applied to different topics/subreddits like gaming, music, sports, operating systems, etc.

Disclaimer

As you probably already noticed, this is not hard science. It was just a small fun project and leaves plenty of room for errors. I tried to choose only big communities and frequent words so that there is at least a bit of statistical significance. (btw If you remove this constraint, Elm is the happiest and coolest language. ^_-) But potential errors in my parser and interpretation (e.g. not taking negations into account etc.) cannot be fully ruled out either. ;)

Also, positive correlation (e.g. cursing <-> PHP) does not imply one causing the other. But if somebody wants to repeat this experiment to confirm/refute the results with fancier tools like nltk or something, I would be happy if you could drop me an email.

More Repositories

1. FunctionalPlus (C++, 2,103 stars): Functional Programming Library for C++. Write concise and readable C++ code.
2. articles (Python, 1,576 stars): thoughts on programming
3. frugally-deep (C++, 1,059 stars): A lightweight header-only library for using Keras (TensorFlow) models in C++.
4. img2xls (Python, 212 stars): Convert images to colored cells in an Excel spreadsheet.
5. undictify (Python, 98 stars): Python library providing type-checked function calls at runtime
6. Breakout (Elm, 57 stars): A clone of the classical game for your browser.
7. Maze (Elm, 40 stars): Test your mouse precision skills with this simple maze game.
8. treebomination (Python, 39 stars): convert a scikit-learn decision tree into a Keras model
9. enterprython (Python, 32 stars): Python library providing type-based dependency-injection
10. Demoscene-Concentration (Elm, 29 stars): The classical memory game with old school demoscene effects.
11. All-Colors (C++, 23 stars): Create (hopefully beautiful) images from many different colors.
12. RedditTimeMachine (Elm, 22 stars): Check out what was hot on reddit days/weeks/months ago.
13. Barcode-Generator (JavaScript, 7 stars): Generate EAN/UPC-A barcodes in your browser.
14. EditGym (Elm, 6 stars): Text editing training
15. divine-or-benign (Elm, 4 stars): The holy Turing test
16. yo_dawg_ml_model_architecture (Python, 3 stars): decision trees with other model as nodes
17. HackerRank-solutions (Haskell, 3 stars): This repo is just a container for me to manage my solutions to the challenges on HackerRank.com
18. bouncing-spheres (Rust, 3 stars): A very simplistic raytracer - implemented in Rust
19. rill (Python, 2 stars): Python library providing simple text-stream processing functionality
20. Multitouch-Transformation-Demo (Elm, 2 stars): small Demonstration of calculating and applying different transformation types by user input (1, 2, 3 and 4 fingers)
21. pick-and-gloat (Elm, 1 star): Use your thinking and reaction to compete with friends.
22. Dron (C++, 1 star): Tron/Snake game
23. Behagolit (Python, 1 star): a toy programming language experiment