• Stars: 314
• Rank: 133,353 (Top 3%)
• Language: Python
• License: MIT License
• Created over 8 years ago
• Updated over 7 years ago

Repository Details

analyze text with empath

Empath is a tool for analyzing text across lexical categories (similar to LIWC) and for generating new lexical categories to use in an analysis. See our paper.

You can install it in Python via pip:

pip install empath

Then, in a Python shell, import it like this:

from empath import Empath
lexicon = Empath()

Analyze text over all pre-built categories:

lexicon.analyze("he hit the other person", normalize=True)
# => {'help': 0.0, 'office': 0.0, 'violence': 0.2, 'dance': 0.0, 'money': 0.0, 'wedding': 0.0, 'valuable': 0.0, 'domestic_work': 0.0, 'sleep': 0.0, 'medical_emergency': 0.0, 'cold': 0.0, 'hate': 0.0, 'cheerfulness': 0.0, 'aggression': 0.0, 'occupation': 0.0, 'envy': 0.0, 'anticipation': 0.0, 'family': 0.0, 'crime': 0.0, 'attractive': 0.0, 'masculine': 0.0, 'prison': 0.0, 'health': 0.0, 'pride': 0.0, 'dispute': 0.0, 'nervousness': 0.0, 'government': 0.0, 'weakness': 0.0, 'horror': 0.0, 'swearing_terms': 0.0, 'leisure': 0.0, 'suffering': 0.0, 'royalty': 0.0, 'wealthy': 0.0, 'white_collar_job': 0.0, 'tourism': 0.0, 'furniture': 0.0, 'school': 0.0, 'magic': 0.0, 'beach': 0.0, 'journalism': 0.0, 'morning': 0.0, 'banking': 0.0, 'social_media': 0.0, 'exercise': 0.0, 'night': 0.0, 'kill': 0.0, 'art': 0.0, 'play': 0.0, 'computer': 0.0, 'college': 0.0, 'traveling': 0.0, 'stealing': 0.0, 'real_estate': 0.0, 'home': 0.0, 'divine': 0.0, 'sexual': 0.0, 'fear': 0.0, 'monster': 0.0, 'irritability': 0.0, 'superhero': 0.0, 'business': 0.0, 'driving': 0.0, 'pet': 0.0, 'childish': 0.0, 'cooking': 0.0, 'exasperation': 0.0, 'religion': 0.0, 'hipster': 0.0, 'internet': 0.0, 'surprise': 0.0, 'reading': 0.0, 'worship': 0.0, 'leader': 0.0, 'independence': 0.0, 'movement': 0.2, 'body': 0.0, 'noise': 0.0, 'eating': 0.0, 'medieval': 0.0, 'zest': 0.0, 'confusion': 0.0, 'water': 0.0, 'sports': 0.0, 'death': 0.0, 'healing': 0.0, 'legend': 0.0, 'heroic': 0.0, 'celebration': 0.0, 'restaurant': 0.0, 'ridicule': 0.0, 'programming': 0.0, 'dominant_heirarchical': 0.0, 'military': 0.0, 'neglect': 0.0, 'swimming': 0.0, 'exotic': 0.0, 'love': 0.0, 'hiking': 0.0, 'communication': 0.0, 'hearing': 0.0, 'order': 0.0, 'sympathy': 0.0, 'hygiene': 0.0, 'weather': 0.0, 'anonymity': 0.0, 'trust': 0.0, 'ancient': 0.0, 'deception': 0.0, 'fabric': 0.0, 'air_travel': 0.0, 'fight': 0.0, 'dominant_personality': 0.0, 'music': 0.0, 'vehicle': 0.0, 'politeness': 0.0, 'toy': 0.0, 'farming': 0.0, 'meeting': 0.0, 'war': 0.0, 
# 'speaking': 0.0, 'listen': 0.0, 'urban': 0.0, 'shopping': 0.0, 'disgust': 0.0, 'fire': 0.0, 'tool': 0.0, 'phone': 0.0, 'gain': 0.0, 'sound': 0.0, 'injury': 0.0, 'sailing': 0.0, 'rage': 0.0, 'science': 0.0, 'work': 0.0, 'appearance': 0.0, 'optimism': 0.0, 'warmth': 0.0, 'youth': 0.0, 'sadness': 0.0, 'fun': 0.0, 'emotional': 0.0, 'joy': 0.0, 'affection': 0.0, 'fashion': 0.0, 'lust': 0.0, 'shame': 0.0, 'torment': 0.0, 'economics': 0.0, 'anger': 0.0, 'politics': 0.0, 'ship': 0.0, 'clothing': 0.0, 'car': 0.0, 'strength': 0.0, 'technology': 0.0, 'breaking': 0.0, 'shape_and_size': 0.0, 'power': 0.0, 'vacation': 0.0, 'animal': 0.0, 'ugliness': 0.0, 'party': 0.0, 'terrorism': 0.0, 'smell': 0.0, 'blue_collar_job': 0.0, 'poor': 0.0, 'plant': 0.0, 'pain': 0.2, 'beauty': 0.0, 'timidity': 0.0, 'philosophy': 0.0, 'negotiate': 0.0, 'negative_emotion': 0.0, 'cleaning': 0.0, 'messaging': 0.0, 'competing': 0.0, 'law': 0.0, 'friends': 0.0, 'payment': 0.0, 'achievement': 0.0, 'alcohol': 0.0, 'disappointment': 0.0, 'liquid': 0.0, 'feminine': 0.0, 'weapon': 0.0, 'children': 0.0, 'ocean': 0.0, 'giving': 0.0, 'contentment': 0.0, 'writing': 0.0, 'rural': 0.0, 'positive_emotion': 0.0, 'musical': 0.0}
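Almost every score in the full result is zero. The sketch below assumes only that `analyze` returns a plain dict mapping category names to scores, as the output above suggests, and keeps the categories that fired, highest first. `top_categories` is a hypothetical helper, not part of Empath, and the dictionary literal is an excerpt of the output above rather than the whole thing.

```python
def top_categories(scores, threshold=0.0):
    """Return (category, score) pairs above `threshold`, highest score first.

    `scores` is assumed to be the dict returned by lexicon.analyze(...).
    Python's sort is stable (even with reverse=True), so ties keep the
    order in which they appear in the dict.
    """
    hits = [(name, value) for name, value in scores.items() if value > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# An excerpt of the normalized output above (not the full dictionary).
scores = {'help': 0.0, 'office': 0.0, 'violence': 0.2, 'dance': 0.0,
          'movement': 0.2, 'body': 0.0, 'pain': 0.2, 'beauty': 0.0}

print(top_categories(scores))
# [('violence', 0.2), ('movement', 0.2), ('pain', 0.2)]
```

Raising `threshold` is a simple way to ignore categories that only barely register in longer documents.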

Or over a specific set of categories:

lexicon.analyze("he hit the other person", categories=["violence"])
# => {'violence': 1.0}

By default, Empath returns raw counts, but you can ask it to normalize by the number of words in the document.

lexicon.analyze("he hit the other person", categories=["violence"], normalize=True)
# => {'violence': 0.2}
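The difference between the two results above comes down to one division: one matching word out of five gives 1.0 raw and 0.2 normalized. The sketch below is a toy model of that idea, not Empath's implementation. It assumes a lowercase regex `\w+` tokenizer and a hand-made one-word category, both hypothetical, counts how many tokens fall into each category, and optionally divides by the number of tokens.

```python
import re


def toy_analyze(text, lexicon, normalize=False):
    """Count category hits in `text`, optionally dividing by the token count.

    `lexicon` maps a category name to a set of words. Tokenizing with a
    lowercase regex is an assumption made for this sketch only.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = {category: 0 for category in lexicon}
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    if normalize and tokens:
        # Normalized score = hits / number of tokens in the document.
        return {category: n / len(tokens) for category, n in counts.items()}
    return {category: float(n) for category, n in counts.items()}


# Hypothetical one-word category, just to reproduce the numbers above.
toy_lexicon = {"violence": {"hit"}}
print(toy_analyze("he hit the other person", toy_lexicon))                  # {'violence': 1.0}
print(toy_analyze("he hit the other person", toy_lexicon, normalize=True))  # {'violence': 0.2}
```

The same arithmetic explains the colors example: two of the five words in "My favorite color is blue" belong to the category, so the normalized score is 2 / 5 = 0.4.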

You can create new lexical categories for analysis using word embeddings from our vector space model (VSM):

lexicon.create_category("colors", ["red", "blue", "green"])
# => ["blue", "green", "purple", "purple", "green", "yellow", "red", "grey", "violet", "gray", "blue", "orange", "white", "pink", "yellow", "black", "brown", "brown", "red", "aqua", "turquoise", "blue_color", "colored", "color", "same_shade", "violet", "gray", "grey", "teal", "nice_shade", "coloured", "forest_green", "colored", "different_shade", "colour", "sparkly", "reddish", "beautiful_shade", "greenish", "indigo", "darker_shade", "emerald", "lovely_shade", "tints", "crimson", "dark_purple", "pink", "emerald", "sapphire", "golden", "lighter_shade", "lime_green", "coloured", "bright", "same_color", "specks", "red", "golden_color", "different_shades", "chocolate_brown", "orange", "bluish", "green", "deep_purple", "magenta", "green_color", "dark_shade", "bright_orange", "milky", "lilac", "light_brown", "sparkling", "golden_brown", "silvery", "baby_blue", "blood_red", "pink", "teal", "blue", "yellowish", "turquoise", "same_colour", "sparkly", "aquamarine", "black_color", "white", "cerulean", "perfect_shade", "dark", "speckled", "charcoal", "greyish", "midnight_blue", "emerald_green", "deep_brown", "ocean_blue", "flecks", "amber", "pinkish", "jet_black"]

Then analyze with those categories:

lexicon.analyze("My favorite color is blue", categories=["colors"], normalize=True)
# => {'colors': 0.4}

Right now Empath has three different models you can use to create categories: fiction, nytimes, and reddit. (I'm working on integrating all the different models soon.) For now, they have different strengths and weaknesses when generating categories. The nytimes model would be better for something like the Cold War:

lexicon.create_category("cold_war", ["cold_war"], model="nytimes")
# => ["cold_war", "the_cold_war", "the_Cold_War", "war", "Soviet_threat", "the_end_of_the_cold_war", "Communism", "world_war", "Soviet_empire", "Soviet_power", "Communism", "gulf_war", "Soviet_bloc", "the_Soviet_Union", "communism", "superpowers", "nuclear_age", "nuclear_war", "Soviet_system", "evil_empire", "Soviets", "wars", "arms_race", "Indochina", "detente", "Iran-Iraq_war", "Persian_Gulf_war", "American_power", "new_world_order", "American_involvement", "wartime", "American_foreign_policy", "American_occupation", "the_Soviet_Union's", "Soviet_Communism", "nuclear_arms_race", "the_Korean_War", "military_power", "Persian_Gulf_war", "great_powers", "Marshall_Plan", "the_Second_World_War", "Communist_rule", "the_Warsaw_Pact", "Soviet_military", "Reagan_years", "Reagan_era", "Cuban_missile_crisis", "world_wars", "postwar_period", "Communist_world", "military-industrial_complex", "perestroika", "superpower", "new_war", "Desert_Storm", "space_race", "Mikhail_Gorbachev", "Communist_system", "World_War_II", "nation-building", "the_Vietnam_War", "dictatorship", "South_Vietnam", "Iron_Curtain", "diplomacy", "old_Soviet_Union", "military_buildup", "containment", "German_unification", "Balkans", "gulf_crisis", "revolution", "last_war", "Soviet_era", "dictatorships", "warfare", "glasnost", "Soviet_state", "Communist_regimes", "domestic_politics", "Khrushchev", "American_diplomacy", "postwar_era", "Soviet_economy", "peacetime", "Korean_peninsula", "Allies", "Soviet-American_relations", "cold_war_era", "space_program", "Soviet_occupation", "arms_control", "Soviet_leaders", "World_War_I", "Western_alliance", "military_strategy", "quagmire", "regime", "fascism"]

You can adjust the size of a requested category with the size parameter. You may not always get a bigger category when you ask for one, because results are still filtered by a minimum cosine similarity.

lexicon.create_category("cold_war", ["cold_war"], model="nytimes", size=300)
# => ["cold_war", "the_cold_war", "the_Cold_War", "war", "Soviet_threat", "the_end_of_the_cold_war", "Communism", "world_war", "Soviet_empire", "Soviet_power", "Communism", "gulf_war", "Soviet_bloc", "the_Soviet_Union", "communism", "superpowers", "nuclear_age", "nuclear_war", "Soviet_system", "evil_empire", "Soviets", "wars", "arms_race", "Indochina", "detente", "Iran-Iraq_war", "Persian_Gulf_war", "American_power", "new_world_order", "American_involvement", "wartime", "American_foreign_policy", "American_occupation", "the_Soviet_Union's", "Soviet_Communism", "nuclear_arms_race", "the_Korean_War", "military_power", "Persian_Gulf_war", "great_powers", "Marshall_Plan", "the_Second_World_War", "Communist_rule", "the_Warsaw_Pact", "Soviet_military", "Reagan_years", "Reagan_era", "Cuban_missile_crisis", "world_wars", "postwar_period", "Communist_world", "military-industrial_complex", "perestroika", "superpower", "new_war", "Desert_Storm", "space_race", "Mikhail_Gorbachev", "Communist_system", "World_War_II", "nation-building", "the_Vietnam_War", "dictatorship", "South_Vietnam", "Iron_Curtain", "diplomacy", "old_Soviet_Union", "military_buildup", "containment", "German_unification", "Balkans", "gulf_crisis", "revolution", "last_war", "Soviet_era", "dictatorships", "warfare", "glasnost", "Soviet_state", "Communist_regimes", "domestic_politics", "Khrushchev", "American_diplomacy", "postwar_era", "Soviet_economy", "peacetime", "Korean_peninsula", "Allies", "Soviet-American_relations", "cold_war_era", "space_program", "Soviet_occupation", "arms_control", "Soviet_leaders", "World_War_I", "Western_alliance", "military_strategy", "quagmire", "regime", "fascism", "socialism", "Vietnam", "totalitarianism", "new_Europe", "American_leadership", "long_war", "World_War_II.", "colonial_rule", "the_Persian_Gulf_war", "atom_bomb", "NATO_alliance", "world_affairs", "military_threat", "home_front", "Western_Europe", "Eastern_Europe", "German_reunification", "glasnost", "Stalin", 
"Iraq_war", "Reagan_Presidency", "military_might", "American_policy", "colonialism", "major_war", "East-West_relations", "Soviet_history", "Soviet_rule", "Russians", "the_Gulf_War", "Atlantic_alliance", "the_Bay_of_Pigs", "democracies", "coups", "old_order", "Islamic_world", "Soviet_leadership", "unification", "Stalinism", "nuclear_threat", "Vietnam_era", "the_Afghan_war", "Gorbachev_era", "the_Vietnam_war", "American_President", "American_military_power", "Western_powers", "American_Government", "Soviet_domination", "foreign_policy", "military_establishment", "new_thinking", "Communist_regime", "Communist_era", "militarism", "isolationism", "the_Persian_Gulf", "first_gulf_war", "upheavals", "Saddam_Hussein's", "reunification", "Second_World_War", "Reagan_Administration", "Eastern_Europe's", "disintegration", "empires", "American_strategy", "civil_war", "Soviet_society", "Western_democracies", "common_enemy", "Communist_state", "Korean_Peninsula", "New_Deal", "the_Marshall_Plan", "Berlin_wall", "American_influence", "American_president", "Communist_dictatorship", "political_struggle", "the_Reagan_Administration", "American_public_opinion", "military_victory", "American_policy_makers", "Central_Europe", "modern_history"]

More Repositories

1. TypedJS: Lightweight program specifications for testing JavaScript (JavaScript, 223 stars)
2. iris-agent: An extensible conversational agent for data science tasks (TeX, 122 stars)
3. Gajure: A framework for implementing genetic algorithms in Clojure (Clojure, 63 stars)
4. ProofFrontend (JavaScript, 12 stars)
5. Proof-Search (Haskell, 11 stars)
6. empath-outofdate (HTML, 10 stars)
7. Clogger: A basic blog framework designed around Compojure (Clojure, 8 stars)
8. mobile-for-compojure: Middleware for handling mobile devices in Compojure applications (Clojure, 7 stars)
9. codex: Collecting and aggregating information about Ruby ASTs (Ruby, 5 stars)
10. Atom-Feed-for-Clojure-and-Compojure: A simple template for creating Atom feeds with Clojure and Compojure (Clojure, 4 stars)
11. meta: A community-aware domain-specific language for Python (Python, 4 stars)
12. code_extraction (Python, 4 stars)
13. cs1120: An example Django app for UVa's CS1120 (Python, 3 stars)
14. augur-nlp-mining (Python, 3 stars)
15. wordy: Analyze text patterns in webpages (Clojure, 3 stars)
16. ejhfast.github.com: Blog (HTML, 3 stars)
17. CS376: Arduino interface code (JavaScript, 3 stars)
18. 5k (Ruby, 2 stars)
19. UXMockups (Ruby, 2 stars)
20. faml: Finite Automata for OCaml (OCaml, 2 stars)
21. OurApp: An application (JavaScript, 2 stars)
22. BarefootCS (Ruby, 2 stars)
23. HackLikeMe: Connecting developers and designers (Ruby, 2 stars)
24. integratedCS (Objective-C, 2 stars)
25. CustomerManagmentApp: TBD (Ruby, 2 stars)
26. iphoneCS: iPhone project for Sherriff's class (Objective-C, 2 stars)
27. acl2-backend: Backend proof checker for education project (Common Lisp, 2 stars)
28. fa-ruby: Build Finite Automata in Ruby (Ruby, 1 star)
29. MotivateCSS: Design for motivation site (Ruby, 1 star)
30. oldblog: Placeholder for old blog (Ruby, 1 star)
31. NLP-Challenge: Solution to Joseph Turian's NLP problem (Ruby, 1 star)
32. fiction-bias (HTML, 1 star)
33. fahs: Learning Haskell with Finite Automata (Haskell, 1 star)
34. node-proof-server: Node.js wrapper for the ACL2 proof checker (Common Lisp, 1 star)
35. virtual_assistant: Natural language assistant for OS X (Python, 1 star)
36. maria-web: Web interface for deep learning peptide presentation (JavaScript, 1 star)
37. Clustering-in-Haskell: A simple clustering implementation (Haskell, 1 star)
38. Fortune: A toy Sinatra app that generates Unix fortunes (Ruby, 1 star)
39. Jokes: Agent-based joke simulation (Java, 1 star)
40. Bite: Minimalist web framework for Clojure (Clojure, 1 star)
41. Markov: Markov text creation (Ruby, 1 star)
42. elrond-rust: Key creation and transaction signing for the Elrond network in pure Rust (Rust, 1 star)
43. Narcissist: Track a user's Twitter-based narcissism quotient (Ruby, 1 star)