  • Language
    Go
  • License
    MIT License
  • Created almost 6 years ago
  • Updated over 1 year ago

Repository Details

Package assocentity returns the mean distance from tokens to an entity and its synonyms

assocentity

Package assocentity is a social science tool to analyze the relative distance from tokens to entities. The motivation is to draw conclusions based on the distance from interesting tokens to a certain entity and its synonyms. Visit this website to see a usage example.

Features

  • Provide your own tokenizer
  • Provides a default NLP tokenizer (by Google)
  • Define aliases for entities
  • Provides a multi-OS, language-agnostic CLI version

Installation

$ go get github.com/ndabAP/assocentity/v14

Prerequisites

If you want to analyze human-readable texts, you can use the provided Natural Language tokenizer (powered by Google). To do so, sign up for a Cloud Natural Language API service account key and download the generated JSON file. This file is the credentialsFile in the example below. You should never commit that file.

A possible offline alternative is a whitespace tokenizer. Depending on your purposes, you might also use a parser.

Example

We would like to find out how close, on average, adjectives are to a certain public person. Let's take George W. Bush and 1,000 NBC news articles as an example. "George Bush" is the entity, and its synonyms are "George Walker Bush", "Bush" and so on. Each of the 1,000 NBC news articles is one text.

The first step is to define a text source and set the entity. Next, we instantiate our tokenizer. In this case, we use the provided Google NLP tokenizer. Then we calculate the distances with assocentity.Distances, which accepts multiple texts. Notice how we pass tokenize.ADJ to only include adjectives as part of speech. Finally, we take the mean by passing the result to assocentity.Mean.

// Define texts source and entity
texts := []string{
	"Former Presidents Barack Obama, Bill Clinton and ...", // Truncated
	"At the Pentagon on the afternoon of 9/11, ...",
	"Tony Blair moved swiftly to place his relationship with ...",
}
entities := []string{
	"George Walker Bush",
	"George Bush",
	"Bush",
}
source := assocentity.NewSource(entities, texts)

// Instantiate the NLP tokenizer (powered by Google)
nlpTok := nlp.NewNLPTokenizer(credentialsFile, nlp.AutoLang)

// Get the distances to adjectives
ctx := context.TODO()
dists, err := assocentity.Distances(ctx, nlpTok, tokenize.ADJ, source)
if err != nil {
	// Handle error
}
// Get the mean from the distances
mean := assocentity.Mean(dists)
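
To make the idea of a "mean distance" more concrete, here is a minimal, self-contained sketch. It is not the library's algorithm: the function meanDistances is hypothetical, it only handles single-token entities, it ignores parts of speech, and it simply measures distance as the difference between token positions.

```go
package main

import "fmt"

// meanDistances is a simplified illustration, NOT the library's implementation.
// For every non-entity token it averages the absolute position difference to
// each entity occurrence. Assumption: entities are single tokens.
func meanDistances(tokens []string, entities map[string]bool) map[string]float64 {
	// Collect the positions of all entity occurrences.
	var entityPos []int
	for i, t := range tokens {
		if entities[t] {
			entityPos = append(entityPos, i)
		}
	}

	sums := make(map[string]float64)
	counts := make(map[string]int)
	for i, t := range tokens {
		if entities[t] {
			continue // Entities are not measured against themselves
		}
		for _, p := range entityPos {
			d := i - p
			if d < 0 {
				d = -d
			}
			sums[t] += float64(d)
			counts[t]++
		}
	}

	// Turn the accumulated sums into means per token text.
	means := make(map[string]float64, len(sums))
	for t, s := range sums {
		means[t] = s / float64(counts[t])
	}
	return means
}

func main() {
	tokens := []string{"Punchinello", "was", "burning", "to", "get", "me"}
	entities := map[string]bool{"Punchinello": true}

	means := meanDistances(tokens, entities)
	fmt.Println(means["was"], means["me"]) // prints "1 5"
}
```

A token that appears several times, or a text with several entity occurrences, contributes every pairwise distance to the average in this sketch. The library may weigh or count distances differently.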

The NLPTokenizer has a built-in retryer whose strategy works well with the limitations of the Google Natural Language API. It cannot be disabled or configured.

Tokenization

A Tokenizer produces tokens from a given text, while a Token is the smallest possible unit of a text. The interface has a single method, Tokenize, with the following signature:

type Tokenizer interface {
	Tokenize(ctx context.Context, text string) ([]Token, error)
}

A Token has the following properties:

type Token struct {
	PoS  PoS    // Part of speech
	Text string // Text
}

// Part of speech
type PoS int

For example, given the text:

text := "Punchinello was burning to get me"

The result from Tokenize would be a slice of tokens:

[]Token{
	{
		Text: "Punchinello",
		PoS:  tokenize.NOUN,
	},
	{
		Text: "was",
		PoS:  tokenize.VERB,
	},
	{
		Text: "burning",
		PoS:  tokenize.VERB,
	},
	{
		Text: "to",
		PoS:  tokenize.PRT,
	},
	{
		Text: "get",
		PoS:  tokenize.VERB,
	},
	{
		Text: "me",
		PoS:  tokenize.PRON,
	},
}

CLI

If you don't have Go available, there is also a language-agnostic terminal version for Windows, Mac (Darwin) and Linux (64-bit only). The application expects the text from "stdin" and accepts the following flags:

| Flag | Description | Type | Default |
| --- | --- | --- | --- |
| entities | List of comma-separated entities, example: -entities="Max Payne,Payne" | string | |
| google-svc-acc-key | Google Cloud NLP JSON service account file, example: -google-svc-acc-key=~/google-svc-acc-key.json | string | |
| op | Operation to execute | string | mean |
| pos | List of comma-separated parts of speech, example: -pos=noun,verb,pron | string | any |

Example:

echo "Relax, Max. You're a nice guy." | ./bin/assocentity_linux_amd64_v14.0.0-0-g948274a-dirty -google-svc-acc-key=/home/max/.config/assocentity/google-service.json -entities="Max Payne,Payne,Max"

The output is written to "stdout" in an appropriate format.

Projects using assocentity

  • entityscrape - Distance between word types (default: adjectives) in news articles and persons

Author

Julian Claus and contributors.

License

MIT
