  • Stars: 531
  • Rank: 82,924 (Top 2 %)
  • Language: Python
  • License: MIT License
  • Created over 9 years ago
  • Updated about 6 years ago

Passage

A little library for text analysis with RNNs.

Warning: very alpha, work in progress.

Install

via GitHub (version under active development)

git clone http://github.com/IndicoDataSolutions/passage.git
cd passage
python setup.py develop

or via pip

sudo pip install passage

Example

Using Passage to do binary classification of text, this example:

  • Tokenizes some training text, converting it to a format Passage can use.
  • Defines the model's structure as a list of layers.
  • Creates the model with that structure and a cost to be optimized.
  • Trains the model for one iteration over the training text.
  • Uses the model and tokenizer to predict on new text.
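  • Saves and loads the model.

from passage.preprocessing import Tokenizer
from passage.layers import Embedding, GatedRecurrent, Dense
from passage.models import RNN
from passage.utils import save, load

# Convert the raw training strings into token sequences Passage can use
tokenizer = Tokenizer()
train_tokens = tokenizer.fit_transform(train_text)

# Model structure: embedding -> gated recurrent layer -> single sigmoid output
layers = [
    Embedding(size=128, n_features=tokenizer.n_features),
    GatedRecurrent(size=128),
    Dense(size=1, activation='sigmoid')
]

# Build the model with a binary cross-entropy cost, then train it
# for one iteration over the training text
model = RNN(layers=layers, cost='BinaryCrossEntropy')
model.fit(train_tokens, train_labels)

# Predict on new text, reusing the fitted tokenizer
model.predict(tokenizer.transform(test_text))

# Save the trained model to disk and load it back
save(model, 'save_test.pkl')
model = load('save_test.pkl')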

Where:

  • train_text is a list of strings, e.g. ['hello world', 'foo bar']
  • train_labels is a list of labels, one per training string, e.g. [0, 1]
  • test_text is another list of strings to predict on
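
As a minimal sketch of what those three inputs might look like, the snippet below builds toy data and checks that it is shaped the way the example expects. The check_inputs helper and the test strings are hypothetical, added here for illustration only; they are not part of Passage.

```python
# Toy inputs using the variable names from the example above.
train_text = ['hello world', 'foo bar']
train_labels = [0, 1]
test_text = ['hello foo', 'bar world']  # hypothetical strings to predict on


def check_inputs(texts, labels):
    """Hypothetical helper: raise ValueError unless texts and labels line up.

    Checks that every text is a string, that there is one label per text,
    and that all labels are 0 or 1 (binary classification).
    """
    if not all(isinstance(t, str) for t in texts):
        raise ValueError('every training example must be a string')
    if len(texts) != len(labels):
        raise ValueError('need exactly one label per training string')
    if not set(labels) <= {0, 1}:
        raise ValueError('binary classification expects labels 0 or 1')
    return True


check_inputs(train_text, train_labels)
```

Running a check like this before calling fit can catch mismatched list lengths early, before any training time is spent.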

Datasets

Without sizeable datasets, RNNs have difficulty achieving results better than traditional sparse linear models. Below are a few appropriately sized datasets that are useful for experimentation. Hopefully this list will grow over time. Please feel free to propose new datasets for inclusion through either an issue or a pull request.

Note: None of these datasets were created by indico, nor should their inclusion here indicate any kind of endorsement.

Blogger Dataset: http://www.cs.biu.ac.il/~koppel/blogs/blogs.zip (Age and gender data)
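
For comparison, the "traditional sparse linear models" mentioned above can be approximated with a bag-of-words logistic regression. The sketch below is a rough baseline under stated assumptions, not part of Passage: the function names (featurize, train_logistic, predict_proba), the whitespace tokenization, and the learning rate and epoch count are all illustrative choices.

```python
import math
from collections import Counter


def featurize(text):
    """Map a string to sparse token counts (token -> count)."""
    return Counter(text.lower().split())


def sigmoid(z):
    """Logistic function, clamped to avoid overflow in math.exp."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, feats):
    """Linear score over only the tokens present in feats."""
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in feats.items())


def train_logistic(texts, labels, epochs=50, lr=0.5):
    """Fit a sparse logistic regression with plain SGD; returns (weights, bias)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            feats = featurize(text)
            grad = sigmoid(score(weights, bias, feats)) - y
            bias -= lr * grad
            # Only the weights of tokens seen in this example are updated.
            for tok, n in feats.items():
                weights[tok] = weights.get(tok, 0.0) - lr * grad * n
    return weights, bias


def predict_proba(model, text):
    """Probability that text belongs to class 1."""
    weights, bias = model
    return sigmoid(score(weights, bias, featurize(text)))


baseline = train_logistic(['hello world', 'foo bar'], [0, 1])
```

Training a baseline like this on the same data gives a reference point for judging whether an RNN is actually helping on a dataset of a given size.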
