  • Stars: 104
  • Rank: 330,604 (Top 7 %)
  • Language: Python
  • License: Other
  • Created about 12 years ago
  • Updated over 11 years ago

Repository Details

The goal of this project is to implement a Question Answering (QA) system that answers causal type questions. We use Wikipedia as a knowledge base, extracting answers to user questions from the articles.

Cause of Why

We are currently focused on getting the system's engine working, so the user interface is on the back burner. Please stay tuned for lots of updates!

Causal Questions

Causal questions are generally why-questions. They ask for a reason or a cause, such as "Why do birds sing?". Most other QA systems instead try to answer factoid questions, such as "Where is the Louvre located?".
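The distinction can be illustrated with a small surface-level check. This is only a sketch: the cue patterns and the function name `is_causal_question` are hypothetical and are not taken from the project, whose actual question analysis is not described here.

```python
import re

# Hypothetical surface cues for causal questions. These patterns are an
# assumption made for illustration, not the project's detection logic.
CAUSAL_CUES = (
    r"^why\b",
    r"^how come\b",
    r"^what (causes|caused)\b",
    r"^what (is|are|was|were) the (cause|causes|reason|reasons)\b",
)


def is_causal_question(question: str) -> bool:
    """Return True if the question starts with one of the causal cues."""
    text = question.strip().lower()
    return any(re.search(pattern, text) for pattern in CAUSAL_CUES)


if __name__ == "__main__":
    for q in ("Why do birds sing?", "Where is the Louvre located?"):
        print(q, "->", "causal" if is_causal_question(q) else "not causal")
```

A cue list like this misses causal questions phrased in other ways, so it should be read as a way to make the definition concrete rather than as a usable classifier.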

Required Libraries

This project uses several libraries that either need to be installed or need to be present in the project's lib/ directory. The following is a list of the required libraries, as well as at least one way (source) to obtain the library.

nltk

Natural Language Processing (NLP) functions such as sentence segmentation, word tokenization, and more.

nltk resources

In addition, you will need to download several nltk resources using nltk.download() after you have the nltk library installed.

  • 'taggers/maxent_treebank_pos_tagger/english.pickle'
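As a rough sketch, the resource above could be fetched only when it is missing. The helper names `package_id` and `ensure_resource` are hypothetical, and the code assumes that the second component of the resource path is the package identifier accepted by `nltk.download()`; newer NLTK releases may organise tagger models differently.

```python
RESOURCE = "taggers/maxent_treebank_pos_tagger/english.pickle"


def package_id(resource_path: str) -> str:
    """Derive the download identifier from a resource path.

    Assumption: the path looks like '<category>/<package>/<file>', so the
    second component is the name passed to nltk.download().
    """
    parts = resource_path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError("expected '<category>/<package>/...': %r" % resource_path)
    return parts[1]


def ensure_resource(resource_path: str = RESOURCE) -> bool:
    """Download the resource only if nltk cannot already find it."""
    import nltk  # imported lazily so this module loads without nltk installed

    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # nltk.download() returns True on success and False otherwise.
        return nltk.download(package_id(resource_path))


if __name__ == "__main__":
    print("resource available:", ensure_resource())
```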

gensim

Provides Information Retrieval (IR) algorithms, including string-to-vector transformations such as TF-IDF and similarity queries over the resulting vectors. Also implements topic modelling methods such as Latent Semantic Analysis.
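To make the idea of a TF-IDF similarity query concrete, here is a minimal standard-library sketch. It deliberately does not use gensim's API; the function names, the tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, and gensim's own weighting differs in detail.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weigh(tokens, idf):
    """Turn tokens into a sparse TF-IDF vector, ignoring unknown terms."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def build_index(docs):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [weigh(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, document index) pairs, most similar first."""
    idf, vectors = build_index(docs)
    query_vec = weigh(tokenize(query), idf)
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    corpus = [
        "Birds sing to attract mates and defend territory.",
        "The Louvre is a museum in Paris.",
        "Whales sing long songs underwater.",
    ]
    for score, index in rank("why do birds sing", corpus):
        print(round(score, 3), corpus[index])
```

For a corpus the size of Wikipedia, a library implementation with streaming and sparse indices is the practical choice; the sketch only shows what the query computes.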

unidecode

Converts Unicode strings to their closest ASCII equivalent.
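A short sketch of the conversion follows. The wrapper name `to_ascii` is hypothetical, and the fallback branch is an assumption added so the example runs without the library installed: it only strips accents through Unicode normalization and drops characters that unidecode would transliterate.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Return an ASCII approximation of text."""
    try:
        from unidecode import unidecode
    except ImportError:
        # Fallback (not part of the project): decompose accented characters
        # and discard anything that still is not ASCII.
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")
    return unidecode(text)


if __name__ == "__main__":
    print(to_ascii("Café"))
    print(to_ascii("naïve"))
```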

Tornado

Provides a web server interface.

WikiExtractor.py

Converts text from MediaWiki markup format to plain text.

Optional Libraries

PyMongo

Tools for interacting with MongoDB databases. This is useful for working with indices that can't be held entirely in memory, which is not a problem for a smaller corpus like the Simple English Wikipedia but is an issue for larger corpora like the full English Wikipedia.

MongoDB

Since the PyMongo library is just an interface, we need an instance of the actual database itself running.

  • Pick the version for your platform.
  • If using Windows 7 or higher, get the Windows 2008+ build.
  • Tested with version: 2.2.1

Start the database process before running the application.
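One way to confirm that the database process is up before launching the application is a plain TCP connection check. This is a sketch under assumptions: the function name `mongodb_is_reachable` is hypothetical, the host is assumed to be local, and 27017 is MongoDB's default port, which may differ in your configuration. A successful connection shows only that something is listening on the port, not that it is a healthy MongoDB instance.

```python
import socket


def mongodb_is_reachable(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    if mongodb_is_reachable():
        print("MongoDB port is open")
    else:
        print("Nothing is listening on the MongoDB port; start the database first")
```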
