  • Stars: 131
  • Rank: 275,867 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created almost 10 years ago
  • Updated over 7 years ago

Repository Details

Uses NLP and Wikipedia to try to generate trivia questions

wikipedia-question-generator

This project is no longer maintained. It is MIT licensed, so you're welcome to take the code and use it yourself.

Uses Natural Language Processing and Wikipedia content to try to generate Mad Libs-style game questions. Powers the web app at http://wikitrivia.atbaker.me.

Built for the TrackMaven Monthly Challenge meetup in December 2014.

I also made a short presentation about the project. See this YouTube video for an idea of the kind of game these questions are meant to support.

wikipedia-question-generator is open source under the MIT License.

Sample usage

Running the command:

$ wikitrivia 'Tony Bennett'

yields:

[
  {
    "question": "Bennett is also an accomplished __________, having created works\u2014under the name Anthony Benedetto\u2014that are on permanent public display in several institutions.",
    "answer": "painter",
    "title": "Tony Bennett",
    "similar_words": ["classic", "classicist", "constructivist", "decorator", "draftsman", "etcher", "expressionist", "illustrator"]
  },
  {
    "question": "He is the __________ of the Frank Sinatra School of the Arts in ..."
  }
]

Quickstart

wikipedia-question-generator is a Python 3 project that uses the fantastic click package to expose itself as a shell command.

You can use the project locally (and quickly) through Docker or a local installation of Python 3.4.

Installing with Docker

If you only want to run the tool, not modify it, pull the latest image from Docker Hub:

$ docker pull atbaker/wikipedia-question-generator

Then, run the image with:

$ docker run atbaker/wikipedia-question-generator --help
Usage: wikitrivia [OPTIONS] [TITLES]...

  Generates trivia questions from wikipedia articles. If no titles are
  supplied, pulls from these sample articles:

  'Tony Bennett', 'Python (programming language)', 'Scabbling', 'Ukrainian
  Women's Volleyball Super League'

Options:
  --output FILENAME  Output to JSON file
  --help             Show this message and exit.

To make running the container less cumbersome, you can alias the docker run command:

$ alias wikitrivia='docker run atbaker/wikipedia-question-generator'
$ wikitrivia --help
Usage: wikitrivia [OPTIONS] [TITLES]...

If you want to contribute to the tool, you can clone the repo and use Fig to get started quickly.

Installing with Python 3.4

Clone the repo, and then use pyvenv-3.4 (or virtualenv) to create a new virtual environment. Then, install the requirements and the NLTK corpora:

$ pyvenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python -m textblob.download_corpora

Install the command-line tool in editable mode so you can invoke it directly:

$ pip install -e .

Now you can run the tool with the command wikitrivia.

Advanced usage

By default, the tool scrapes the hard-coded sample articles listed in the --help output and prints its results to stdout.

Scraping a specific article

You can point the tool to a specific Wikipedia page by specifying its title:

$ wikitrivia 'William Shatner'

Be sure to enclose multi-word titles in quotes, or the tool will treat each word as a separate title.
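
The effect of quoting can be checked with Python's standard shlex module, which splits a command line roughly the way a POSIX shell does. This only illustrates shell word splitting; it is not part of the tool itself:

```python
import shlex

# With quotes, the two-word title arrives as a single argument.
quoted = shlex.split("wikitrivia 'William Shatner'")

# Without quotes, the shell hands the tool two separate titles.
unquoted = shlex.split("wikitrivia William Shatner")

print(quoted)
print(unquoted)
```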

Scraping multiple articles

You can scrape multiple articles at once by providing multiple titles:

$ wikitrivia 'Leonard Nimoy' 'George Takei' 'Nichelle Nichols'

Outputting to JSON

If you want to take this data elsewhere, you can output the results to a JSON file:

$ wikitrivia --output scotty.json 'James Doohan'

If you're using docker run, this saves scotty.json inside the container by default. Either mount the current directory with the -v option or use Fig instead, which mounts the directory as a volume automatically.
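
Once the results are in a file, they can be read back with Python's json module. The sketch below assumes each entry carries the keys shown in the "Sample usage" output (question, answer, similar_words). The helper names load_questions and multiple_choice, and the max_decoys parameter, are hypothetical and not part of the tool:

```python
import json


def load_questions(path):
    """Read a JSON file written with --output and return the list of entries."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def multiple_choice(entry, max_decoys=3):
    """Return the question text and an alphabetized list of answer choices.

    The correct answer is mixed in with up to max_decoys similar words.
    Entries without similar_words yield the answer as the only choice.
    """
    decoys = entry.get("similar_words", [])[:max_decoys]
    choices = sorted(decoys + [entry["answer"]])
    return entry["question"], choices


# Example usage, assuming scotty.json exists in the current directory:
# for entry in load_questions("scotty.json"):
#     question, choices = multiple_choice(entry)
#     print(question, choices)
```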

Methodology

Though I tried a few different approaches when developing this tool, in the end I had the most success with a rather simple methodology.

Finding the right ___________'s

  1. Only consider sentences in the summary section of an article. Sentences from the body often didn't make sense out of context.
  2. Never use the first sentence of the summary. It's usually too straightforward to make interesting trivia.
  3. Don't use a sentence that starts with an adverb. They usually depend too heavily on the idea of the previous sentence to make sense out of context.
  4. Blank out the first common noun in the sentence (e.g. 'painter', 'infantryman'). Proper nouns (e.g. 'Frank Sinatra', 'The White House') usually seemed too easy to guess when given the title of the article and the other words in the sentence.
  5. If that noun is part of a noun phrase, blank out the last two words of the phrase. Blanking out just one word seemed too easy if the phrase was recognizable.
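
Rules 2 through 4 can be sketched in a few lines of plain Python. This is a simplified illustration, not the project's actual code. It assumes the summary sentences have already been tokenized and tagged with Penn Treebank part-of-speech tags (NN and NNS for common nouns, RB, RBR and RBS for adverbs), for example by TextBlob. The function names are hypothetical, rule 5 (noun phrases) is left out, and words are rejoined with plain spaces, so punctuation spacing is not preserved:

```python
BLANK = "__________"


def pick_candidate_sentences(tagged_sentences):
    """Apply rules 2 and 3 to a list of tagged sentences.

    Each sentence is a list of (word, tag) pairs. The first sentence is
    skipped, and so is any sentence whose first word is an adverb.
    """
    candidates = []
    for sentence in tagged_sentences[1:]:
        if not sentence:
            continue
        first_tag = sentence[0][1]
        if first_tag.startswith("RB"):  # RB, RBR, RBS
            continue
        candidates.append(sentence)
    return candidates


def blank_first_common_noun(tagged_sentence):
    """Apply rule 4: blank out the first common noun (NN or NNS).

    Returns (question, answer), or None if the sentence has no common noun.
    """
    for index, (word, tag) in enumerate(tagged_sentence):
        if tag in ("NN", "NNS"):
            words = [w for w, _ in tagged_sentence]
            words[index] = BLANK
            return " ".join(words), word
    return None


summary = [
    [("Tony", "NNP"), ("Bennett", "NNP"), ("is", "VBZ"), ("a", "DT"), ("singer", "NN")],
    [("However", "RB"), ("he", "PRP"), ("paints", "VBZ")],
    [("Bennett", "NNP"), ("is", "VBZ"), ("an", "DT"), ("accomplished", "JJ"), ("painter", "NN")],
]

for sentence in pick_candidate_sentences(summary):
    print(blank_first_common_noun(sentence))
```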

Creating decoy answers

For sentences where just one word was blanked out, I also used WordNet to find words similar to the answer (the blanked-out word). These words provide decoy answers during the trivia game.

My approach is to find the hypernym of the answer (the broader category it belongs to), and then select other hyponyms (more specific terms) of that hypernym.

In the example in the "Sample usage" section, the correct answer is painter. The hypernym of painter is artist. The hyponyms I found for artist appear in the similar_words array in the output: "classic", "classicist", "constructivist", "decorator", "draftsman", "etcher", "expressionist", "illustrator".
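
The hypernym-then-hyponyms lookup can be illustrated without WordNet itself. In the sketch below, a small hand-written dictionary stands in for WordNet, using a few of the words from the example above. The names HYPONYMS, find_hypernym and similar_words and the limit of 8 are assumptions made for illustration. In a real run, NLTK's WordNet interface would supply the relations through the hypernyms() and hyponyms() methods on synsets:

```python
# Toy taxonomy standing in for WordNet: hypernym -> list of hyponyms.
# The data is a hand-picked subset of the example, not a WordNet export.
HYPONYMS = {
    "artist": ["classicist", "draftsman", "etcher", "illustrator", "painter"],
}


def find_hypernym(word, taxonomy=HYPONYMS):
    """Return the first hypernym that lists word as a hyponym, or None."""
    for hypernym, hyponyms in taxonomy.items():
        if word in hyponyms:
            return hypernym
    return None


def similar_words(answer, taxonomy=HYPONYMS, limit=8):
    """Return up to limit sibling hyponyms of the answer to use as decoys."""
    hypernym = find_hypernym(answer, taxonomy)
    if hypernym is None:
        return []
    return [word for word in taxonomy[hypernym] if word != answer][:limit]


print(find_hypernym("painter"))
print(similar_words("painter"))
```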

Clearly there's still much room for improvement in every aspect of the methodology, but overall I was impressed with how far I could get with TextBlob, NLTK, and an introductory understanding of NLP.
