cut-up poetry generation over large corpora
                               o
       _   ,_    __   ,   __,      __
     |/ \_/  |  /  \_/ \_/  |  |  /
     |__/    |_/\__/  \/ \_/|_/|_/\___/
    /|
    \|

prosaic

being a prose scraper & cut-up poetry generator

by vilmibm

using nltk

and licensed under the GPL.

what is prosaic?

prosaic is a tool for cutting up large quantities of text and rearranging it to form poetic works.

prerequisites

  • postgresql 9.0+
  • python 3.5+
  • linux (it probably works on a mac, i donno)
  • you might need some -dev libraries and/or gcc to get nltk to compile

database setup

Prosaic requires a postgresql database. Once you've got postgresql installed, run the following to create a database prosaic can access (assumes you're on linux; refer to google to perform steps like this on osx/windows):

sudo su postgres
createuser prosaic -P
# at password prompt, type prosaic and hit enter
createdb prosaic -O prosaic

quick start

sudo pip install prosaic
prosaic source new pride_and_prejudice pandp.txt
prosaic source new hackers hackers_screenplay.txt
prosaic corpus new pride_and_hackers
prosaic corpus link pride_and_hackers pride_and_prejudice
prosaic corpus link pride_and_hackers hackers
prosaic poem new -cpride_and_hackers -thaiku

and so I warn you.
We will know where we have gone
ALL: HACK THE PLANET

See the full tutorial for more detailed instructions. There is also a CLI reference.

use as a library

This is a little complex right now; I'm working on a simpler API.

from io import StringIO
from prosaic.cfg import DEFAULT_DB
from prosaic.models import Database, Source, Corpus, get_session
from prosaic.parsing import process_text
from prosaic.generate import poem_from_template

db = Database(**DEFAULT_DB)

source = Source(name='some_name')
process_text(db, source, StringIO('some very long string of text'))

session = get_session(db)
corpus = Corpus(name='sweet corpus', sources=[source])
session.add(corpus)
session.commit()

# poem_from_template returns raw line dictionaries from the database:
poem_lines = poem_from_template(
    [{'syllables': 5}, {'syllables': 7}, {'syllables': 5}],
    db,
    corpus.id)

# pull raw text out of each line dictionary and print it:
for line in poem_lines:
    print(line[0])

use on the web

there was a web wrapper at prosaic.party but it had some functionality and performance issues and I've taken it down for now.

write a template

Templates are currently stored as json files (or passed from within code as python dictionaries) that represent an array of json objects, each one describing a line of poetry.

A template describes a "desired" poem. Prosaic uses the template to approximate a piece given what text it has in its database. Running prosaic repeatedly with the same template will almost always yield different results.
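Since a template is just an array of objects, the same structure can be written as a python list of dictionaries and dumped to json. A minimal sketch, reusing the haiku shape from the library example above; the file name my_haiku.json is made up for illustration, and the directory is a scratch one, not wherever prosaic keeps its own templates.

```python
import json
import os
import tempfile

# Three lines of 5, 7 and 5 syllables: the haiku shape.
haiku = [{"syllables": 5}, {"syllables": 7}, {"syllables": 5}]

# Hypothetical file name, written to a scratch directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "my_haiku.json")
with open(path, "w") as f:
    json.dump(haiku, f, indent=1)

# Reading the file back gives the same list of dictionaries.
with open(path) as f:
    loaded = json.load(f)

print(loaded == haiku)
```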

You can see available templates with prosaic template ls, edit them with prosaic template edit <template name>, and add your own with prosaic template new <template name>.

The rules available are:

  • syllables: integer number of syllables you'd like on a line
  • alliteration: true or false; whether you'd like to see alliteration on a line
  • keyword: string containing a word you want to see on a line
  • fuzzy: string containing a keyword; you'd like a line that occurs near a source sentence containing that keyword
  • rhyme: define a rhyme scheme. For example, a couplet template would be: [{"rhyme":"A"}, {"rhyme":"A"}]
  • blank: if set to true, makes a blank line in the output. for making stanzas.
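The rule names and value types listed above are enough to sanity-check a template before handing it to prosaic. The checker below is only an illustration: check_template and RULES are hypothetical names, not part of prosaic, and prosaic itself may be more lenient or more strict.

```python
# Rule names and value types as listed above. This checker is an
# illustration only; it is not part of prosaic.
RULES = {
    "syllables": int,
    "alliteration": bool,
    "keyword": str,
    "fuzzy": str,
    "rhyme": str,
    "blank": bool,
}


def check_template(template):
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    for number, line in enumerate(template, start=1):
        for rule, value in line.items():
            expected = RULES.get(rule)
            if expected is None:
                problems.append(f"line {number}: unknown rule {rule!r}")
            # bool is a subclass of int in Python, so reject it explicitly
            # when an integer is expected.
            elif not isinstance(value, expected) or (
                expected is int and isinstance(value, bool)
            ):
                problems.append(
                    f"line {number}: {rule} should be {expected.__name__}"
                )
    return problems


couplet = [{"rhyme": "A"}, {"rhyme": "A"}]
print(check_template(couplet))
print(check_template([{"syllables": "five"}, {"sylables": 7}]))
```

The first call prints an empty list; the second reports the wrong type on line 1 and the misspelled rule name on line 2.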

example template

[{"syllables": 10, "keyword": "death", "rhyme": "A"},
 {"syllables": 12, "fuzzy": "death", "rhyme": "B"},
 {"syllables": 10, "rhyme": "A"},
 {"syllables": 10, "rhyme": "B"},
 {"syllables": 8, "fuzzy": "death", "rhyme": "C"},
 {"syllables": 10, "rhyme": "C"}]

full CLI reference

Check out the CLI reference documentation.

how does prosaic work?

prosaic is two parts: a text parser and a poem writer. a human selects text files to feed to prosaic, who will chunk the text up into phrases and tag them with metadata. the human then links each of these parsed text files to a corpus.
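As a rough picture of what "chunk the text up into phrases and tag them with metadata" can mean, here is a toy sketch. It is not prosaic's parser (which uses nltk): the splitting rule, the vowel-counting syllable estimate, and the metadata field names are all assumptions made for illustration.

```python
import re


def rough_syllables(word):
    """Crude estimate: count vowel runs, minus a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(1, count)


def chunk(text):
    """Split text into phrases on punctuation and tag each one."""
    phrases = []
    for number, raw in enumerate(re.split(r"[.,;:!?]+", text)):
        words = re.findall(r"[A-Za-z']+", raw)
        if not words:
            continue
        phrases.append({
            "text": " ".join(words),
            "position": number,  # where the phrase sat in the source
            "syllables": sum(rough_syllables(w) for w in words),
            "last_word": words[-1].lower(),  # a hook for rhyme lookups
        })
    return phrases


for phrase in chunk("Hack the planet! We will know where we have gone."):
    print(phrase)
```

The syllable estimate is wrong for plenty of english words; a pronouncing dictionary would do better.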

once a corpus is prepared, a human then writes (or reuses) a poem template (in json) that describes a desired poetic structure (number of lines, rhyme scheme, topic) and provides it to prosaic, who then uses the weltanschauung algorithm to randomly approximate a poem according to the template.
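The idea of "randomly approximating" a poem can be sketched as: for each template line, pick a random phrase that satisfies the rules, and loosen the rules when nothing fits. The code below is a toy stand-in under that assumption, not the weltanschauung algorithm, whose details are not described here. The phrase list is hand-tagged and hypothetical, only the syllables, keyword and blank rules are handled, and the order in which rules are dropped is arbitrary.

```python
import random

# Hypothetical, hand-tagged phrases standing in for a parsed corpus.
PHRASES = [
    {"text": "and so I warn you", "syllables": 5},
    {"text": "We will know where we have gone", "syllables": 7},
    {"text": "ALL: HACK THE PLANET", "syllables": 5},
    {"text": "death comes for the mainframe", "syllables": 6},
]

# Rules this sketch understands, in the order they get dropped.
RELAX_ORDER = ("syllables", "keyword")


def matches(phrase, rules):
    """True if the phrase satisfies every rule still active."""
    if "syllables" in rules and phrase["syllables"] != rules["syllables"]:
        return False
    if "keyword" in rules and rules["keyword"].lower() not in phrase["text"].lower():
        return False
    return True


def pick_line(rules, phrases, rng):
    """Pick a random matching phrase, dropping rules until something fits."""
    if not phrases:
        raise ValueError("no phrases to choose from")
    active = {k: v for k, v in rules.items() if k in RELAX_ORDER}
    for rule in (None,) + RELAX_ORDER:
        active.pop(rule, None)  # first pass drops nothing
        candidates = [p for p in phrases if matches(p, active)]
        if candidates:
            return rng.choice(candidates)


def poem(template, phrases, seed=None):
    """Build one line of text per template entry."""
    rng = random.Random(seed)
    lines = []
    for rules in template:
        if rules.get("blank"):
            lines.append("")
        else:
            lines.append(pick_line(rules, phrases, rng)["text"])
    return lines


haiku = [{"syllables": 5}, {"syllables": 7}, {"syllables": 5}]
print("\n".join(poem(haiku, PHRASES)))
```

Without a seed, repeated runs can give different first and last lines, which is the behavior the template section describes.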

my personal workflow is to build a highly thematic corpus (for example, thirty-one cyberpunk novels) and, for each poem, a custom template. I then run prosaic between five and twenty times, each time saving and discarding lines or whole stanzas. finally, I augment the piece with original lines and then clean up any grammar / pronoun agreement from what prosaic emitted. the end result is a human-computer collaborative work. you are, of course, welcome to use prosaic however you see fit.

developing

Patches are more than welcome if they come with tests. Tests should always be green in master; if not, please let me know! To run the tests:

# assuming you have pip install'd prosaic from source into an activated venv:
cd test
py.test

changelog

  • 6.1.1
    • fix error handling; this was preventing sources from being made.
  • 6.1.0
    • default to a system-wide nltk_data directory; won't download and install to ~ if found. the path is /usr/share/nltk_data. this is probably only useful on systems where prosaic is installed globally for multiple users (like on tilde.town).
    • not tied to a release, but the readme has database setup instructions now.
  • 6.0.0
    • I guess I forgot to change-log 5.x, oops
    • process_text now takes a database config object as its first param, and a read()able thing instead of a string
    • parsing is faster, at the expense of some precision
    • slightly saner DB engine handling
  • 4.0.0
    • Port to postgresql + sqlalchemy
    • Completely rewrite command line interface
    • Add a --verbose flag and muzzle the logging that used to happen unless it's present
    • Support a configuration file (~/.prosaic/prosaic.conf) for specifying database connections and default template
    • Rename some modules
    • Remove some vestigial features
  • 3.5.4 - update nltk dependence so prosaic works on python 3.5
  • 3.5.3 - mysterious release i don't know
  • 3.5.2 - handle weird double escaping issues
  • 3.5.1 - fix stupid typo
  • 3.5.0 - prosaic now respects environment variables PROSAIC_DBNAME, PROSAIC_DBPORT and PROSAIC_DBHOST. These are used if not overridden from the command line. If neither environment variables nor CLI args are provided, static defaults are used (these are unchanged).
  • 3.4.0 - flurry of improvements to text pre-processing which makes output much cleaner.
  • 3.3.0 - blank rule; can now add blank lines to output for marking stanzas.
  • 3.2.0 - alliteration support!
  • 3.1.0 - can now install prosaic as a command line tool!! also docs!
  • 3.0.0 - lateral port to python (sorry hy), but there are some breaking naming changes.
  • 2.0.0 - shiny new CLI UI. run hy __init__.hy -h to see/explore the subcommands.
  • 1.0.0 - it works

further reading
