• This repository has been archived on 08/Mar/2021


Repository Details

Tutorial and introduction to programming with Python for the humanities and social sciences

Python Programming for the Humanities

UPDATE March 8, 2021: This repository is no longer under development. Together with Mike Kestemont and Allen Riddell, I have published a much more comprehensive book about processing humanities data with Python. For more information see the website of the publisher: https://press.princeton.edu/books/hardcover/9780691172361/humanities-data-analysis

Join the chat at https://gitter.im/fbkarsdorp/python-course

Python is now widely used across many scientific domains, and the language is readily accessible to scholars from the Humanities. Python is an excellent choice for dealing with the (linguistic as well as literary) textual data that is so typical of the Humanities. This book introduces you thoroughly to the language and teaches you to program basic algorithmic procedures. It expects no prior experience with programming, although we hope to provide some interesting insights and skills for more advanced programmers as well. The book consists of 10 chapters. Chapter 5 and Chapter 6 are still in draft status and not ready for use.

  • Chapter 1 starts with the very basics where we will try to whet your appetite. You will be asked to do many short quizzes to test whether you really understand the material.
  • Chapter 2 will introduce you to the task of text processing. You will learn how to read files from your computer, how to clean them and how to compute a word frequency distribution.
  • Chapter 3 deals with preprocessing text. You will be introduced to some elementary tools to analyse your data.
  • Chapter 4 is a more theoretical chapter that explains some basic programming principles, common practices and where to find documentation.
  • In Chapter 5 things become more difficult. First, you will write a program to compute the readability of texts. Next, you will implement the basic algorithm behind authorship attribution.
  • In Chapter 6 we will introduce you to the concept of Object Oriented Programming. You will implement a network structure with which you can analyze relations between people on Twitter.
  • From Chapter 7 onwards, we will start working on more real applications. In Chapter 7 we will work on systems for searching through collections of text. We introduce you to the field of Information Retrieval and build a simple information retrieval system. This chapter furthers your knowledge about Object Oriented Programming.
  • In Chapter 8 we create a complete web application to search through your own library of PDF files. This will be our first real application ready for use by end-users. The chapter introduces you to many modules available in the standard library as well as third-party modules.
  • Chapter 9 will introduce you to some of the more advanced techniques used in automatic classification. We will implement a naive Bayes classifier, show you a number of evaluation metrics and strategies, and briefly address the question of parameter optimization.
  • Chapter 10 focuses on hierarchical clustering, one of the important methods for unsupervised learning. We explain the basic methods for doing hierarchical clustering and create a simple implementation in Python.
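
As a taste of the kind of task Chapter 2 covers, here is a minimal sketch of a word frequency distribution. It is not taken from the book: the function name `word_frequencies` and the simple regular-expression tokenizer are assumptions made purely for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its frequency.

    Hypothetical helper for illustration only; the tokenizer keeps
    runs of word characters and ignores punctuation.
    """
    words = re.findall(r"\w+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The mat was flat."
    freqs = word_frequencies(sample)
    # Print the three most frequent words with their counts.
    for word, count in freqs.most_common(3):
        print(word, count)
```

Lowercasing before counting treats "The" and "the" as the same word, which is usually what you want for a first frequency count. A real project would likely need a more careful tokenizer.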

This document describes the installation procedure for all the software needed for the Python class. If you're stuck anywhere in the installation procedure, please do not hesitate to contact Folgert Karsdorp ([email protected]).

Sublime text

We advise you to install a good text editor, Sublime Text 2 for example. However, you are absolutely free to use your own favorite editor. For Sublime Text 2, go to http://www.sublimetext.com/, download the version for your operating system and install.

In the course, we will be using software that works best with Google Chrome. Firefox 6 (or above) and Safari will also work. Internet Explorer is not supported.

We will be using Python 3 for our course. Lower versions are more or less supported, but not recommended.

Installation

All platforms

We strongly advise you to install the Anaconda Python Distribution. This distribution contains all the necessary modules and packages needed for this course. It is available for all platforms and provides a simple installation procedure. You can download it from: http://continuum.io/downloads. More detailed installation instructions can be found here: http://docs.continuum.io/anaconda/install.html

Anaconda's default installation is Python 2.7. However, we will use Python 3 in this course. To install all necessary packages for Python 3, type

conda create -n py34 python=3.4 anaconda

followed by

source activate py34

at the command line. If you work on a Windows machine, use the following command instead:

activate py34

(If this doesn't work, have a look here: http://continuum.io/blog/anaconda-python-3.) After that, you can start the course by double-clicking the file start-windows.bat if you work on Windows, start-unix.sh if you work on Linux, or start-osx.command if you work on Mac OS X.
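
If you are unsure whether the Python 3 environment is really active, a quick check along the following lines may help. This is only a sketch: the helper name `is_python3` is hypothetical and not part of the course materials.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple belongs to Python 3 or later."""
    return tuple(version_info)[0] >= 3


if __name__ == "__main__":
    # Show the version of the interpreter that is actually running.
    print("Running Python", sys.version.split()[0])
    if not is_python3():
        print("This is not Python 3; try activating the py34 environment again.")
```

Save it to a file and run it with `python` from the same command line in which you activated the environment.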

Windows

Download and install the Anaconda Python Distribution (see above).

Double click the file start-windows.bat.

If everything goes right, this should open your browser (preferably Google Chrome or Firefox) on the page http://127.0.0.1:8888/ (or something similar), which says "IP[y]: Notebook". If for some reason the notebook is opened by Internet Explorer, copy the URL and paste it into either Google Chrome or Firefox.

OS X

Only take these steps if you know what you are doing. Otherwise, simply download and install the Anaconda Python Distribution (see above). After that, double click the file start-osx.command.

First, you will need to install Xcode from the App Store. After you have successfully installed Xcode, open it and go to Xcode -> Preferences -> Downloads. Now click on the install button next to Command Line Tools.

Open Spotlight and type "terminal" to open the Terminal application. (You can also go to your Applications folder and then to Utilities, where you'll find Terminal.app.)

Change to the folder where you downloaded or saved the file mac-installer.sh (probably ~/Downloads) by using

cd /folder/of/mac-installer.sh 

Run the installer with the following command. The installer will download some packages and will request your password to install them.

. mac-installer.sh

To check your installation, relaunch the terminal.app. Then type:

ipython3 notebook --matplotlib=inline

If everything went well, this should open your browser (best with Google Chrome or Firefox) on the page http://127.0.0.1:8888/ which says IP[y]: Notebook.
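
If the browser does not open by itself, you can test whether a notebook server is listening on the address above. The snippet below is a rough sketch using only the standard library; the function name `port_is_open` is hypothetical, and port 8888 is simply the default mentioned above (your server may pick a different one).

```python
import socket


def port_is_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if nothing is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("A server is listening on http://127.0.0.1:8888/")
    else:
        print("Nothing is listening on port 8888; is the notebook running?")
```

Note that this only tells you that some program accepts connections on the port, not that it is the notebook server.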

Linux (Ubuntu/Debian)

Only take these steps if you know what you are doing. Otherwise, simply download and install the Anaconda Python Distribution.

First, open a terminal, then type

# Debian 8 / Ubuntu 16.04
$ sudo apt-get install python3 ipython3 ipython3-notebook python3-numpy python3-scipy python3-matplotlib

or

# Debian 9 / Ubuntu 17.04
$ sudo apt-get install python3 jupyter-notebook python3-numpy python3-scipy python3-matplotlib

If you run another Linux distribution, similar packages should be available. Finally execute the file start-unix.sh.
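
Whichever installation route you took, it can be useful to confirm that the scientific packages named above are importable from Python 3. The following is a sketch under the assumption that numpy, scipy and matplotlib are the modules you need; the helper `missing_modules` is a hypothetical name, not part of the course.

```python
import importlib.util


def missing_modules(names=("numpy", "scipy", "matplotlib")):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

`importlib.util.find_spec` only looks a module up without importing it, so the check is fast even for large packages.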

Static Notebooks

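This is a fall-back method: if you cannot get the interactive notebooks to run, you can still read the chapters as static notebooks.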

Chapter 1 - Getting started

Chapter 2 - First steps into text processing

Chapter 3 - Text Analysis

Chapter 4 - Programming principles

Chapter 5 - Building NLP applications

Chapter 6 - Object Oriented Programming

Chapter 7 - Searching large Collections of Text

Chapter 8 - Practical: Searching your own PDF library

Chapter 9 - Learning from Examples

Chapter 10 - Learning without Supervision

Contributors

  • Folgert Karsdorp
  • Maarten van Gompel
  • Matt Munson
