• Stars: 1
• Language: HTML
• License: GNU General Publi...
• Created almost 4 years ago
• Updated over 2 years ago

Repository Details

An automated survey of literature and curricula surrounding ethics in data science. WIP.

More Repositories

1

text-matcher

A simple text reuse detection CLI tool.
Python
125 stars
2

chapterize

A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books for computational text analysis.
Python
92 stars
3

course-computational-literary-analysis

Course materials for Introduction to Computational Literary Analysis, taught at UC Berkeley in Summer 2018, 2019, and 2020, at Columbia University in Fall 2020, and again at UC Berkeley in Summer 2021 and 2022.
Jupyter Notebook
87 stars
4

workshop-text-analysis-spacy

Materials for the workshop Advanced Text Analysis with SpaCy and Scikit-Learn, given at NYU during NYCDH Week 2017, at PyData NYC in Nov. 2017, and at Columbia University in 2018 and 2019.
Jupyter Notebook
82 stars
5

corpus-db

A textual corpus database for the digital humanities.
Jupyter Notebook
59 stars
6

dotfiles

My personal dotfiles, using Nix Flakes to configure my system(s).
Nix
36 stars
7

macro-etym

A tool for analyzing the word histories of a text.
Python
31 stars
8

gitenberg-experiments

Scripts for scraping metadata from Project Gutenberg books, via GITenberg.
Jupyter Notebook
19 stars
9

corpus-list

A structured list of text corpora, created for use with a corpus downloader.
13 stars
10

late-style-PCA

An attempt to experimentally test Edward Said's claims about late style using computational text analysis and principal component analysis.
Jupyter Notebook
10 stars
11

allusion-detection

Computational intertextuality detection in Python, using fuzzy (approximate) string matching.
Jupyter Notebook
9 stars
12

cenlab

A corpus of English-language novels combining the ~250 novels of the Corpus of English Novels with the Txtlab corpus of English novels.
Jupyter Notebook
9 stars
13

md2mla

A script and accompanying templates for producing an MLA-style paper from a Markdown file. Requires Pandoc and LaTeX (XeTeX).
TeX
8 stars
14

milton-analysis

Text analysis of Paradise Lost and other poems by John Milton.
Jupyter Notebook
7 stars
15

workshop-word-embeddings

Materials for a workshop in word embeddings, for NYC-DH Week, February 2019
Jupyter Notebook
7 stars
16

workshop-dataviz-2017

An Introduction to Text Analysis and Visualization, Art of Data Visualization Week, April 2017, Columbia University
Jupyter Notebook
7 stars
17

dissertation

A dissertation in computational literary analysis, called "The Eye of Modernism: Visual Imaginations of British literature, 1880-1930"
Jupyter Notebook
7 stars
18

template-research-paper

A template for a research paper, which compiles to many file formats.
TeX
6 stars
19

template-dissertation

A template for a modern, best-practices dissertation.
Haskell
5 stars
20

jonreeve.com

My personal website, jonreeve.com, written in Haskell, using Ema.
TeX
5 stars
21

course-cic-compling

Course materials for the Computational Linguistics section of the course Computing in Context, Dept. of Computer Science, Columbia University, Fall 2021. Work-in-progress.
Jupyter Notebook
5 stars
22

course-computational-literary-analysis-readings

Syllabus and course readings for Introduction to Computational Literary Analysis, a course taught at UC Berkeley in Summer 2018, 2019, and 2020, and at Columbia University in Fall 2020.
Haskell
5 stars
23

shakespeare-dialog-extractor

An application to extract dialog from Shakespeare plays, as encoded into TEI by the Folger Library.
Python
4 stars
24

book-computational-literary-analysis

A textbook for the course Introduction to Computational Literary Analysis. WIP.
Jupyter Notebook
4 stars
25

docmap

A project for creating new themes and customization functionality for the Omeka content management system.
PHP
4 stars
26

conference-joyce-digital

Website and materials for the conference Joyce in the Digital Age, held at Columbia University on October 1st, 2017.
4 stars
27

free-indirect-discourse-model

Modeling free indirect discourse in literature, using AI.
Jupyter Notebook
3 stars
28

plato-analysis

Analyses of Platonic dialogues, including a Socratic dialogue generator.
Jupyter Notebook
3 stars
29

course-word-embeddings

Course materials for "Meaningful Text Analysis with Word Embeddings," taught at the Digital Humanities Summer Institute, June 2021.
TeX
3 stars
30

text-to-time-series

Experiments in text analysis, generating time series from texts.
Jupyter Notebook
2 stars
31

occupations-experiment

Experiments in quantifying occupations as they're represented in fiction.
Jupyter Notebook
2 stars
32

template-course-website

A website for a university course. Semantic by default.
TeX
2 stars
33

sops

Research materials (literature review, bibliography) for the project A Safer Online Public Square
HTML
2 stars
34

dissertation-prospectus

My ever-protean dissertation prospectus.
TeX
2 stars
35

character-attribution

Probabilistic attribution of character voices in fiction.
Jupyter Notebook
2 stars
36

htrc-experiments

Text analysis experiments with Hathi Trust Research Center literary datasets.
Jupyter Notebook
2 stars
37

corpus-SHC

A fork of Martin Mueller's Shakespeare His Contemporaries corpus, originally located at https://github.com/martinmueller39/SHC, divided into submodules as an experiment.
1 star
38

html2tei

A tool to extract structured data from novels (starting with Project Gutenberg HTML files)
HTML
1 star
39

sent2tree

Alternative visualizations for SpaCy-parsed sentences, using ETE3.
Python
1 star
40

sentence-trees

Experiments with sentences as trees.
Jupyter Notebook
1 star
41

pg-srp

Stable Random Projections (SRP) of Project Gutenberg texts, for similarity tests
Jupyter Notebook
1 star
42

course-data-ethics

Draft syllabus for a course in data science ethics. WIP.
Jupyter Notebook
1 star
43

course-nyu-pit

Course materials for the New York University Institute in Public Interest Technology (NYU-PIT)
Jupyter Notebook
1 star
44

org-autolinks-mode

An Emacs minor mode that automatically links to Org files after you type a file's name.
Emacs Lisp
1 star
45

hs-tei-transform

Experiments in transforming TEI XML, using Haskell
Haskell
1 star
46

workshop-intro-haskell

An introduction to functional programming in Haskell. A workshop given in October 2020 at Columbia University.
1 star
47

david-copperfield

An annotated edition of David Copperfield
HTML
1 star
48

dataviz-workshop

Materials for a workshop in text analysis and visualization, originally given at Columbia University in April 2016.
Jupyter Notebook
1 star
49

chaucer-macro-etym

Macro-etymological analyses of the Canterbury Tales.
Jupyter Notebook
1 star
50

corpus-mansfield-garden-party-TEI

A TEI edition of Katherine Mansfield's short story "The Garden Party."
Jupyter Notebook
1 star
51

persistent-homology

Experiments with NLP and persistent homology.
Jupyter Notebook
1 star
52

course-university-writing

Draft materials for the course "University Writing with Readings in the Data Sciences," taught at Columbia University in the fall of 2017. Students, please refer to CourseWorks instead of this repository.
HTML
1 star
53

course-multilingual-technologies

Course website for Multilingual Technologies and Language Diversity, taught at Columbia University by Prof. Smaranda Muresan and Dr. Isabelle Zaugg
Haskell
1 star