
gender

Predict Gender from Names Using Historical Data

Guidelines and warnings

This package attempts to infer gender (or more precisely, sex assigned at birth) based on first names using historical data, typically data that was gathered by the state. This method has many limitations, and before you use this package be sure to take into account the following guidelines.

  1. Your analysis and the way you report it should take into account the limitations of this method, which include its reliance on data created by the state and its inability to see beyond the state-imposed gender binary. At a minimum, be sure to read our article explaining the limitations of this method, as well as the review article that is critical of this sort of methodology, both cited below.

  2. Do not use this package to study individuals: it is at most useful for studying populations in the aggregate.

  3. Resort to this method only when the alternative is not a more nuanced and justifiable approach to studying gender, but rather not studying gender at all. For instance, for many historical sources this approach might be the only way to get a sense of the sex ratios in a population. Even then, ask whether you really need to use this method, whether you are using it responsibly, and whether you could use a better approach instead.

Blevins, Cameron, and Lincoln A. Mullen. “Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction.” Digital Humanities Quarterly 9, no. 3 (2015). http://www.digitalhumanities.org/dhq/vol/9/3/000223/000223.html

Mihaljević, Helena, Marco Tullney, Lucía Santamaría, and Christian Steinfeldt. “Reflections on Gender Analyses of Bibliographic Corpora.” Frontiers in Big Data 2 (August 28, 2019): 29. https://doi.org/10.3389/fdata.2019.00029.

Description

Data sets, historical or otherwise, often contain a list of first names but seldom identify those names by gender. Most techniques for finding gender programmatically rely on lists of male and female names. However, the gender associated with names can vary over time. Any data set that covers the normal span of a human life will require a historical method to find gender from names. This R package uses historical datasets from the U.S. Social Security Administration, the U.S. Census Bureau (via IPUMS USA), and the North Atlantic Population Project to provide predictions of gender for first names for particular countries and time periods.
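To make the point about change over time concrete, here is a minimal base R sketch of how counts from several years can be pooled into male and female proportions for a chosen year range. The counts in `births` are invented for illustration and are not SSA data, and `pooled_proportions()` is a hypothetical helper; it assumes, as a simplification, that a prediction for a range of years comes from summing the yearly counts, and it is not the gender package's own implementation.

```r
# Toy counts of babies given a single name, by year and sex.
# These numbers are invented for illustration; they are not SSA data.
births <- data.frame(
  year   = c(1930, 1940, 1990, 2000, 2010),
  female = c(0, 5, 1800, 19000, 12000),
  male   = c(40, 35, 60, 100, 50)
)

# Hypothetical helper: pool the counts for every year in
# [year_min, year_max] and return the share recorded as each sex.
pooled_proportions <- function(counts, year_min, year_max) {
  in_range <- counts$year >= year_min & counts$year <= year_max
  f <- sum(counts$female[in_range])
  m <- sum(counts$male[in_range])
  c(proportion_male = m / (m + f), proportion_female = f / (m + f))
}

pooled_proportions(births, 1930, 1930)  # 1930 only: every toy birth is male
pooled_proportions(births, 2000, 2010)  # 2000-2010: overwhelmingly female
```

The same name yields opposite answers depending on the period, which is why a year or range of years is part of every prediction.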

Installation

You can install this package from CRAN:

install.packages("gender")

The first time you use the package you will be prompted to install the accompanying genderdata package. Alternatively, you can install genderdata yourself from GitHub:

# install.packages("remotes")
remotes::install_github("lmullen/genderdata")

Using the package

The gender() function takes a character vector of names, a year or a range of years, and a method that selects which historical dataset to use. For each name it returns the proportions recorded as male and female in that period, along with a predicted gender. Here we predict the gender of the names Madison and Hillary in 1930 and again from 2000 to 2010 using Social Security data.

library(gender)
gender(c("Madison", "Hillary"), years = 1930, method = "ssa")
#> # A tibble: 2 × 6
#>   name    proportion_male proportion_female gender year_min year_max
#>   <chr>             <dbl>             <dbl> <chr>     <dbl>    <dbl>
#> 1 Hillary               1                 0 male       1930     1930
#> 2 Madison               1                 0 male       1930     1930
gender(c("Madison", "Hillary"), years = c(2000, 2010), method = "ssa")
#> # A tibble: 2 × 6
#>   name    proportion_male proportion_female gender year_min year_max
#>   <chr>             <dbl>             <dbl> <chr>     <dbl>    <dbl>
#> 1 Hillary          0.0055             0.994 female     2000     2010
#> 2 Madison          0.0046             0.995 female     2000     2010
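Because the guidelines above restrict this method to populations in the aggregate, one way to use results like these is to sum the proportions over a population rather than assigning each person a label. The sketch below is a minimal base R illustration under stated assumptions: `predictions` copies the 2000 to 2010 values printed above (they are rounded for display, so each pair sums to slightly less than 1), and `people` is a hypothetical population.

```r
# Predictions copied from the 2000-2010 output above, using the same
# column names as the gender() result.
predictions <- data.frame(
  name = c("Hillary", "Madison"),
  proportion_male = c(0.0055, 0.0046),
  proportion_female = c(0.994, 0.995)
)

# A hypothetical population in which names repeat.
people <- data.frame(name = c("Madison", "Madison", "Hillary", "Madison"))

# Attach the name-level proportions to every person, then add them up.
# Each sum is an expected count for the whole population, not a claim
# about any individual in it.
matched <- merge(people, predictions, by = "name")
expected_female <- sum(matched$proportion_female)
expected_male <- sum(matched$proportion_male)
```

Summing proportions keeps the uncertainty for ambiguous names in the estimate, whereas counting rows where gender == "female" would discard it.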

See the package vignette for a fuller introduction and suggestions on how to use the gender() function efficiently with large datasets.

vignette(topic = "predicting-gender", package = "gender")
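The vignette's specific advice is not reproduced here. As an assumption about what working efficiently can mean, one common pattern for large datasets is to predict once per distinct name and year and then join the predictions back to every record. In this base R sketch, `predict_pairs()` is a hypothetical stand-in for a call to gender(), and both its proportions and the `records` data are invented.

```r
# Hypothetical records in which the same name/year pairs recur.
records <- data.frame(
  id   = 1:6,
  name = c("Madison", "Hillary", "Madison", "Madison", "Hillary", "Madison"),
  year = c(2000, 2000, 2000, 1930, 2000, 2000)
)

# Hypothetical stand-in for a prediction call such as gender().
# It returns invented proportions and tallies how many pairs it was given.
n_looked_up <- 0
predict_pairs <- function(pairs) {
  n_looked_up <<- n_looked_up + nrow(pairs)
  pairs$proportion_female <- ifelse(pairs$year < 1950, 0, 0.99)
  pairs
}

# Predict once per distinct name/year pair, then join back to every record.
pairs  <- unique(records[, c("name", "year")])
lookup <- predict_pairs(pairs)
result <- merge(records, lookup, by = c("name", "year"), all.x = TRUE)
result <- result[order(result$id), ]
```

Here six records need only three lookups; with real data, where a few thousand names cover millions of rows, the savings are much larger.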

To read the documentation for the datasets, install the genderdata package and then list the datasets it includes:

library(genderdata)
data(package = "genderdata")

Citation

If you use this package, I would appreciate a citation.

citation("gender")
#> 
#> To cite the 'gender' package, you may either cite the package directly
#> or cite the journal article which explains its method:
#> 
#>   Lincoln Mullen (2021). gender: Predict Gender from Names Using
#>   Historical Data. R package version 0.6.0.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {gender: Predict Gender from Names Using Historical Data},
#>     author = {Lincoln Mullen},
#>     year = {2021},
#>     note = {R package version 0.6.0},
#>     url = {https://github.com/lmullen/gender},
#>   }
#> 
#> For the journal article, please cite:
#> 
#> Cameron Blevins and Lincoln Mullen, "Jane, John ... Leslie? A
#> Historical Method for Algorithmic Gender Prediction," _Digital
#> Humanities Quarterly_ 9, no. 3 (2015):
#> <http://www.digitalhumanities.org/dhq/vol/9/3/000223/000223.html>.

More Repositories

  • dh-r: Computational Historical Thinking: With Applications in R (TeX, 60 stars)
  • jekyll-ebook: A Ruby script/gem to create EPUB books from Jekyll posts and pages using Pandoc (Ruby, 48 stars)
  • ocr-makefile: A Makefile to run OCR on a batch of PDFs (Makefile, 40 stars)
  • jekyll_figure: A Liquid figure tag for Jekyll sites (Ruby, 33 stars)
  • slavery-map: The Spread of U.S. Slavery, 1790-1860 (CSS, 25 stars)
  • rmd-notebook: The (very nearly) simplest possible web notebook using R Markdown (HTML, 25 stars)
  • civil-procedure-codes: Analysis repository for "The Spine of American Law: Digital Text Analysis and U.S. Legal Practice" (HTML, 19 stars)
  • bibkeys: A Ruby utility to list all the citation keys in a BibTeX file (Ruby, 16 stars)
  • geochecker: An R package to check the accuracy of geocoded coordinates using a Shiny gadget (R, 14 stars)
  • dotfiles: My dotfiles (TeX, 13 stars)
  • americas-public-bible: Code, data, and website for "America's Public Bible: A Commentary" (R, 12 stars)
  • historical-us-boundaries: Map of changing US political boundaries in D3.js (JavaScript, 10 stars)
  • academic-article-latex: An academic article LaTeX class (5 stars)
  • CV: My CV in LaTeX and Pandoc (TeX, 5 stars)
  • lincolnmullen.com: My personal website (HTML, 5 stars)
  • cchc: America's Public Bible for Computing Cultural Heritage in the Cloud (Go, 5 stars)
  • jsr: The website of the Journal of Southern Religion, powered by Jekyll (HTML, 4 stars)
  • omeka_client: A REST client for the Omeka API (Ruby, 4 stars)
  • ats-corpus: A corpus of historical texts for the purpose of detecting similar documents and text reuse (R, 4 stars)
  • legal-modernism: Law and legal practice modernized in the nineteenth-century United States. We are studying and visualizing the history of the modernization of American law. (Jupyter Notebook, 4 stars)
  • gender-article: Article on predicting gender by Cameron Blevins and Lincoln Mullen (HTML, 3 stars)
  • acadpaper: A LaTeX class for an academic paper (TeX, 3 stars)
  • asch-2015-talk: Congregationalists map in d3.carto (JavaScript, 3 stars)
  • omekaR: An Omeka API client in R (R, 2 stars)
  • ocrquality: An R package to measure OCR quality (R, 2 stars)
  • plugin-CatalogSearch: A plugin for Omeka that uses the subject field in an Omeka item to generate links to searches in catalogs, such as Archive Grid and the Library of Congress (PHP, 2 stars)
  • mullenMisc: An R package of functions I use across projects (R, 2 stars)
  • chronam-ocr-debatcher: Turn a batch of OCR files from Chronicling America into a CSV that can be imported into a database (Go, 2 stars)
  • nghis-simplifier: Simplify NGHIS shapefiles for, among other uses, CartoDB (Makefile, 2 stars)
  • plugin-HonorThyContributors: An Omeka plugin to give credit to contributors to your Omeka site (PHP, 1 star)
  • plugin-AddItem: A plugin for Omeka that adds an "add item" link to the admin bar (PHP, 1 star)
  • paulist-missions: Mapping nineteenth-century Paulist missions in D3 (JavaScript, 1 star)
  • universe: My R packages as built by R-Universe (1 star)
  • sex-ratios-map: Sex ratios in US Counties (CSS, 1 star)