  • Stars: 609
  • Rank: 73,614 (Top 2%)
  • Language: Python
  • License: Other
  • Created over 9 years ago
  • Updated over 4 years ago

Repository Details

Determine gender from a name

NGender

Guess a person's gender from their Chinese name

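  • Fewer than 20 lines of pure Python (core part)
  • No dependencies
  • Compatible with python3, python2, pypy
  • 82% accuracy
  • Can be used to guess gender
  • Can also be used to judge how masculine or feminine a name is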

Usage

pip install ngender

Or (OSX)

brew install https://raw.githubusercontent.com/observerss/homebrew/61b3623967dc9507958dfb517e7f746baa96dcf1/Library/Formula/ngender.rb

Then, on the command line

$ ng 赵本山 宋丹丹
name: 赵本山 => gender: male, probability: 0.9836229687547046
name: 宋丹丹 => gender: female, probability: 0.9759486128949907

It can also be used from a Python program

>>> import ngender
>>> ngender.guess('赵本山')
('male', 0.9836229687547046)

>>> ngender.guess('宋丹丹')
('female', 0.9759486128949907)

>>> %timeit ngender.guess('宋丹丹')  # %timeit is an IPython magic, not plain Python
100000 loops, best of 3: 4.01 µs per loop
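The feature list mentions judging how masculine or feminine a name is. One possible way to get a single number out of the `(gender, probability)` tuple is to fold it into a male-probability score. This is only a sketch: the helper name `male_score` is hypothetical and not part of ngender, and it assumes the returned probability belongs to the returned gender, as the sample outputs suggest.

```python
def male_score(result):
    """Map a (gender, probability) tuple to P(male) in [0, 1].

    Assumes `probability` is the probability of the returned gender.
    Hypothetical helper, not part of ngender itself.
    """
    gender, probability = result
    return probability if gender == "male" else 1.0 - probability


# Tuples copied from the session above, so this runs without ngender installed.
print(male_score(("male", 0.9836229687547046)))
print(male_score(("female", 0.9759486128949907)))
```

A score near 1 reads as strongly male-leaning, near 0 as strongly female-leaning, and near 0.5 as ambiguous.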

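How it works

Math

Bayes' theorem: P(Y|X) = P(X|Y) * P(Y) / P(X)

When the components of X are conditionally independent given Y, P(X|Y) = P(X1|Y) * P(X2|Y) * ...

Applied to guessing gender from a name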

P(gender=male|name=本山)
= P(name=本山|gender=male) * P(gender=male) / P(name=本山)
= P(name has 本|gender=male) * P(name has 山|gender=male) * P(gender=male) / P(name=本山)
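The formula above can be tried out with a few made-up numbers. The sketch below is illustrative only: the likelihoods and priors are invented rather than taken from charfreq.csv, and the function name `posterior` is hypothetical. It computes the numerator for each gender and then divides by their sum, which stands in for the shared denominator.

```python
import math

# Hypothetical per-character likelihoods P(name has c | gender) and priors.
# These numbers are made up for illustration; they are not from charfreq.csv.
LIKELIHOOD = {
    "male": {"本": 0.004, "山": 0.006},
    "female": {"本": 0.0005, "山": 0.001},
}
PRIOR = {"male": 0.5, "female": 0.5}


def posterior(given_name):
    """Return {gender: P(gender | name)} under the independence assumption."""
    # Work in log space so that long names do not underflow.
    log_score = {}
    for gender in PRIOR:
        total = math.log(PRIOR[gender])
        for char in given_name:
            total += math.log(LIKELIHOOD[gender][char])
        log_score[gender] = total
    # Dividing by the sum of the scores plays the role of P(name).
    peak = max(log_score.values())
    score = {g: math.exp(s - peak) for g, s in log_score.items()}
    norm = sum(score.values())
    return {g: s / norm for g, s in score.items()}


print(posterior("本山"))
```

Summing logs instead of multiplying raw probabilities is a design choice of this sketch; for two-character names a plain product would work equally well.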

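Computation

  1. Where does the file charfreq.csv come from?

    There was once something called 开房记录.avi ("hotel check-in records", said tongue in cheek) that contained names and genders, about 20 million records. The character frequencies were tallied from it.

  2. How is P(name has 本|gender=male) computed?

    Number of times 本 appears in male names / total number of characters appearing in male names

  3. How is P(gender=male) computed?

    Number of male names / total number of names

  4. How is P(name=本山) computed?

    It does not need to be computed: it is the same for both genders, so it cancels out when the probabilities are normalized.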
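The four answers above can be sketched as a single counting pass. Everything here is an assumption made for illustration: the toy records are invented, the function names `train` and `guess` are hypothetical and do not mirror ngender's source, the surname is taken to be the first character and skipped, and add-one smoothing is an addition of this sketch so that unseen characters do not zero out a score.

```python
from collections import Counter

# Toy (full name, gender) records, invented for illustration only.
RECORDS = [
    ("赵本山", "male"), ("李大山", "male"), ("王本", "male"),
    ("宋丹丹", "female"), ("李丹", "female"),
]


def train(records):
    """Count characters per gender, skipping the one-character surname."""
    char_counts = {"male": Counter(), "female": Counter()}
    name_counts = Counter()
    for full_name, gender in records:
        name_counts[gender] += 1
        char_counts[gender].update(full_name[1:])
    return char_counts, name_counts


def guess(full_name, char_counts, name_counts):
    """Return (gender, probability) from the counted frequencies."""
    vocab = set(char_counts["male"]) | set(char_counts["female"])
    total_names = sum(name_counts.values())
    score = {}
    for gender, counts in char_counts.items():
        total_chars = sum(counts.values())
        # P(gender) = names of this gender / all names
        p = name_counts[gender] / total_names
        for char in full_name[1:]:
            # P(name has char | gender), with add-one smoothing
            # (an addition of this sketch, not described in the text).
            p *= (counts[char] + 1) / (total_chars + len(vocab))
        score[gender] = p
    # P(name) is never computed: normalising the two scores cancels it.
    norm = score["male"] + score["female"]
    best = max(score, key=score.get)
    return best, score[best] / norm


char_counts, name_counts = train(RECORDS)
print(guess("赵本山", char_counts, name_counts))
print(guess("宋丹丹", char_counts, name_counts))
```

With real data the counting pass would run once and its result would be stored, so each later guess only needs a few table lookups.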

>>> ngender.guess('李胜男')
('male', 0.851334658742)

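Although both characters lean strongly male on their own, together they form a female name, which a per-character model cannot capture.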
