  • Stars
    139
  • Rank 262,954 (Top 6 %)
  • Language
    HTML
  • License
    MIT License
  • Created over 6 years ago
  • Updated over 1 year ago

Repository Details

An evaluation of existing resume-parsing engines for Chinese-language resumes

resume-parse-evaluation

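PaddleOCR has a related table-structure extraction feature; it is unclear whether it could be applied here.

There is also https://github.com/jamesturk/scrapeghost, a GPT-based application; it is unclear whether that approach is feasible here.

https://github.com/hxu296/nlp-resume-parser

Extracts fields of interest from resumes.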

Background

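Generally, depending on the recruiting channels that candidates and companies choose, we receive different types of resumes:

1.1 Web-based resumes on the major job sites, or resumes downloaded from those sites

Applicants fill in a fixed template provided by a site (an external job site or the company's own site), producing either a web page resume or a file in another format downloaded from that page. These are collectively called web-based resumes.

For the basic parsing of web-based resumes, accuracy is roughly on par across vendors. The difficulty lies in analyzing each fixed template in detail, so parsing accuracy depends on diligence and experience. Such sites include 智联, 51, 拉勾, and 猎聘.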

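1.2 Electronic resumes that applicants create to their own taste, in Word, PDF, PNG, or even Excel format

These resumes come in every imaginable layout. Because they are not presented as web page markup and have no relatively fixed template or keyword fields, they are challenging for automated recognition. The system has to discover regularities before it can classify and recognize content, which is difficult, so parsing this type of resume cannot reach 100% accuracy.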

1.3 Resume file formats

doc, docx, xls, xlsx, mht, mhtml, html, htm, txt, pdf, rtf, eml, wps, xml, dotx, msg, jpeg, jpg, gif, png, bmp, and similar formats, which together cover roughly 99% of the resume formats seen in the recruiting market.
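
When organizing a benchmark, it can help to bucket incoming files by extension before handing them to a parser. The following is a minimal sketch under stated assumptions: the group names and the `format_group` helper are hypothetical and do not come from the repository; only the extensions themselves are taken from the list above.

```python
from pathlib import Path

# Hypothetical grouping of the extensions listed above. The group names are
# illustrative only and are not defined by the repository.
FORMAT_GROUPS = {
    "word": {"doc", "docx", "dotx", "rtf", "wps"},
    "excel": {"xls", "xlsx"},
    "web": {"mht", "mhtml", "html", "htm", "xml"},
    "text": {"txt"},
    "pdf": {"pdf"},
    "email": {"eml", "msg"},
    "image": {"jpeg", "jpg", "gif", "png", "bmp"},
}


def format_group(filename: str) -> str:
    """Return the (hypothetical) group for a file, based on its extension."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for group, extensions in FORMAT_GROUPS.items():
        if ext in extensions:
            return group
    return "unknown"
```

Image formats are kept in their own group because they would need OCR before any field extraction, whereas the other groups carry machine-readable text.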

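1.4 Data in resumes

Resumes mix Chinese and English and contain roughly 100 or more fields, covering basic information, contact details, desired position, education, work experience, project experience, skills, language proficiency, certificates, self-assessment, and so on.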
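
One way to compare parsers is to normalize every engine's output into a common record. The sketch below is an assumption, not the repository's schema: the class and attribute names (`ParsedResume`, `Education`, `WorkExperience`, and their fields) are hypothetical and cover only a handful of the field categories named above.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Education:
    # Hypothetical sub-record for one education entry.
    school: str = ""
    degree: str = ""
    start: str = ""
    end: str = ""


@dataclass
class WorkExperience:
    # Hypothetical sub-record for one job entry.
    company: str = ""
    title: str = ""
    start: str = ""
    end: str = ""
    description: str = ""


@dataclass
class ParsedResume:
    # Basic information and contact details.
    name: str = ""
    phone: str = ""
    email: str = ""
    # Desired position.
    desired_position: str = ""
    # Repeating sections are lists so that several entries can be stored.
    education: List[Education] = field(default_factory=list)
    work_experience: List[WorkExperience] = field(default_factory=list)
    project_experience: List[str] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    certificates: List[str] = field(default_factory=list)
    self_assessment: str = ""
```

Calling `asdict(ParsedResume(...))` yields a plain nested dictionary, which is convenient for writing results to CSV or JSON and for field-by-field comparison.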

Tools

Commercial software and solutions

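| Company | Website | User comments | Test demo | Supported formats | Price | Deployment | Other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 捕鱼科技 | http://www.buyu-tech.com/ http://www.cv-parser.com/ | -- | Y | Images not supported; pdf, doc, docx, html, htm, txt, zip, mht, ppt, etc. | --- | SaaS and on-premises | |
| 云解析 | http://youyun.com | Their demo is mediocre; it handles even slightly unusual cases poorly. | Y | 21 resume formats; images supported | | SaaS | |
| cv-extract 北京有本科技 | http://k18.com.cn | This vendor has been doing parsing for some time and has rebranded itself under several storefronts. Overall results are acceptable, but quite a few problems remain. | Y | Images supported | | SaaS and on-premises | |
| 北京云湾科技有限公司 | http://resumesdk.com | Fairly low-profile and apparently not good at marketing, but overall the best of the three. In particular, it handles resumes from different site templates as well as free-form resumes, and it is easy to get started with. | Y | Images supported | | SaaS and on-premises | |
| 德士达科技公司 | http://www.daxtra.cn/ | Daxtra's resume parsing is quite good. Reportedly they have an office in Hong Kong and are also promoting in mainland China, and the algorithm was reportedly developed by a University of Edinburgh professor. Many large foreign companies and headhunters use it. | N | -- | -- | -- | |
| 杭州少世科技有限公司 | www.littleparser.com | 小析 resume parsing; free trial currently available; developed by an international big-data team. | Y | Images not supported | -- | SaaS | -- |
| 山卡拉 | http://cv-extract.com/ | -- | Y (unavailable) | Images not supported | -- | SaaS | --- |
| CV Tech 简历 | http://www.jianlijiexi.com/ | -- | Y | Images supported | -- | SaaS and on-premises | --- |
| 大易 | http://www.dayee.com/wt/dayee/dayeePageresume | -- | N | Images not supported | -- | -- | -- |
| 麦穗简历洞察 | https://www.mesoor.com/resume-insight.html | -- | N | Images not supported | -- | -- | -- |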

Open-source libraries

Benchmark resumes

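Publicly available resume data collected from the web; if anything here is inappropriate, please get in touch to have it removed.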

Todos

  • xxxxx.

Prospective project tree:

  ├── README.md
  ├── resume-samples
  |   ├── pdf
  |   |   └── ca-warn-2013
  |   |       ├── 001.csv
  |   |       ├── 002.csv
  |   |       └── 003.csv
  |   ├── word
  |   |   └── ca-warn-2013
  |   |       ├── 001.csv
  |   |       ├── 002.csv
  |   |       └── 003.csv
  |   ├── html
  |   |   └── ca-warn-2013
  |   |       ├── 001.csv
  |   |       ├── 002.csv
  |   |       └── 003.csv
  |   ├── txt
  |   |   └── ca-warn-2013
  |   |       ├── 001.csv
  |   |       ├── 002.csv
  |   |       └── 003.csv
  |   ├── excel
  |   |   └── ca-warn-2013
  |   |       ├── 001.csv
  |   |       ├── 002.csv
  |   |       └── 003.csv
  |   ├── mdht
  |   |   └── ca-warn-2013
  |   |       ├── 001.csv
  |   |       ├── 002.csv
  |   |       └── 003.csv      
  ├── results
  |   ├── pdf
  |   |   └── 捕鱼
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云解析
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv     
  |   |   └── 有本科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云湾科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv             
  |   ├── html
  |   |   └── 捕鱼
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云解析
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv     
  |   |   └── 有本科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云湾科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv 
  |   ├── txt
  |   |   └── 捕鱼
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云解析
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv     
  |   |   └── 有本科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云湾科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv 
  |   ├── excel
  |   |   └── 捕鱼
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云解析
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv     
  |   |   └── 有本科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云湾科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv 
  |   ├── word
  |   |   └── 捕鱼
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云解析
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv     
  |   |   └── 有本科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云湾科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv 
  |   ├── mdht
  |   |   └── 捕鱼
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云解析
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv     
  |   |   └── 有本科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv
  |   |   └── 云湾科技
  |   |   |   ├── 001.csv
  |   |   |   ├── 002.csv
  |   |   |   └── 003.csv 
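
The layout above implies one result file per format, vendor, and sample. As a rough sketch of that convention, the helper below enumerates the expected result paths. The function names are hypothetical, and the format and vendor lists are simply copied from the tree; nothing here is taken from actual repository code.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Directory names copied from the results/ part of the prospective tree above.
FORMATS = ["pdf", "html", "txt", "excel", "word", "mdht"]
VENDORS = ["捕鱼", "云解析", "有本科技", "云湾科技"]


def result_path(fmt: str, vendor: str, sample_id: str) -> PurePosixPath:
    """Build results/<format>/<vendor>/<sample_id>.csv (hypothetical helper)."""
    return PurePosixPath("results") / fmt / vendor / f"{sample_id}.csv"


def expected_results(sample_ids: Iterable[str]) -> List[PurePosixPath]:
    """List every result file the tree implies for the given sample ids."""
    ids = list(sample_ids)
    return [
        result_path(fmt, vendor, sample_id)
        for fmt in FORMATS
        for vendor in VENDORS
        for sample_id in ids
    ]
```

Comparing this list against the files actually present on disk is one simple way to see which vendor and format combinations have not been tested yet.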

Example test suite and results

java -jar \
    bins/tabula-0.9.1-jar-with-dependencies.jar --pages all \
    pdfs/nypd-weekly-stats.pdf \
    > results/tabula-java/nypd-weekly-stats.csv

java -jar \
    bins/tabula-0.9.1-jar-with-dependencies.jar --pages all \
    pdfs/menlo-park-sunridge-cad-interface.pdf \
    > results/tabula-java/menlo-park-sunridge-cad-interface.csv
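
The two commands above follow the same pattern, so they could be generated for every PDF in a directory. This is a minimal sketch, assuming the jar path, the `pdfs` input directory, and the `results/tabula-java` output directory shown in those commands; the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

# Paths taken from the example commands above.
TABULA_JAR = "bins/tabula-0.9.1-jar-with-dependencies.jar"
RESULTS_DIR = "results/tabula-java"


def tabula_command(pdf_path: str) -> List[str]:
    """Build the same java invocation as the example commands."""
    return ["java", "-jar", TABULA_JAR, "--pages", "all", str(pdf_path)]


def output_path(pdf_path: str, results_dir: str = RESULTS_DIR) -> Path:
    """Map pdfs/<name>.pdf to <results_dir>/<name>.csv."""
    return Path(results_dir) / (Path(pdf_path).stem + ".csv")


def run_all(pdf_dir: str = "pdfs") -> None:
    """Run the command for each PDF, redirecting stdout to the CSV file."""
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out = output_path(str(pdf))
        out.parent.mkdir(parents=True, exist_ok=True)
        with open(out, "wb") as fh:
            subprocess.run(tabula_command(str(pdf)), stdout=fh, check=True)
```

`run_all()` needs Java and the jar to be present; `tabula_command` and `output_path` can be checked on their own without running anything.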
