  • Stars: 250
  • Rank 162,397 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created almost 7 years ago
  • Updated about 1 year ago

Repository Details

Keras project that parses and analyzes English resumes

keras-english-resume-parser-and-analyzer

Deep learning project that parses and analyzes English resumes.

The objective of this project is to use Keras and deep learning models, such as CNNs and recurrent neural networks, to automate the task of parsing an English resume.

Overview

Parser Features

  • English NLP using NLTK
  • Extract English text from PDF and DOCX files using pdfminer.six and python-docx
  • A rule-based resume parser

Deep Learning Features

  • Tkinter-based GUI tool to generate and annotate deep learning training data from PDF and DOCX files
  • Deep learning multi-class classification using recurrent and CNN networks for
    • line type: classify each line of text extracted from a PDF or DOCX file as a header, meta-data, or content
    • line label: classify each line of text extracted from a PDF or DOCX file by whether it implies experience, education, etc.
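
To illustrate how the two per-line predictions can fit together, here is a minimal sketch (not code from this project) that groups lines into sections by their predicted label. The ClassifiedLine type and the group_by_label helper are hypothetical names introduced only for this example, and the sample lines are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClassifiedLine:
    """One line of resume text with its two predicted classes (hypothetical type)."""
    text: str
    line_type: str   # 'header', 'meta' or 'content'
    line_label: str  # e.g. 'experience', 'education'


def group_by_label(lines):
    """Collect the text of non-header lines under their predicted label."""
    sections = defaultdict(list)
    for line in lines:
        if line.line_type == 'header':
            continue  # a header names a section; it carries no details itself
        sections[line.line_label].append(line.text)
    return dict(sections)


# Made-up predictions for four lines of a resume.
lines = [
    ClassifiedLine('WORK EXPERIENCE', 'header', 'experience'),
    ClassifiedLine('Software engineer, 2015-2018', 'content', 'experience'),
    ClassifiedLine('EDUCATION', 'header', 'education'),
    ClassifiedLine('BSc in Computer Science', 'content', 'education'),
]
print(group_by_label(lines))
```

Keeping the line type separate from the line label lets a consumer skip headers while still using them as section boundaries.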

The deep learning models included for classifying each line of a resume file are:

  • cnn.py

    • 1-D CNN with Word Embedding
    • Multi-Channel CNN with categorical cross-entropy loss function
  • cnn_lstm.py

    • 1-D CNN + LSTM with Word Embedding
  • lstm.py

    • LSTM with categorical cross-entropy loss function
    • Bi-directional LSTM/GRU with categorical cross-entropy loss function

Usage 1: Rule-based English Resume Parser

The sample code below shows how to scan all the resumes (in PDF and DOCX formats) in the [demo/data/resume_samples] folder and print a summary from the resume parser when information could be extracted:

from keras_en_parser_and_analyzer.library.rule_based_parser import ResumeParser
from keras_en_parser_and_analyzer.library.utility.io_utils import read_pdf_and_docx


def main():
    data_dir_path = './data/resume_samples' # directory to scan for any pdf and docx files
    collected = read_pdf_and_docx(data_dir_path)
    for file_path, file_content in collected.items():

        print('parsing file: ', file_path)

        parser = ResumeParser()
        parser.parse(file_content)
        print(parser.raw) # print out the raw contents extracted from pdf or docx files

        if parser.unknown is False:
            print(parser.summary())

        print('++++++++++++++++++++++++++++++++++++++++++')

    print('count: ', len(collected))


if __name__ == '__main__':
    main()

IMPORTANT: the parser rules are implemented in parser_rules.py. Each rule is applied to every line of text in the resume file and returns the target it extracts (or None if the line does not contain it). Because these rules are a very naive implementation, you may want to customize them for the resumes you are working with.
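
As an illustration of the "one rule per line, return the target or None" pattern described above, here is a minimal sketch of a custom rule. It is not taken from parser_rules.py: extract_email, first_match and the regular expression are hypothetical and deliberately simple.

```python
import re

# Hypothetical rule: a simple (not RFC-complete) email pattern.
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def extract_email(line):
    """Return the first email address in the line, or None if there is none."""
    match = EMAIL_PATTERN.search(line)
    return match.group(0) if match else None


def first_match(rule, lines):
    """Apply a rule to every line and return the first non-None result."""
    for line in lines:
        target = rule(line)
        if target is not None:
            return target
    return None


# Made-up resume lines for demonstration.
resume_lines = ['Jane Doe', 'Email: jane.doe@example.com', 'Phone: 555-0100']
print(first_match(extract_email, resume_lines))
```

A rule written this way can be tested in isolation on sample lines from your own resumes before it is wired into the parser.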

Usage 2: Deep Learning Resume Parser

Step 1: training data generation and annotation

A training data generation and annotation tool is provided in the demo folder. It generates deep learning training data from any PDF and DOCX files stored in the demo/data/resume_samples folder. To launch this tool, run the following commands from the root directory of the project:

cd demo
python create_training_data.py

This parses the PDF and DOCX files in the demo/data/resume_samples folder and, for each file, launches a Tkinter-based GUI form in which the user annotates each line of text (click the "Type: ..." and "Label: ..." buttons multiple times to select the correct annotation for each line). Each time a form is closed, the generated and annotated data is saved to a text file in the demo/data/training_data folder. Each line in the text file has the following format:

line_type   line_label  line_content

line_type and line_label map to the actual class labels as follows:

line_labels = {0: 'experience', 1: 'knowledge', 2: 'education', 3: 'project', 4: 'others'}
line_types = {0: 'header', 1: 'meta', 2: 'content'}
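
To make the format concrete, here is a minimal sketch that reads one annotated line back into class names. It assumes that the three fields are separated by tabs and that the first two fields are stored as integer class ids; neither detail is confirmed by the text above, and parse_training_line is a hypothetical helper.

```python
# Mappings copied from the text above.
line_labels = {0: 'experience', 1: 'knowledge', 2: 'education', 3: 'project', 4: 'others'}
line_types = {0: 'header', 1: 'meta', 2: 'content'}


def parse_training_line(raw_line, sep='\t'):
    """Split one annotated line into (type name, label name, content).

    Assumes the fields are separated by `sep` (a tab by default, which is an
    assumption) and that the first two fields are integer class ids.
    """
    # maxsplit=2 keeps any separator inside the content field intact.
    type_id, label_id, content = raw_line.rstrip('\n').split(sep, 2)
    return line_types[int(type_id)], line_labels[int(label_id)], content


# Made-up example line.
print(parse_training_line('2\t0\tSoftware engineer at Example Corp, 2015-2018\n'))
```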

Step 2: train the resume parser

After the training data is generated and annotated, one can train the resume parser by running the following command:

cd demo
python dl_based_parser_train.py

Below is the code for dl_based_parser_train.py:

import numpy as np
import os
import sys 


def main():
    random_state = 42
    np.random.seed(random_state)

    current_dir = os.path.dirname(__file__)
    current_dir = current_dir if current_dir != '' else '.'  # compare by value, not identity
    output_dir_path = current_dir + '/models'
    training_data_dir_path = current_dir + '/data/training_data'
    
    # add keras_en_parser_and_analyzer module to the system path
    sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
    from keras_en_parser_and_analyzer.library.dl_based_parser import ResumeParser

    classifier = ResumeParser()
    batch_size = 64
    epochs = 20
    history = classifier.fit(training_data_dir_path=training_data_dir_path,
                             model_dir_path=output_dir_path,
                             batch_size=batch_size, epochs=epochs,
                             test_size=0.3,
                             random_state=random_state)


if __name__ == '__main__':
    main()

Upon completion of training, the trained models will be saved in the demo/models/line_label and demo/models/line_type folders.
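
The test_size=0.3 and random_state arguments in the training script suggest that 30% of the annotated lines are held out for evaluation using a fixed seed. How fit does this internally is not shown here, so the following is only a plain-Python sketch of a seeded hold-out split; holdout_split is a hypothetical helper, not part of this project.

```python
import random


def holdout_split(samples, test_size=0.3, random_state=42):
    """Shuffle samples with a fixed seed and split off a test fraction."""
    shuffled = list(samples)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(random_state).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(10))
print(len(train), len(test))
```

Using the same random_state on every run yields the same split, which makes training runs comparable.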

The default line label and line type classifier used by the deep learning ResumeParser is WordVecBidirectionalLstmSoftmax, but other classifiers can be used by assigning them to the parser, for example:

from keras_en_parser_and_analyzer.library.dl_based_parser import ResumeParser
from keras_en_parser_and_analyzer.library.classifiers.cnn_lstm import WordVecCnnLstm

classifier = ResumeParser()
classifier.line_label_classifier = WordVecCnnLstm()
classifier.line_type_classifier = WordVecCnnLstm()
...

(Make sure that the packages listed in requirements.txt are installed in your Python environment.)

Step 3: parse resumes using trained parser

After the trained models are saved in the demo/models folder, one can use the resume parser to parse the resumes in the demo/data/resume_samples folder by running the following command:

cd demo
python dl_based_parser_predict.py

Below is the code for dl_based_parser_predict.py:

import os
import sys 


def main():
    current_dir = os.path.dirname(__file__)
    current_dir = current_dir if current_dir != '' else '.'  # compare by value, not identity
    sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
    
    from keras_en_parser_and_analyzer.library.dl_based_parser import ResumeParser
    from keras_en_parser_and_analyzer.library.utility.io_utils import read_pdf_and_docx
    
    data_dir_path = current_dir + '/data/resume_samples' # directory to scan for any pdf and docx files

    def parse_resume(file_path, file_content):
        print('parsing file: ', file_path)

        parser = ResumeParser()
        parser.load_model('./models')
        parser.parse(file_content)
        print(parser.raw)  # print out the raw contents extracted from pdf or docx files

        if parser.unknown is False:
            print(parser.summary())

        print('++++++++++++++++++++++++++++++++++++++++++')

    # call parse_resume for every file as it is read
    collected = read_pdf_and_docx(data_dir_path, command_logging=True,
                                  callback=lambda index, file_path, file_content: parse_resume(file_path, file_content))

    print('count: ', len(collected))


if __name__ == '__main__':
    main()

Configure to run on GPU on Windows

  • Step 1: Change tensorflow to tensorflow-gpu in requirements.txt and install tensorflow-gpu
  • Step 2: Download and install the CUDA® Toolkit 9.0 (at the time of writing, CUDA® Toolkit 9.1 was not yet supported by tensorflow, so download CUDA® Toolkit 9.0)
  • Step 3: Download and unzip cuDNN 7.4 for CUDA® Toolkit 9.0 and add the bin folder of the unzipped directory to the PATH environment variable of your Windows environment
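
After Step 3, one quick way to check that the cuDNN bin folder is visible is to look for its DLL in the directories listed in PATH. The sketch below uses only the standard library; find_on_path is a hypothetical helper, and the file name cudnn64_7.dll is an assumption about what the cuDNN 7 download contains.

```python
import os


def find_on_path(file_name, path_value=None):
    """Return the directories listed in PATH that contain file_name."""
    if path_value is None:
        path_value = os.environ.get('PATH', '')
    hits = []
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, file_name)):
            hits.append(directory)
    return hits


# 'cudnn64_7.dll' is an assumed file name for cuDNN 7 on Windows; adjust it
# to whatever the bin folder of your download actually contains.
print(find_on_path('cudnn64_7.dll'))
```

An empty list means that no directory in PATH contains the file, in which case the PATH entry from Step 3 is worth rechecking.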
