  • Stars: 414
  • Rank 104,550 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created about 6 years ago
  • Updated over 1 year ago

Repository Details

A super lightweight image processing algorithm for detection and extraction of overlapped handwritten signatures on scanned documents using OpenCV and scikit-image.

"Signature Extraction" based on connected component analysis

A design and implementation of a super lightweight algorithm for "overlapped handwritten signature extraction from scanned documents" using OpenCV and scikit-image in Python. Please get in touch if you need a professional signature detection, recognition, segmentation and counting project with very high accuracy!


  • Input = The scanned document
  • Output = The signatures that exist on the input

TODOs:

  • "Outlier Removal" module will be improved to boost the signature extraction algorithm.
  • CNN based "Signature Recognition" module will be developed.
  • "Signature Spoofing Detection" algorithm will be developed.
  • "Signature Detector (bounding box) & Counter" module will be developed.
  • "Accuracy of detection on SigSA: On-line Handwritten Signature Database" will be calculated and shared.

Demo of a Real-life Application of Signature Extraction Algorithm

You can find a sample project, developed on top of the "signature extractor" algorithm, that extracts the signatures from a digital photo of a document. Here are the functionalities of this sample project:

Sample Test Results of Signature Extraction Algorithm

- Sample result#1:

Explanation: In this case, the signature extraction algorithm extracts the 3 different handwritten signatures successfully. Only a very small portion of one signature, located at the top-left, is lost: this part is not connected to the main signature stroke, so the algorithm does not treat it as part of the signature.

- Sample result#2:

Explanation: In this case, the signature extraction algorithm extracts 2 handwritten signatures from the surrounding text, but it cannot remove the lines located at the bottom-center, because they form big connected components and the algorithm therefore sees them as signatures.

- Sample result#3:

Explanation: Some parts of the signatures are lost because they are not connected to the big connected components, so the algorithm decides that they are not part of the signatures. They can be caught by setting the threshold value to a bigger value.

Theory

Main pipeline

Theoretical Background

As already mentioned, the algorithm extracts signatures from scanned documents based on "connected component analysis". So what is a connected components algorithm? In image processing, a connected components algorithm finds regions of connected pixels that have the same value.

You can find more detailed information about connected component analysis here.

The connected components can be found and labelled with functionality provided by the scikit-image library. But why do we need it? If you check the scanned documents, you can see that the biggest connected components belong to the handwritten signatures. If we can get the biggest connected components, we can get the signatures from the whole document. However, we may also pick up undesired lines or other shapes that form big connected components, so we also need a threshold value to get rid of them.
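To make the idea concrete, here is a minimal, dependency-free sketch of connected component labelling on a tiny binary grid. It is not the project's code (the project relies on scikit-image for labelling); the function name `label_components`, the 8-connectivity choice and the toy `page` grid are assumptions made only for illustration.

```python
from collections import deque


def label_components(binary):
    """Label 8-connected regions of truthy pixels; return (labels, areas)."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    areas = {}
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or labels[r][c]:
                continue
            # Flood-fill a new component starting from this unlabelled pixel.
            labels[r][c] = next_label
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            areas[next_label] = area
            next_label += 1
    return labels, areas


# Toy "document": one long stroke (signature-like) and two isolated specks.
page = [
    [1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
labels, areas = label_components(page)
largest = max(areas, key=areas.get)  # label of the biggest component
```

In this toy grid the long stroke ends up as the biggest component, which mirrors the observation that signatures tend to be the largest connected regions on a scanned page.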

Calculating the threshold value to get rid of the outliers:

I calculated the threshold value for detecting outliers (any lines, shapes and text that are not part of the signatures) by performing many experiments. The resulting equation, derived from the experiment results, works well for most A4-sized scanned documents.

Detect and remove the small size outliers:

Here is the code that starts at signature_extractor.py, line #60:

# experiment-based ratio calculation; modify it for your cases
# a4_small_size_outliar_constant is used as a threshold value to remove outlier connected
# components that are smaller than a4_small_size_outliar_constant for A4 size scanned documents
a4_small_size_outliar_constant = ((average/constant_parameter_1)*constant_parameter_2)+constant_parameter_3
print("a4_small_size_outliar_constant: " + str(a4_small_size_outliar_constant))

Based entirely on my experiments, I determined the following equation (x stands for the scanned document size, such as A4 or A0):

  • ax_small_size_outliar_constant = ((average/constant_parameter_1) * constant_parameter_2) + constant_parameter_3

You can adapt it to your own cases and to other scanned document sizes, such as A0. Just configure the constants:

  • constant_parameter_1
  • constant_parameter_2
  • constant_parameter_3

Run experiments with different parameter values until you reach the highest accuracy!
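As a worked example of the small-size threshold equation, the sketch below plugs numbers into it. It assumes that `average` is the mean area (in pixels) of the connected components, which is not stated in this section; the helper name `small_outlier_threshold`, the sample areas and the constant values are all hypothetical and would need to be tuned by experiment.

```python
def small_outlier_threshold(component_areas, c1, c2, c3):
    """Apply ((average / c1) * c2) + c3 to the mean component area."""
    if not component_areas:
        raise ValueError("need at least one connected component")
    # Assumption: "average" is the mean area of the connected components.
    average = sum(component_areas) / len(component_areas)
    return ((average / c1) * c2) + c3


# Hypothetical component areas (pixels) and constants, for illustration only.
areas = [40, 60, 80, 5000, 20]
threshold = small_outlier_threshold(areas, c1=100.0, c2=250.0, c3=100.0)
print("small-size outlier threshold:", threshold)
```

Because the threshold scales with the average component area, a page with larger strokes overall yields a higher cut-off, and the three constants control how steeply it scales and where it starts.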

Detect and remove the big size outliers:

Here is the code that starts at signature_extractor.py, line #66:

# experiment-based ratio calculation; modify it for your cases
# a4_big_size_outliar_constant is used as a threshold value to remove outlier connected
# components that are bigger than a4_big_size_outliar_constant for A4 size scanned documents
a4_big_size_outliar_constant = a4_small_size_outliar_constant*constant_parameter_4
print("a4_big_size_outliar_constant: " + str(a4_big_size_outliar_constant))

Based entirely on my experiments, I determined the following equation (x stands for the scanned document size, such as A4 or A0):

  • ax_big_size_outliar_constant = ax_small_size_outliar_constant*constant_parameter_4

You can adapt it to your own cases and to other scanned document sizes, such as A0. Just configure the constant:

  • constant_parameter_4

Run experiments with different parameter values until you reach the highest accuracy!
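The two thresholds together define a band of component sizes that are treated as signature candidates. The sketch below shows one way to apply that band to a mapping of component labels to areas. The helper name `keep_signature_candidates`, the inclusive bounds, the sample areas and the constant values are assumptions for illustration and are not taken from signature_extractor.py.

```python
def keep_signature_candidates(component_areas, small_threshold, c4):
    """Keep components whose area lies between the small and big thresholds."""
    # Big-size threshold derived from the small one, as in the equation above.
    big_threshold = small_threshold * c4
    kept = {
        label: area
        for label, area in component_areas.items()
        if small_threshold <= area <= big_threshold
    }
    return kept, big_threshold


# Hypothetical label -> area mapping (pixels) and constants, for illustration.
areas = {1: 35, 2: 900, 3: 1500, 4: 60000}
kept, big_threshold = keep_signature_candidates(areas, small_threshold=500.0, c4=18.0)
print("big-size outlier threshold:", big_threshold)
print("kept components:", kept)
```

Here the speck (label 1) falls below the band and the very large region (label 4), such as a long ruled line or a page border, falls above it, so only the mid-sized components remain.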

Installation

1.) Python and pip

Python is automatically installed on Ubuntu. Take a moment to confirm (by issuing a python -V command) that one of the following Python versions is already installed on your system:

  • Python 3.3+

The pip or pip3 package manager is usually installed on Ubuntu. Take a moment to confirm (by issuing a pip -V or pip3 -V command) that pip or pip3 is installed. We strongly recommend version 8.1 or higher of pip or pip3. If version 8.1 or later is not installed, issue the following command, which will either install pip or upgrade it to the latest version:

$ sudo apt-get install python3-pip python3-dev # for Python 3.n

2.) scikit-image

Install it via shell/command prompt:

pip install scikit-image

If you are running Anaconda or miniconda, use:

conda install -c conda-forge scikit-image

See details here.


  • After completing the 2 installation steps above, you can test the project with this command:

    python3 signature_extractor.py
    

Citation

If you use this code for your publications, please cite it as:

@ONLINE{hse,
    author = "Ahmet Özlü",
    title  = "Overlapped handwritten signature extraction from scanned documents",
    year   = "2018",
    url    = "https://github.com/ahmetozlu/signature_extractor"
}

Author

Ahmet Özlü

License

This system is available under the MIT license. See the LICENSE file for more info.
