
An elementary captcha decoder written in python

captcha-decoder

This module takes a captcha image as input, attempts to partition it into discrete segments that each (it hopes) contain a single symbol, and then runs a basic vector space search to score the similarity of each symbol against known characters (whose reference images are included). The objectives of this project are to (a) make boyter's code more accessible and (b) illustrate, in a readable way, the fundamentals of captcha cracking. Its primary goal is clarity; it makes no claims or attempts at efficiency, accuracy, or practicality.

This work is derived from an original work by @boyter, http://www.boyter.org/decoding-captchas/ (see the original tutorial at https://web.archive.org/web/20121012023114/http://www.wausita.com/captcha/)
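
To make the "vector space search" step concrete, here is a minimal sketch, assuming each symbol is flattened into a sparse vector of inked pixel positions and compared to each reference with cosine similarity. The function names and the toy 3x3 glyphs are invented for illustration and are not taken from this module's source.

```python
import math


def magnitude(vec):
    """Euclidean length of a sparse vector stored as {position: value}."""
    return math.sqrt(sum(v * v for v in vec.values()))


def similarity(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    denom = magnitude(a) * magnitude(b)
    if denom == 0:
        return 0.0
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    return dot / denom


def to_vector(pixels):
    """Flatten a 2D grid of 0/1 pixels into {(row, col): value}, keeping only ink."""
    return {
        (r, c): v
        for r, row in enumerate(pixels)
        for c, v in enumerate(row)
        if v
    }


def rank(symbol, references, limit=2):
    """Return the `limit` best (score, label) pairs for one segmented symbol."""
    vec = to_vector(symbol)
    scores = [
        (similarity(vec, to_vector(ref)), label)
        for label, ref in references.items()
    ]
    return sorted(scores, reverse=True)[:limit]


# Toy reference glyphs (hypothetical; the real module ships reference images).
REFERENCES = {
    "l": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "t": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

# A "t" with one pixel missing still scores closest to the "t" reference.
noisy_t = [[1, 1, 0], [0, 1, 0], [0, 1, 0]]
best = rank(noisy_t, REFERENCES)
```

Each entry in `best` pairs a score in [0, 1] with a candidate label, which is the same shape of information as the "confidence" values the command line tool prints.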

Installation

On Ubuntu, libjpeg-dev and libpng-dev may be system requirements for the Python Pillow (PIL) library:

sudo apt-get install libjpeg-dev
sudo apt-get install libpng-dev

Next, fetch and install the decaptcha library:

pip install git+https://github.com/mekarpeles/captcha-decoder.git

Usage

The decaptcha library comes with a command line utility called decaptcha. Running the command with -h shows the list of options. The image argument can be either a file path or a URL:

usage: decaptcha [-h] [-v] [-l LIMIT] [-c CHANNELS] [-t THRESHOLD] [--min MIN]
                 [--max MAX] [-o TOLERANCE]
                 [<img>]

Python captcha cracking utility

positional arguments:
  <img>                 Enter the filesystem path or url of a captcha image

optional arguments:
  -h, --help            show this help message and exit
  -v                    Displays the decaptcha version
  -l LIMIT, --limit LIMIT
                        Package url
  -c CHANNELS, --channels CHANNELS
                        The number of prominant color channels to keep
  -t THRESHOLD, --threshold THRESHOLD
                        Accuracy threshold for matching decimal [0-1]
  --min MIN             Filter out colors darker than this [0-256]
  --max MAX             Filter out colors light than this [0-256]
  -o TOLERANCE, --tolerance TOLERANCE
                        Pixel tolerance for character segmentation. Higher is
                        more lenient/greedy, lower is strict.
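
The --min/--max filter and the segmentation step can be pictured with a small sketch. It assumes a grayscale image given as nested lists, treats pixels whose value falls inside [lo, hi] as ink, and splits the image wherever a column contains no ink. The function names are hypothetical, and the sketch does not model the --tolerance or --channels options, whose exact behavior is not described here.

```python
def binarize(gray, lo, hi):
    """Mark pixels whose grayscale value lies in [lo, hi] as ink (1), else 0."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in gray]


def segment_columns(mask):
    """Return (start, end) column ranges, end exclusive, of adjacent inked columns."""
    width = len(mask[0]) if mask else 0
    spans, start = [], None
    for x in range(width):
        inked = any(row[x] for row in mask)
        if inked and start is None:
            start = x                    # a new symbol begins
        elif not inked and start is not None:
            spans.append((start, x))     # an empty column ends the symbol
            start = None
    if start is not None:                # symbol touching the right edge
        spans.append((start, width))
    return spans


# Toy image: two dark strokes on a light background (values invented).
gray = [
    [255, 10, 255, 255, 5, 12, 255],
    [255, 15, 255, 255, 255, 8, 255],
]
mask = binarize(gray, lo=0, hi=20)
spans = segment_columns(mask)
```

Here `spans` holds one column range per candidate symbol; each range would then be cropped out and compared against the reference characters.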

Example

$ decaptcha http://www.mondor.org/img/capex.jpg  --min 0 --max 20 --limit 5 --channels 5 --tolerance 7

Character 0 of 7:
        t (confidence of 0.839150063096)
        e (confidence of 0.827405543276)
Character 1 of 7:
        0 (confidence of 0.834057656228)
        l (confidence of 0.771064160322)
Character 2 of 7:
        t (confidence of 0.309437274354)
        e (confidence of 0.303227199152)
Character 3 of 7:
Character 4 of 7:
        t (confidence of 0.267644230239)
        7 (confidence of 0.266067912114)
Character 5 of 7:
        0 (confidence of 0.834057656228)
        l (confidence of 0.789422830806)
Character 6 of 7:
        t (confidence of 0.835510535512)
        e (confidence of 0.835221298415)
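
Output in this shape can be reduced to a single guess by taking the highest-confidence candidate for each character. The following is a minimal sketch that assumes the exact line format shown above; `parse`, `best_guess`, the `threshold` parameter, and the "?" placeholder are hypothetical helpers, not part of the decaptcha library.

```python
import re

HEADER = re.compile(r"^Character \d+ of \d+:$")
CANDIDATE = re.compile(r"^(\S) \(confidence of ([0-9.]+)\)$")


def parse(output):
    """Group (confidence, symbol) pairs under each 'Character N of M:' header."""
    chars = []
    for line in output.splitlines():
        line = line.strip()
        if HEADER.match(line):
            chars.append([])
            continue
        m = CANDIDATE.match(line)
        if m and chars:
            chars[-1].append((float(m.group(2)), m.group(1)))
    return chars


def best_guess(output, threshold=0.0, unknown="?"):
    """Join the top candidate per character; use `unknown` when none qualifies."""
    result = []
    for candidates in parse(output):
        score, symbol = max(candidates, default=(0.0, unknown))
        result.append(symbol if score >= threshold else unknown)
    return "".join(result)


# First five characters of the example output above.
SAMPLE = """\
Character 0 of 7:
        t (confidence of 0.839150063096)
        e (confidence of 0.827405543276)
Character 1 of 7:
        0 (confidence of 0.834057656228)
        l (confidence of 0.771064160322)
Character 2 of 7:
        t (confidence of 0.309437274354)
        e (confidence of 0.303227199152)
Character 3 of 7:
Character 4 of 7:
        t (confidence of 0.267644230239)
        7 (confidence of 0.266067912114)
"""

print(best_guess(SAMPLE))                 # t0t?t
print(best_guess(SAMPLE, threshold=0.5))  # t0???
```

Characters with no candidates, such as character 3 in the example, are kept as placeholders so the positions of the remaining guesses stay aligned with the image.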

Further Reading

The following implementations and techniques are recommended as more practical and accurate alternatives to this project:

  1. http://www.codeproject.com/Articles/106583/Handwriting-Recognition-Revisited-Kernel-Support-V
