
Tool to extract news articles from a newspaper photo and give context about the news

Sharingan


Sharingan is a tool built on Python 3.6 using OpenCV 3.2. It extracts news content as text from a photo of a newspaper and then performs context extraction on that news.

For more details and an explanation, please refer to the blog post: http://vipul.xyz/2017/03/sharingan-newspaper-text-and-context.html

How it works

News Extraction

Capture Image


Canny Edge Detection

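In OpenCV this step is typically a single call to `cv2.Canny(image, threshold1, threshold2)`. The README does not give Sharingan's threshold values, so the sketch below does not try to reproduce its settings. It shows, in plain Python, only the last stage of Canny that uses the two thresholds: hysteresis thresholding. It assumes gradient magnitudes have already been computed (Gaussian smoothing, Sobel gradients and non-maximum suppression are omitted); the function name `hysteresis` and the sample values are hypothetical.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Hysteresis thresholding, the final stage of Canny edge detection.

    Pixels with magnitude >= high are strong edges. Pixels with
    magnitude >= low are kept only if they are 8-connected, directly or
    through other kept pixels, to a strong edge.
    """
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every strong pixel.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))
    # Grow outwards from strong pixels into connected weak pixels.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc]
                        and magnitude[nr][nc] >= low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges


# Hypothetical gradient magnitudes for a 3x4 patch.
magnitude = [
    [0, 50, 120, 0],
    [0, 0, 60, 0],
    [70, 0, 0, 0],
]
edges = hysteresis(magnitude, low=40, high=100)
```

Here the weak pixels at (0, 1) and (1, 2) survive because they touch the strong pixel at (0, 2), while (2, 0) clears the low threshold but is dropped because nothing connects it to a strong edge.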

Dilation

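Dilation grows foreground (white) regions, which is presumably why the pipeline dilates the edge map: nearby edge pixels, such as the characters of one article, can merge into a single blob before contour detection. In OpenCV this is `cv2.dilate(src, kernel, iterations=...)`; Sharingan's kernel size and iteration count are not stated here. The pure-Python sketch below shows what binary dilation with a rectangular kernel computes. The function `dilate`, its parameters and the sample edge map are hypothetical, and pixels outside the image are simply ignored.

```python
def dilate(image, kernel_h, kernel_w, iterations=1):
    """Binary dilation of a 0/1 image with a kernel_h x kernel_w rectangle.

    An output pixel is 1 if any input pixel covered by the kernel,
    anchored at its centre, is 1. Pixels outside the image are ignored.
    """
    rows, cols = len(image), len(image[0])
    top, left = kernel_h // 2, kernel_w // 2
    for _ in range(iterations):
        out = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Scan the kernel window around (r, c).
                for kr in range(kernel_h):
                    for kc in range(kernel_w):
                        nr, nc = r + kr - top, c + kc - left
                        if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc]:
                            out[r][c] = 1
        # Each iteration dilates the previous result.
        image = out
    return image


# Hypothetical edge map: two edge pixels separated by a three-pixel gap.
edge_map = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
narrow = dilate(edge_map, 1, 3)  # pixels grow but stay separate
wide = dilate(edge_map, 1, 5)    # the gap closes
```

A wider kernel or more iterations merges more aggressively; too much dilation would join neighbouring articles into one region, so both are tuning knobs.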

Contour Detection

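`cv2.findContours` traces the border of each connected white region as an ordered list of points. The sketch below is a simpler stand-in, not the border-following algorithm OpenCV implements: it labels 8-connected regions with a breadth-first flood fill and collects each region's boundary pixels, unordered and as (row, column) pairs rather than OpenCV's (x, y) points. The function name `find_regions` and the sample image are hypothetical.

```python
from collections import deque

# 4-neighbour offsets used to decide whether a pixel lies on the boundary.
NEIGHBOURS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def find_regions(image):
    """Return the boundary pixels of each 8-connected region of a 0/1 image.

    A boundary pixel is a foreground pixel with at least one 4-neighbour
    that is background or outside the image. Regions are listed in the
    raster order of their first pixel; each boundary list is sorted.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            pixels = []
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boundary = [
                (y, x) for y, x in pixels
                if any(not (0 <= y + dy < rows and 0 <= x + dx < cols)
                       or not image[y + dy][x + dx]
                       for dy, dx in NEIGHBOURS_4)
            ]
            regions.append(sorted(boundary))
    return regions


# Hypothetical dilated edge map: a 3x3 blob and an isolated pixel.
blobs = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0],
]
regions = find_regions(blobs)
```

On this sample it finds two regions: the 3x3 block, whose boundary is every pixel except the centre, and the isolated pixel at (1, 4).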

Contour Approximation and Bound Box

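OpenCV provides both operations named in this step: `cv2.approxPolyDP(curve, epsilon, closed)`, which simplifies a contour with the Douglas-Peucker algorithm, and `cv2.boundingRect(points)`, which returns an upright box as (x, y, w, h). The sketch below reimplements both in plain Python on (x, y) tuples to show what they compute. It handles an open polyline only (a closed contour would need its start point treated separately), width and height count pixels inclusively, and the names `approx_poly` and `bounding_rect` are hypothetical.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def approx_poly(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first -> last.
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = approx_poly(points[: index + 1], epsilon)
    right = approx_poly(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


def bounding_rect(points):
    """Upright bounding box (x, y, w, h) of integer pixel coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x + 1, max(ys) - y + 1


# Hypothetical contour fragment: the top and right edges of a box corner.
corner = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3)]
simplified = approx_poly(corner, epsilon=0.5)
box = bounding_rect(corner)
```

The eight contour points collapse to the three corner vertices (0, 0), (4, 0) and (4, 3); a larger `epsilon` tolerates more jitter along each edge before a vertex is kept.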

Manual Mode


Context Extraction

Running context extraction on the news segmented above gives the following result:

    ['residential terraces', 'busy markets', 'Puppies', 'inhumane conditions', 'popular e-commerce sites', 'Sriramapuram', 'Russell Market', 'issue licences',
    'meeting conditions', 'positive impact', 'pet owners', 'R. Shantha Kumar', 'welfare officer', 'Animal Welfare Board', 'India']
    ['Kittie']
    ['Compassion Unlimited']
    ['public spaces', 'Animal', 'rights activists', 'civic body', 'Bengaluru'],
    ['BENGALURU', 'Bruhat Bengaluru Mahanagar Palike', 'Dane', 'English Mastiff', 'Bulldog', 'Boxer', 'Rottweiler', 'Bernard', 'Shepherd', 'Retriever',
    'draft guidelines', 'sterilisation', 'pet dogs ', 'Owners']
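The README does not say how Sharingan extracts context, but the output above looks like noun phrases and proper nouns. One common way to get such phrases is to chunk part-of-speech-tagged tokens with a pattern of adjectives followed by nouns; with NLTK that pattern could be written as the `nltk.RegexpParser` grammar `NP: {<JJ.*>*<NN.*>+}`. The sketch below implements the same pattern by hand on already tagged (word, tag) pairs using Penn Treebank tags, so it runs without any NLTK data. It is an assumed approach, not Sharingan's code; `noun_phrases` and the sample sentence are hypothetical.

```python
# Penn Treebank tags treated as adjectives and nouns (assumed chunk pattern).
ADJECTIVES = {"JJ", "JJR", "JJS"}
NOUNS = {"NN", "NNS", "NNP", "NNPS"}


def noun_phrases(tagged):
    """Return phrases matching adjectives* nouns+ from (word, tag) pairs."""
    phrases = []
    words, has_noun = [], False
    for word, tag in tagged:
        if tag in NOUNS:
            words.append(word)
            has_noun = True
        elif tag in ADJECTIVES and not has_noun:
            # Adjectives may only precede the nouns of a chunk.
            words.append(word)
        else:
            # The current chunk ends; emit it only if it contains a noun.
            if has_noun:
                phrases.append(" ".join(words))
            # An adjective after a noun starts a new chunk.
            words = [word] if tag in ADJECTIVES else []
            has_noun = False
    if has_noun:
        phrases.append(" ".join(words))
    return phrases


# Hypothetical pre-tagged sentence.
tagged = [
    ("Puppies", "NNS"), ("are", "VBP"), ("kept", "VBN"), ("in", "IN"),
    ("inhumane", "JJ"), ("conditions", "NNS"), ("on", "IN"),
    ("residential", "JJ"), ("terraces", "NNS"), (".", "."),
]
phrases = noun_phrases(tagged)
```

For this sentence the chunker yields "Puppies", "inhumane conditions" and "residential terraces", the same kind of phrases that appear in the output above.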

Installation

Installing OpenCV 3.2 from source for Python 3.6

  • wget https://github.com/Itseez/opencv/archive/3.2.0.zip

  • unzip 3.2.0.zip

  • cd opencv-3.2.0

  • mkdir release && cd release

      cmake -DBUILD_TIFF=ON \
          -DBUILD_opencv_java=OFF \
          -DWITH_CUDA=OFF \
          -DENABLE_AVX=ON \
          -DWITH_OPENGL=ON \
          -DWITH_OPENCL=ON \
          -DWITH_IPP=OFF \
          -DWITH_TBB=ON \
          -DWITH_EIGEN=ON \
          -DWITH_V4L=ON \
          -DWITH_VTK=OFF \
          -DBUILD_TESTS=OFF \
          -DBUILD_PERF_TESTS=OFF \
          -DCMAKE_BUILD_TYPE=RELEASE \
          -DBUILD_opencv_python2=OFF \
          -DCMAKE_INSTALL_PREFIX=$(python3.6 -c "import sys; print(sys.prefix)") \
          -DPYTHON3_EXECUTABLE=$(which python3.6) \
          -DPYTHON3_INCLUDE_DIR=$(python3.6 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
          -DPYTHON3_PACKAGES_PATH=$(python3.6 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") ..
    
  • The output of the above command will be similar to this: output

  • make -j4

  • make install

Setting up Sharingan

  • git clone git@github.com:vipul-sharma20/sharingan.git
  • cd sharingan
  • pip install -r requirements.txt

IMPORTANT: You will need some NLTK corpora and trained models for the code to run. See http://www.nltk.org/data.html for how to install them.

  • Interactive Method:

      In [1]: import nltk
    
      In [2]: nltk.download()
    

Docker

Try out the code in a Jupyter Notebook:

  • docker build -t sharingan-docker .
  • docker run -p 8888:8888 -it sharingan-docker

Thanks to

I am no wizard. Big thanks to the people who came up with these solutions and posts:

The Name?

See here: Sharingan

LICENSE

This project is licensed under the MIT License:

Copyright (c) 2017-2018: Vipul Sharma

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

This project uses the following external libraries, which have their own licenses:
