  • Stars: 134
  • Rank 270,143 (Top 6 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 6 years ago
  • Updated over 4 years ago

Repository Details

Little tools to download and then weed through images, delete and classify them into groups for building deep learning image datasets (based on crawler and tkinter)

FastClass

A little set of tools to batch download images and weed through, delete and classify them into groups for building deep learning image datasets.

I wrote up a small blog post on my site www.christianwerner.net.

Installation

pip install git+https://github.com/cwerner/fastclass.git#egg=fastclass

The installer will also place the executables fcc and fcd in your $PATH.

The package currently contains the following tools:

Download images

Use fcd to crawl search engines (Google, Bing, Baidu, Flickr) and pull all images for a defined set of queries. In addition, files are renamed, scaled and checked for duplicates.

You provide the queries, plus any terms that should be left out of the category folder names. There is an example (guitars.csv) provided in the repository.

Usage

Call the script from the command line. If you omit the input parameters, it shows the help page.

Usage: fcd [OPTIONS] INFILE

Options:
  -c, --crawler [ALL|GOOGLE|BING|BAIDU|FLICKR]
                                  selection of crawler (multiple invocations
                                  supported)  [default: ALL] (Note: BAIDU and FLICKR are not included in ALL option)
  -k, --keep                      keep original results of crawlers  [default:
                                  False]
  -m, --maxnum                    maximum number of images per crawler [default: 1000]
  -s, --size INTEGER              image size for rescaling  [default: 299]
  -o, --outpath TEXT              name of output directory  [default: dataset]
  -h, --help                      Show this message and exit.

  ::: FastClass fcd :::

  ...an easy way to crawl the net for images when building a dataset for
  deep learning.

  Example: fcd -c GOOGLE -c BING -s 224 example/guitars.csv

If you specify the -k/--keep flag, a second folder called outpath.raw, containing the original, unscaled images, will be created.
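
For scripted runs, the options from the help page can be assembled programmatically. The following is a minimal sketch and not part of FastClass: build_fcd_command is a hypothetical helper that uses only the flags listed in the help page, and it assumes fcd is installed and on your $PATH.

```python
import subprocess

# Crawler names accepted by -c/--crawler according to the help page.
CRAWLERS = {"ALL", "GOOGLE", "BING", "BAIDU", "FLICKR"}


def build_fcd_command(infile, crawlers=("ALL",), size=299, outpath="dataset", keep=False):
    """Assemble an fcd argument list (hypothetical helper, not part of FastClass)."""
    cmd = ["fcd"]
    for name in crawlers:
        if name not in CRAWLERS:
            raise ValueError(f"unknown crawler: {name}")
        cmd += ["-c", name]  # -c may be given several times
    cmd += ["-s", str(size), "-o", outpath]
    if keep:
        cmd.append("-k")  # keep the original crawler results
    cmd.append(infile)
    return cmd


cmd = build_fcd_command("example/guitars.csv", crawlers=("GOOGLE", "BING"), size=224)
print(" ".join(cmd))
# To actually start the download (needs network access and an installed fcd):
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to inspect the command before any crawling starts.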

Search file format

The csv file currently requires two columns separated by a comma (,). Each row defines an image class you want to download (see the guitars.csv file in the example folder). The first row contains a header, which is skipped.

Column 1 contains the search terms. Separate multiple search terms with spaces. To require a search term, enclose it in quotation marks ("). The usual Google query syntax also works (e.g. filetype:jpg). Column 2 lists terms that should not appear in the final class names. For example, you might add guitar to the search terms to help the search without wanting it in the class folder names. If you have no terms to exclude, leave the column blank (i.e., end the line with a comma).
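
To make the two-column layout concrete, here is a small sketch that parses such a file and derives a class name for each row by dropping the excluded terms. It is an illustration only: the sample rows are made up (they are not the contents of guitars.csv), read_queries is a hypothetical helper, and joining the remaining words with underscores is an assumption, so the folder names fcd actually produces may differ.

```python
import csv
import io

# Made-up search file: header row, then one image class per row.
SAMPLE = '''searchterms,exclude
guitar gibson "les paul",guitar
guitar fender telecaster,guitar
guitar ibanez,
'''


def read_queries(text):
    """Return (query, class_name) pairs from a two-column search file."""
    # QUOTE_NONE keeps the quotation marks in the query as literal characters.
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE)
    next(reader)  # the first row is a header and is skipped
    pairs = []
    for row in reader:
        if not row:
            continue
        query = row[0].strip()
        excluded = set(row[1].split()) if len(row) > 1 else set()
        # ASSUMPTION: class name = remaining words joined by underscores.
        words = [w for w in query.replace('"', "").split() if w not in excluded]
        pairs.append((query, "_".join(words)))
    return pairs


for query, class_name in read_queries(SAMPLE):
    print(f"{query!r} -> {class_name}")
```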

Clean image sets

Once the images are downloaded, use fcc to quickly inspect the files and rate or classify them. You can also mark them for deletion.

FastClass cleaner: fcc

Usage

Call the script from the command line. If you omit the input parameters, it shows the help page.

Usage: fcc [OPTIONS] INFOLDER [OUTFOLDER]

  FastClass fcc

Options:
  --nocopy TEXT  disable filecopy for cleaned image set  [default: False]
  -h, --help     Show this message and exit.

  ::: FastClass fcc ::: ...a fast way to cleanup/ sort your images when
  building a dataset for deep learning.

  Note: In the application use the following keys: <1>, <2>, ... <9> for
  class assignments or quality ratings <space> assigns <1> <d> to mark a
  deletion <x> to terminate the app/ write output

  Use the buttons to navigate back and forth without changing the
  classification. The current classification of an image is given in the
  title bar (X indicated a mark for deletion). The counter in the titlebar
  gives number of classified images vs the total number in the input folder.

  In the output csv file 1,2 depict class assignments/ ratings,  -1
  indicates files marked for deletion (if not excluded with -d).
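
To work with the ratings afterwards, the output csv can be filtered in a few lines. This sketch assumes a header-less file with the file name in the first column and the label in the second. The actual column layout written by fcc may differ, so check your file first. split_by_label and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the ASSUMED layout "filename,label" (no header).
SAMPLE = """img_001.jpg,1
img_002.jpg,2
img_003.jpg,-1
img_004.jpg,1
"""


def split_by_label(text):
    """Group file names by label and collect the ones marked for deletion (-1)."""
    groups = defaultdict(list)
    deleted = []
    for filename, label in csv.reader(io.StringIO(text)):
        if int(label) == -1:
            deleted.append(filename)
        else:
            groups[int(label)].append(filename)
    return dict(groups), deleted


groups, deleted = split_by_label(SAMPLE)
print(groups)
print(deleted)
```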

Flickr Crawler

The Flickr crawler requires an API key. FastClass looks for the key in an environment variable called FLICKR_API_KEY. Request one from the Flickr API key application page.

FLICKR_API_KEY=asdf1234asdf456 fcd -c FLICKR my_project.csv
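
When fcd is started from a Python script instead of a shell, the key can be passed through the process environment. A minimal sketch, assuming only what is stated above (the variable name FLICKR_API_KEY); fcd_environment is a hypothetical helper and the key is the placeholder from the example.

```python
import os


def fcd_environment(api_key=None):
    """Return a copy of the environment with FLICKR_API_KEY set (hypothetical helper)."""
    env = dict(os.environ)
    if api_key is not None:
        env["FLICKR_API_KEY"] = api_key
    if not env.get("FLICKR_API_KEY"):
        raise RuntimeError("FLICKR_API_KEY is not set; the Flickr crawler needs it")
    return env


env = fcd_environment("asdf1234asdf456")  # placeholder key, not a real one
# To run the crawler with this environment (needs network access and an installed fcd):
# import subprocess
# subprocess.run(["fcd", "-c", "FLICKR", "my_project.csv"], env=env, check=True)
```

Failing early when the key is missing avoids starting a crawl that cannot authenticate.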
