

Klaus

This is a show-case of how to run a state-of-the-art image classifier on iOS devices.

A warning ahead: this is a show-case, and it is open-source. However, the ccv Xcode project presented here will not be maintained. Use the full ccv implementation instead: http://github.com/liuliu/ccv

Challenge

A deep-learning-based image classifier normally has a large memory footprint. ccv's default image classifier uses around 220MiB of memory, which is reasonable on a desktop but a bit too much on mobile devices. A deep-learning-based image classifier also comes with large data files (the so-called pre-trained model): ccv's pre-trained model is about 110MiB, Caffe's is about 200MiB, and OverFeat's is about 1GiB. Shipping a mobile utility app with a 100MiB data file is quite unreasonable.

In Klaus, accuracy is sacrificed for a smaller memory footprint as well as smaller data files. Specifically, on ImageNet 2012, ccv's default image classifier has a top-5 missing rate of 16.17% (meaning that given an image and 5 guesses in total, ccv's default image classifier has an 83.83% chance of getting it right). The mobile-friendly image classifier has a top-5 missing rate of 18.22%. This is achieved with a 19.3MiB pre-trained model.

Detail

The first difference in this new mobile-friendly image classifier is the size of its full connect layer. ccv's default image classifier follows Matt's model, and thus its full connect layer has 4096 neurons. The new mobile-friendly image classifier has only 2048 neurons for that layer. This change effectively cuts the memory footprint in half, and it is where the accuracy loss comes from. The pre-trained mobile-friendly model can be downloaded from: http://static.libccv.org/image-net-2012-mobile.sqlite3.
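
As a back-of-the-envelope sketch of why halving the neuron count roughly halves the memory footprint: the parameter count of a full connect layer scales linearly with its width. The flattened input dimension below is a hypothetical value chosen for illustration, not taken from the actual ccv model:

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical flattened convolutional output dimension, for illustration only */
        const long input_dim = 6 * 6 * 256;
        const long default_fc = input_dim * 4096; /* default model's full connect layer */
        const long mobile_fc = input_dim * 2048;  /* mobile-friendly full connect layer */
        printf("default: %ld params, mobile: %ld params (%.0f%% of default)\n",
               default_fc, mobile_fc, 100.0 * mobile_fc / default_fc);
        return 0;
    }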

ccv's default image classifier already compresses its parameters to half-precision floating point. We've done comparisons previously that showed no loss of accuracy going from 32-bit to 16-bit. Since the full connect layers hold most of the model parameters, I've run more experiments on what accuracy loss we are looking at when quantizing more aggressively.
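
To make the half-precision idea concrete, here is a minimal float32-to-float16 conversion sketch (truncating rounding, no denormal or NaN handling); it is illustrative only and not ccv's actual conversion routine:

    #include <stdint.h>
    #include <string.h>

    /* Simplified float32 -> float16: keep the sign, rebias the exponent and
     * truncate the mantissa to 10 bits. Small values flush to zero. */
    static uint16_t float_to_half(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof(bits));
        uint32_t sign = (bits >> 16) & 0x8000;
        int32_t exp = (int32_t)((bits >> 23) & 0xff) - 127 + 15;
        uint32_t mant = (bits >> 13) & 0x3ff;
        if (exp <= 0)
            return (uint16_t)sign;            /* underflow: flush to zero */
        if (exp >= 31)
            return (uint16_t)(sign | 0x7c00); /* overflow: clamp to infinity */
        return (uint16_t)(sign | ((uint32_t)exp << 10) | mant);
    }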

I used this code snippet to generate the quantization table for the full connect layers with the k-means algorithm: https://gist.github.com/liuliu/9117a0011a682ab231d3.
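
A minimal sketch of that idea, assuming a 1-D Lloyd-style k-means over a layer's weights (k = 256 for 8-bit, 16 for 4-bit, 4 for 2-bit); the gist above is the authoritative version:

    #include <math.h>
    #include <stdlib.h>

    /* Build a quantization table of k centroids for n weights w with a few
     * rounds of 1-D k-means: assign each weight to its nearest centroid,
     * then move each centroid to the mean of its assignments. */
    static void kmeans_table(const float* w, int n, float* table, int k, int iterations)
    {
        int i, j, t;
        int* count = (int*)calloc(k, sizeof(int));
        double* sum = (double*)calloc(k, sizeof(double));
        for (j = 0; j < k; j++) /* seed centroids with evenly spaced samples */
            table[j] = w[(long)j * n / k];
        for (t = 0; t < iterations; t++)
        {
            for (j = 0; j < k; j++)
            {
                sum[j] = 0;
                count[j] = 0;
            }
            for (i = 0; i < n; i++)
            {
                int best = 0;
                float best_d = fabsf(w[i] - table[0]);
                for (j = 1; j < k; j++)
                {
                    float d = fabsf(w[i] - table[j]);
                    if (d < best_d)
                    {
                        best_d = d;
                        best = j;
                    }
                }
                sum[best] += w[i];
                count[best]++;
            }
            for (j = 0; j < k; j++)
                if (count[j] > 0)
                    table[j] = (float)(sum[j] / count[j]);
        }
        free(count);
        free(sum);
    }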

After the quantization table is generated, the full connect layer's model parameters are exported to a PNG file with this code snippet: https://gist.github.com/liuliu/970d97db15f47c196454.
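
For illustration, writing 8-bit table indices out as a grayscale PNG with libpng could look like the sketch below; the gist is the authoritative version, and the function name and layout here are assumptions:

    #include <png.h>
    #include <stdio.h>

    /* Export one quantized layer (one 8-bit index per weight) as a
     * width x height grayscale PNG. */
    static int write_indices_png(const char* filename, const unsigned char* indices,
                                 int width, int height)
    {
        FILE* fp = fopen(filename, "wb");
        if (!fp)
            return -1;
        png_structp png = png_create_write_struct(PNG_LIBPNG_VER_STRING, 0, 0, 0);
        png_infop info = png_create_info_struct(png);
        if (setjmp(png_jmpbuf(png)))
        {
            png_destroy_write_struct(&png, &info);
            fclose(fp);
            return -1;
        }
        png_init_io(png, fp);
        png_set_IHDR(png, info, width, height, 8, PNG_COLOR_TYPE_GRAY,
                     PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_DEFAULT, PNG_FILTER_TYPE_DEFAULT);
        png_write_info(png, info);
        for (int y = 0; y < height; y++)
            png_write_row(png, (png_bytep)(indices + (size_t)y * width));
        png_write_end(png, info);
        png_destroy_write_struct(&png, &info);
        fclose(fp);
        return 0;
    }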

The quantized model parameters are then loaded back for analysis with this code snippet: https://gist.github.com/liuliu/9c737fa53a62d7165f2c.

It turns out that the loss of accuracy is small:

8-bit: 41.15% (top-1), 18.18% (top-5) (for one guess, it has a 58.85% chance of getting it right; for 5 guesses, an 81.82% chance)

4-bit: 41.38% (top-1), 18.28% (top-5)

2-bit: 45.22% (top-1), 20.62% (top-5)

To strike a balance between accuracy and size, a mixed model is chosen: the first full connect layer is quantized to 4-bit, and the other full connect layers are quantized to 8-bit, which gives an 18.20% top-5 missing rate.
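
As a tiny sketch of what the 4-bit case implies, two table indices are packed per byte against a 16-entry table (the nibble order below is an arbitrary choice for illustration); the 8-bit layers simply keep one index per byte:

    #include <stdint.h>

    /* Pack n 4-bit indices (values 0..15) two per byte; n assumed even. */
    static void pack_4bit(const uint8_t* indices, int n, uint8_t* packed)
    {
        for (int i = 0; i < n; i += 2)
            packed[i / 2] = (uint8_t)((indices[i] & 0x0f) | (indices[i + 1] << 4));
    }

    /* Look up the i-th weight from the packed indices and a 16-entry table. */
    static float dequantize_4bit(const uint8_t* packed, int i, const float table[16])
    {
        uint8_t byte = packed[i / 2];
        uint8_t idx = (i & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0f);
        return table[idx];
    }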

The quantization table is later stored at half precision inside the sqlite3 model with this code snippet: https://gist.github.com/liuliu/1f3e2c1fceb5f1b47dc5.
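
A hypothetical illustration of stashing such a table as a BLOB with the sqlite3 C API; the table and column names here are made up, and ccv's actual model schema differs:

    #include <sqlite3.h>
    #include <stdint.h>

    /* Store a k-entry half-precision codebook for one layer in the model file. */
    static int save_codebook(const char* model_path, int layer, const uint16_t* half_table, int k)
    {
        sqlite3* db;
        sqlite3_stmt* stmt;
        if (sqlite3_open(model_path, &db) != SQLITE_OK)
            return -1;
        sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS quant (layer INTEGER PRIMARY KEY, codebook BLOB)", 0, 0, 0);
        sqlite3_prepare_v2(db, "REPLACE INTO quant (layer, codebook) VALUES (?, ?)", -1, &stmt, 0);
        sqlite3_bind_int(stmt, 1, layer);
        sqlite3_bind_blob(stmt, 2, half_table, k * (int)sizeof(uint16_t), SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }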

The full connect layer parameters are stored separately in a series of PNG files; you can use this code snippet to load them back: https://gist.github.com/liuliu/10b7b067ace070ee7e33.
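
Once the indices are decoded from the PNG and the codebook has been read back and expanded to float, rebuilding a layer's weights is a plain table lookup; a minimal sketch for the 8-bit case:

    #include <stdint.h>

    /* Rebuild n float weights from 8-bit indices and a 256-entry table. */
    static void dequantize_layer(const uint8_t* indices, int n,
                                 const float table[256], float* weights)
    {
        for (int i = 0; i < n; i++)
            weights[i] = table[indices[i]];
    }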

The beautiful part of using PNG files to store parameters is that all optimization techniques available for PNG become available to us. The generated PNG files can be further reduced with pngcrush -brute; in fact, pngcrush reduced the data file size by about 10%.

ccv's newer CPU implementation also uses the NEON instruction set to speed up the convolutional layer computation; thus, the forward pass completes in around 3 seconds on an iPhone 5 (averaging 10 patches).
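
As a rough illustration of the kind of inner loop NEON accelerates (not ccv's actual convolution code), a 4-wide multiply-accumulate dot product looks like this:

    #include <arm_neon.h>

    /* Dot product of two float vectors with NEON; len assumed to be a
     * multiple of 4 for brevity. Works on ARMv7 (e.g. iPhone 5). */
    static float neon_dot(const float* a, const float* b, int len)
    {
        float32x4_t acc = vdupq_n_f32(0.0f);
        int i;
        for (i = 0; i < len; i += 4)
            acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
        /* horizontal sum of the 4 accumulator lanes */
        float32x2_t sum2 = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
        sum2 = vpadd_f32(sum2, sum2);
        return vget_lane_f32(sum2, 0);
    }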

More

This is obviously a very early presentation of how to run a deep-learning-based image classifier on mobile devices. A few explorations based on these early results may be possible. For example, the memory footprint of the mobile-friendly version is still around 100MiB; however, with quantization, the full connect layers could have a roughly 4-times smaller memory footprint when implemented properly. The full connect layers may also be deepened further, but with fewer neurons per layer, to get better accuracy and performance.

As always, Klaus is open-sourced under the BSD 3-clause license, and the pre-trained model is under the Creative Commons Attribution 4.0 International License. Hope you will enjoy it.
