Computer Vision Basics in Microsoft Excel (using just formulas)

Creative Commons License: CC-BY-NC-SA-4.0

By Alok Govil, Principal Engineer, Amazon (LinkedIn Profile)

Collaborator: Venkataramanan Subramanian, Principal Engineer, Amazon

Computer Vision is often seen by software developers and others as a hard field to get into. In this article, we'll learn Computer Vision from basics using sample algorithms implemented within Microsoft Excel, using a series of one-liner Excel formulas. We'll use a surprise trick that helps us demonstrate and visualize algorithms like Face Detection, Hough Transform, etc., within Excel, with no dependence on any script or a third-party plugin.

Microsoft Excel Trick for Spreadsheet Images

Figure 1: Outline of the steps to visualize a spreadsheet as an image. The spreadsheet, and thereby the image, is then manipulated step by step using formulas.

Selected Feedback Quotes

"Its amazing to see an image show up on the Excel sheet as you zoom out and numbers appear as you zoom back in."

"Very cool to see that with 'simple' Excel formulas you can do some real computer vision."

"... never thought you can explain CV through simple Excel formulas and transformations. Hats-off :)"

"... used Excel to explain the core concepts and algorithms so clear that I feel I could start working with it right now! ..."

"I've been wanting to learn how CV works for a while, and this was probably the best transition from data to visuals I've ever seen."

"Just incredible build up from small to great one step at a time."

Preview of what we will achieve

We will see how to detect a face using a toy example: (Below are screenshots of Excel spreadsheets.)

Face Detection in Microsoft Excel

Even the rectangles/lines are drawn using just formulas. :-)

We will also see how to find edges and lines:

Edge Detection in Microsoft Excel

Lines in Microsoft Excel

Expectations from the audience

No prior background in Computer Vision should be needed to follow the material. It is assumed that the audience knows Microsoft Excel basics and can read its documentation or search online to interpret the formulas used. Exceljet is a great resource for the latter.

Some mathematical understanding is needed: those who do not know what a weighted average is will not be able to follow much. An understanding of partial derivatives would be helpful but is not required. The most complex mathematical concept used is eigenvalues, but readers should be able to follow along even if they do not know or remember them.
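As a quick self-check of the two prerequisites mentioned above, here is a minimal Python sketch of a weighted average and of a finite-difference approximation to a partial derivative. The 3x3 patch and the weights are made up for illustration; they are not taken from the Excel files.

```python
def weighted_average(values, weights):
    """Return sum(w * v) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total_weight


def horizontal_difference(image, row, col):
    """Approximate the partial derivative along x with a central difference."""
    return (image[row][col + 1] - image[row][col - 1]) / 2


# Hypothetical 3x3 patch of pixel intensities with a dark-to-bright step.
patch = [
    [10, 10, 200],
    [10, 10, 200],
    [10, 10, 200],
]

# Smooth the middle row, giving the center pixel twice the weight of its neighbors.
smoothed = weighted_average(patch[1], [1, 2, 1])
# A large magnitude here signals a vertical edge passing through the patch.
gradient_x = horizontal_difference(patch, 1, 1)
print(smoothed, gradient_x)
```

In a spreadsheet, each of these would be a one-line formula over neighboring cells; the Python version only spells out the arithmetic.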

Instructions

The crux of the material is in the Excel files (*.xlsx) available below for downloading. These are self-explanatory with notes inserted within. Please follow the sheets step-by-step. You may need to change zoom levels as per your monitor resolution.

Screenshot of an Excel file

Software requirements

The work was created using Excel 2016 on Windows; it should however open in other versions of Excel (tested with Excel 2007 on Windows, and Excel for Mac).

While the files open in LibreOffice (tested in version 6.4.0.3 (x64)), it is too slow to be usable, even when using the native LibreOffice Calc file format. (See Hacker News discussion on this here.) We have not tested in Apache OpenOffice.

On quick testing, it seems to work fine in WPS Office (tried on Windows 10).

Relevant Excel Formula Options

Before opening the Excel file(s), change Excel Formula Calculation to "Manual", since some calculations (the Hough Transform specifically) are time-consuming and can take about an hour. Then trigger recalculation manually as needed.

Excel Formulas, Change to Manual

Also, uncheck "Recalculate workbook before saving", else Excel will recalculate all the formulas every time you save the files.

Don't Recalculate Workbook before Saving

Note: Be sure to revert these settings once you are done.

Those familiar with the R1C1 formula reference style in Excel, or those feeling adventurous, should try switching to it by turning it on in Excel options. See the screenshot below and check the box to enable it. This changes the formulas from the "D5" type format to a relative style like "R[-1]C[2]" (absolute references are also allowed, for example "R4C5"), bringing them closer to programming languages and aiding understanding.
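To make the relationship between the two reference styles concrete, here is a small Python sketch that rewrites a single-cell A1-style reference as R1C1, relative to the cell holding the formula. The function names are hypothetical and the sketch handles only single-cell references; it is an illustration of the notation, not a description of how Excel converts formulas internally.

```python
import re

_A1_PATTERN = re.compile(r"^(\$?)([A-Z]+)(\$?)(\d+)$")


def column_number(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def a1_to_r1c1(ref, formula_row, formula_col):
    """Rewrite an A1-style reference as R1C1, relative to the formula's cell.

    A "$" marks that part of the reference as absolute; everything else becomes
    an offset from (formula_row, formula_col), both 1-based.
    """
    match = _A1_PATTERN.match(ref.upper())
    if match is None:
        raise ValueError(f"not a single-cell A1 reference: {ref!r}")
    col_abs, letters, row_abs, digits = match.groups()
    row, col = int(digits), column_number(letters)

    def part(prefix, target, origin, absolute):
        if absolute:
            return f"{prefix}{target}"
        offset = target - origin
        # The brackets are omitted entirely when the offset is zero.
        return prefix if offset == 0 else f"{prefix}[{offset}]"

    row_part = part("R", row, formula_row, bool(row_abs))
    col_part = part("C", col, formula_col, bool(col_abs))
    return row_part + col_part


# A formula in cell B6 (row 6, column 2) that refers to D5:
print(a1_to_r1c1("D5", formula_row=6, formula_col=2))    # R[-1]C[2]
# A fully absolute reference does not depend on where the formula lives:
print(a1_to_r1c1("$E$4", formula_row=6, formula_col=2))  # R4C5
```

The relative form reads like an array offset ("one row up, two columns right"), which is why it sits closer to how image neighborhoods are indexed in code.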

Excel R1C1 Formula Reference Style

Downloads

The full Excel file is more than 50 MB in size. The same content is also available in smaller parts.

The following may not be downloadable by right-clicking and saving. On left-clicking, GitHub will take you to a preview page from where the raw *.xlsx files can be downloaded.

| Contents | File | Description |
|---|---|---|
| The full Excel File | Computer-Vision-Basics-in-Excel | The file is, unsurprisingly, heavy for Excel. Be patient with it. :-) Even if it goes busy for an hour, Excel usually does finish up and come back. |
| Part 0 | Computer-Vision-Basics-in-Excel-0-Introduction-and-Outline | Introduction and Outline: Start here if following the individual parts. |
| Part 1 | Computer-Vision-Basics-in-Excel-1-Edges-and-Lines | Edges and Lines: One of the sheets in this file, named "Hough", is very compute-intensive. |
| Part 2 | Computer-Vision-Basics-in-Excel-2-Keypoints-and-Descriptors | Corners/Keypoints: We do not go into the details of these. |
| Part 3 | Computer-Vision-Basics-in-Excel-3-Face-Detection | Face Detection: Functional face detection demo on the specific input image using (simplified) Viola-Jones object detection framework. |
| Part 4 | Computer-Vision-Basics-in-Excel-4-Text | Character Recognition: A toy example that recognizes uppercase E's in the image. |

Questions and Answers

Many of the following would make sense only after going through the Excel files above.

Q1: How was the image data imported into Excel?

You can follow this blog article and output data into a CSV file which Excel readily opens.
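The blog article is not reproduced here. As a rough sketch of the CSV step only, the Python below writes a grid of pixel intensities to a CSV file, one pixel per cell. It assumes the pixel values are already available as a nested list; decoding an actual image file would need an imaging library, which is left out. The function names, the gradient data, and the file name are all hypothetical.

```python
import csv
import os
import tempfile


def write_pixels_csv(pixels, path):
    """Write a 2D list of intensities so that each pixel lands in its own cell."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(pixels)


def read_pixels_csv(path):
    """Read the grid back, as a quick check of what the spreadsheet will see."""
    with open(path, newline="") as handle:
        return [[int(value) for value in row] for row in csv.reader(handle)]


# Hypothetical 4x6 grayscale image: a left-to-right gradient from 0 to 255.
width, height = 6, 4
pixels = [[round(255 * x / (width - 1)) for x in range(width)] for _ in range(height)]

path = os.path.join(tempfile.gettempdir(), "image.csv")
write_pixels_csv(pixels, path)
print(read_pixels_csv(path)[0])
```

Opening the resulting file in a spreadsheet gives one row per image row and one column per image column, which is the layout the rest of the material relies on.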

Here are two more images imported into Excel, ready for use: Einstein, Pillars.

Note that the Face Detection parameters used in the Excel files would likely fail to detect Einstein's face as the Haar-like features were fine-tuned by hand for detecting Mona Lisa's face in just that image. However, the method can again be easily fine-tuned for Einstein's face, and, when the parameters are calculated using Machine Learning, it works on most frontal-looking faces (assuming not occluded, not too small, etc.). See question #4 below for further details on this.

Q2: Are the techniques presented still relevant, or are they replaced by deep neural networks?

The techniques are still relevant. Neural networks are taking over for all complex computer vision problems, especially those unsolved by the classical techniques. For simpler operations, the classical solutions are faster to put together and are usually computationally more efficient. Also, classical techniques are still the default choice for edge devices (smartphones, web clients) though modern techniques are making an entry notably via hardware acceleration (e.g., 1, 2).

Q3: Why was the green channel of the image used, and not red or blue? How can I represent color images in Excel in this fashion?

Of the three primary color channels, red, green and blue, green contributes the most to luminosity.

Ideally, the image should be converted to grayscale first, or luminosity values should be computed (see here). This was skipped just for simplicity of explanation.
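The linked formula is not reproduced here. One widely used choice is the Rec. 601 luma weighting, sketched below in Python; that this is the formula the link refers to is an assumption. The function names and the sample pixels are hypothetical.

```python
# Rec. 601 luma weights; other standards (for example Rec. 709) use different values.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def luminosity(red, green, blue):
    """Weighted average of the three channels, with green weighted the most."""
    w_r, w_g, w_b = LUMA_WEIGHTS
    return w_r * red + w_g * green + w_b * blue


def to_grayscale(rgb_image):
    """Convert a 2D list of (r, g, b) tuples into a 2D list of rounded intensities."""
    return [[round(luminosity(*pixel)) for pixel in row] for row in rgb_image]


# Hypothetical 1x3 image: pure red, pure green, pure blue.
print(to_grayscale([[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]))
```

With these weights, pure green maps to a much brighter gray than pure red or pure blue, which is consistent with using the green channel alone as a simple stand-in.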

Why Green?

One way of representing color images in Excel is referenced in the answer to the question #7 below.

Q4: Why was the watermark face on the ID not detected and yet Mona Lisa's was?

We demonstrated the core concept of a popular face detection algorithm using just three Haar-like features and two stages, which were hand-crafted to detect the face of Mona Lisa in that specific image. In practice, the features as well as the stages are calculated using Machine Learning, which commonly results in a few thousand such features and over ten stages. The system is then able to detect over 99% of nearly frontal faces (a separate pre-trained model for faces looking nearly sideways is available in OpenCV).
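To give a feel for what a single Haar-like feature computes, here is a minimal Python sketch. It uses an integral image (summed-area table), which is how the published Viola-Jones method evaluates rectangle sums quickly; whether the Excel sheets compute the sums the same way is not stated here. The patch, the feature layout, and the threshold are all made up for illustration.

```python
def integral_image(image):
    """Return a table where entry [r][c] is the sum of image[0:r][0:c].

    The extra leading row and column of zeros remove the edge cases.
    """
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            table[r + 1][c + 1] = (
                image[r][c] + table[r][c + 1] + table[r + 1][c] - table[r][c]
            )
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return (
        table[bottom][right] - table[top][right]
        - table[bottom][left] + table[top][left]
    )


def two_rect_feature(table, top, left, height, width):
    """Haar-like feature: lower half minus upper half of the window."""
    half = height // 2
    upper = rect_sum(table, top, left, half, width)
    lower = rect_sum(table, top + half, left, half, width)
    return lower - upper


# Hypothetical 4x4 patch: a dark band (like an eye region) above a brighter band.
patch = [
    [20, 20, 20, 20],
    [20, 20, 20, 20],
    [90, 90, 90, 90],
    [90, 90, 90, 90],
]
table = integral_image(patch)
response = two_rect_feature(table, top=0, left=0, height=4, width=4)
# A stage passes the window on only if the response clears a threshold (made up here).
print(response, response > 400)
```

A detector chains several such threshold tests into stages, so that most windows are rejected after only a few cheap comparisons.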

The face shadow on the right would still be missed by the algorithm, since such face images are not included in the training data. My educated guess is that the algorithm described would not do a good job of detecting such shadowed faces, and that using neural networks would be recommended. Likewise, the algorithm we demonstrated is outperformed by neural networks on the "Labeled Faces in the Wild" dataset, where faces are often partially occluded too.

Q5: In the OCR example, how did you choose the mask and its orientation?

For document OCR (as opposed to scene text recognition), the document itself is typically straightened first before character recognition is performed for the characters in the document. Therefore, the characters are expected to be nearly upright.

In the talk, a toy example was shown using a single convolutional neuron to recognize an 'E'. Neural networks use a number of layers of neurons for the task to recognize all characters of interest. The same neural network then outputs which character is present at the input. You can imagine this as having a separate simple neural network like for 'E' for recognizing each character of interest. The combined neural network would, however, have several neurons shared in the path for recognizing each character.
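The idea of a single convolutional neuron scanning for an 'E' can be sketched in Python as follows. The weights, the image, and the threshold are hand-made for this illustration and are not the values used in the talk or the Excel files.

```python
# Hand-crafted 5x4 weights: +1 where an upright 'E' has ink, -1 where it is blank.
E_WEIGHTS = [
    [+1, +1, +1, +1],
    [+1, -1, -1, -1],
    [+1, +1, +1, -1],
    [+1, -1, -1, -1],
    [+1, +1, +1, +1],
]


def neuron_response(image, top, left, weights):
    """Weighted sum of the pixels under the mask placed at (top, left)."""
    return sum(
        weight * image[top + r][left + c]
        for r, row in enumerate(weights)
        for c, weight in enumerate(row)
    )


def find_matches(image, weights, threshold):
    """Slide the mask over the image; keep positions that reach the threshold."""
    mask_h, mask_w = len(weights), len(weights[0])
    return [
        (top, left)
        for top in range(len(image) - mask_h + 1)
        for left in range(len(image[0]) - mask_w + 1)
        if neuron_response(image, top, left, weights) >= threshold
    ]


# Hypothetical binary image (1 = ink) holding an 'E' at row 1, column 2.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
# A perfect match scores 13 (one per ink pixel); this threshold tolerates no errors.
print(find_matches(image, E_WEIGHTS, threshold=13))
```

Because the mask is tied to one orientation and one size, this only works when the text has already been straightened and scaled, which is the point made about document OCR above.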

See also the Q&A below for more on character recognition.

Q6: How well does the OCR approach here work on different fonts?

In the talk, we used a single convolutional neuron to identify an uppercase 'E' as an example. The actual systems still commonly use neural networks (not just a single neuron) for the purpose, and that performs well across fonts and languages. Some additional details are present below:

In the talk, a single neuron was used to both scan the image and recognize the letter. Typically, scanning text of different sizes is done separately using various methods. Once every character of text is isolated, it is re-scaled to a fixed size and then a neural network is used to identify the letter.

Handwriting recognition is harder, unsurprisingly. The best performance is reached when the pen strokes data is available as a function of time (e.g., when recognizing handwriting input on a touch-screen). References are readily available online for further reading.

In the example shown in the talk, even the weights of that single neuron were hand-crafted, not actually learned using a training algorithm. Even a single neuron would do better than the demo when actually trained.

Q7: How did you come up with the idea for using Microsoft Excel for this?

About 1.5 years back, we had to give an introductory talk on Computer Vision to a wide audience within Amazon, many of whom would have been completely unfamiliar with the subject. We thought about starting from the very basics by showing that an image is essentially a 2D array of numbers (for each color channel for color images) and thought about showing these using Excel.

"Hmm! If the numbers are in Excel, I could do more with it" ... That was the "A-ha" moment. :-)

It took about seven hours to create the first fully functional version for that talk, which did not include Face Detection and Text recognition. The latter took about eight more hours for the first version.

We have since then discovered several related works that represent images in Excel using this technique:

Here are some more related works using Excel:

Q8: Can I use the materials for teaching?

Please see the License summary and details below.

Q9: Is Excel the right tool for this?

Spreadsheets are not designed for something like this, and the technique is not being recommended for any work or research. It is however helping many people understand the concepts better.

While Excel has not been designed for this, it has been designed well enough to work surprisingly well for it. :-)

Q10: Does Excel have built-in formulas for Computer Vision?!

Not that we know of. At the very least, this work does not use any. :-)

As noted above, even the rectangles and lines used for annotations are drawn using generic formulas, i.e., not using any potential special formulas available in Excel add-ins.

Q11: Are there specialized interactive developer environments for Computer Vision?

Matlab has traditionally been used for this as it has many Computer Vision functions built-in natively or in toolboxes. Function "imshow" can be used to instantly display array data as an image.

Python- and Notebooks-based tooling is also very popular.

Q12: Is that your passport information on the slides?

Yes. However, all critical information in the image, such as the passport number and signatures, has been changed, including in the machine-readable lines at the bottom of the image.

Q13: Why was the photo of Mona Lisa used? :-)

We just picked an image with no copyright limitations. :-)

Q14: Why does Hough Transform show artifacts for +/- 45°?

Please refer to the answer here: https://stackoverflow.com/questions/33983389/hough-line-transform-artifacts-at-45-degree-angle
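The linked answer is not reproduced here. For context, the voting step of a basic Hough line transform can be sketched in Python as below. The discretization choices (rho rounded to whole pixels, theta in equal steps over 180 degrees) are assumptions for this sketch and not necessarily what the "Hough" sheet uses; the function names and the sample points are hypothetical.

```python
import math


def hough_lines(edge_points, width, height, theta_steps=180):
    """Accumulate votes in (rho, theta) space for a list of (x, y) edge points.

    Each point votes for every line x*cos(theta) + y*sin(theta) = rho through it.
    """
    max_rho = math.ceil(math.hypot(width, height))
    # Rows index rho from -max_rho to +max_rho; columns index the theta step.
    accumulator = [[0] * theta_steps for _ in range(2 * max_rho + 1)]
    for x, y in edge_points:
        for step in range(theta_steps):
            theta = math.pi * step / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            accumulator[rho + max_rho][step] += 1
    return accumulator, max_rho


def strongest_line(accumulator, max_rho):
    """Return (votes, rho, theta_step) for the accumulator cell with the most votes."""
    return max(
        (votes, rho_index - max_rho, step)
        for rho_index, row in enumerate(accumulator)
        for step, votes in enumerate(row)
    )


# Hypothetical edge points lying on the horizontal line y = 3.
points = [(x, 3) for x in range(100)]
accumulator, max_rho = hough_lines(points, width=100, height=10)
votes, rho, theta_step = strongest_line(accumulator, max_rho)
# With theta_steps=180, each step is one degree, so step 90 is a horizontal line.
print(votes, rho, theta_step)
```

Every edge point loops over every angle, which is why this step is the slow one when each vote is a spreadsheet formula.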

Additional Resources and References

Books

Below are references to two freely-downloadable good books on classical Computer Vision (i.e., before deep learning came into the field):

  • Computer Vision: Algorithms and Applications, Richard Szeliski (2010): This book provides a summary of many computer vision techniques along with research results from academic papers. The diagrams in the book are by themselves worth browsing to understand the state of the art in the field as of 2010, when the book was published. The book usually does not give enough detail to allow someone to implement the methods described, though appropriate references are cited.
  • Computer Vision Metrics: Survey, Taxonomy, and Analysis, Scott Krig (2014): This book provides a good top-level view of computer vision, though it is often mixed on details.

For practical implementation, there are many books on OpenCV, a common Computer Vision library, such as Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library by Gary Bradski and Adrian Kaehler.

Selected Articles and Blogs

Hacker News discussion on the work

https://news.ycombinator.com/item?id=22357374

License Summary

Copyright 2018-20 Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: CC-BY-NC-SA-4.0

This work is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License. See the LICENSE file. It cannot be used for commercial training or lectures.
