  • Stars: 247
  • Rank: 164,117 (top 4%)
  • Language: Python
  • License: MIT License
  • Created: almost 3 years ago
  • Updated: 8 months ago



Sharpened Cosine Similarity

An alternative to convolution for neural networks

Implementations

  • PyTorch
  • Keras
  • Jax

Description

Sharpened cosine similarity is a strided operation, like convolution, that extracts features from an image.

It is related to convolution, but with important differences. Convolution is a strided dot product between a signal, s, and a kernel, k.

Equation for convolution:

    conv(s, k) = s · k

A cousin of convolution is cosine similarity, where the signal patch and kernel are both normalized to have a magnitude of 1 before the dot product is taken. It is so named because in two dimensions, it gives the cosine of the angle between the signal and the kernel vectors.

Equation for cosine similarity:

    cossim(s, k) = (s · k) / (‖s‖ ‖k‖)

The cosine is known for being broad, that is, two quite different vectors can have a moderately high cosine similarity. It can be sharpened by raising the magnitude of the result to a power, p, while maintaining the sign.

Equation for raw sharpened cosine similarity:

    scs(s, k) = sign(s · k) · |(s · k) / (‖s‖ ‖k‖)|^p

This measure can become numerically unstable if the magnitude of the signal or the kernel gets too close to zero. This is remedied by adding a small value, q, to the signal magnitude. In practice, the kernel magnitude doesn't get small enough to need this term.

Equation for sharpened cosine similarity:

    scs(s, k) = sign(s · k) · |(s · k) / ((‖s‖ + q) ‖k‖)|^p
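Putting the pieces together, here is a minimal NumPy sketch of the operation on a single signal patch. The function name and patch-based framing are illustrative, not the repo's API; the actual implementations apply this in a strided fashion across a whole image.

```python
import numpy as np

def scs(s, k, p=2.0, q=1e-3):
    """Sharpened cosine similarity between a signal patch s and a kernel k.

    The dot product is normalized by the kernel magnitude and by the
    signal magnitude plus a small stabilizer q, then the magnitude of
    the result is raised to the power p while the sign is preserved.
    """
    s = np.asarray(s, dtype=float).ravel()
    k = np.asarray(k, dtype=float).ravel()
    cos = (s @ k) / ((np.linalg.norm(s) + q) * np.linalg.norm(k))
    return np.sign(cos) * np.abs(cos) ** p

kernel = np.array([1.0, 0.0])
print(scs([2.0, 0.0], kernel))   # aligned patch: close to +1
print(scs([-2.0, 0.0], kernel))  # anti-aligned patch: close to -1
print(scs([0.0, 3.0], kernel))   # orthogonal patch: 0
```

Note how sharpening works: a patch with a moderate cosine similarity of, say, 0.5 drops to 0.25 at p = 2, while a well-aligned patch near 1.0 barely moves, so the feature detector becomes more selective.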

Background

The idea behind sharpened cosine similarity first surfaced as a Twitter thread in 2020. There's some more development in this blog post.

Tips and Tricks

These are some things that have been reported to work so far.

  • The big benefit of SCS appears to be parameter efficiency and architecture simplicity. It doesn't look like it's going to beat any accuracy records, and it doesn't always run very fast, but it's killing it on this parameter efficiency leaderboard.
  • Skip the nonlinear activation layers, like ReLU and sigmoid, after SCS layers.
  • Skip the dropout layers after SCS layers.
  • Skip the normalization layers, like batch normalization or layer normalization, after SCS layers.
  • Use MaxAbsPool instead of MaxPool. It selects the element with the highest magnitude of activity, even if it's negative.
  • Raising activities to the power p generally doesn't parallelize well on GPUs and TPUs. It will slow your code down a LOT compared to straight convolutions. Disabling the p parameter results in a huge speedup on GPUs, but this takes the "sharpened" out of SCS. Regular old cosine similarity is cool, but it is its own thing with its own limitations.
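The MaxAbsPool tip above can be sketched in a few lines of NumPy. This is a hypothetical 1-D helper for illustration, not any library's API; real implementations pool over 2-D windows.

```python
import numpy as np

def max_abs_pool_1d(x, window):
    """For each non-overlapping window, keep the element with the
    largest magnitude, preserving its sign."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // window) * window       # drop any ragged tail
    windows = x[:n].reshape(-1, window)
    idx = np.argmax(np.abs(windows), axis=1)
    return windows[np.arange(len(windows)), idx]

out = max_abs_pool_1d([0.5, -3.0, 2.0, 1.0], window=2)
# Regular MaxPool would keep [0.5, 2.0]; MaxAbsPool keeps the
# strong negative response -3.0 instead of the weak positive 0.5.
print(out)
```

This matters for SCS because a strongly negative similarity is just as informative as a strongly positive one, and plain MaxPool would discard it.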

Examples

In the age of gargantuan language models, it's uncommon to talk about how few parameters a model uses, but it matters when you hope to deploy on compute- or power-limited devices. Sharpened cosine similarity is exceptionally parameter efficient.

The repository scs_torch_gallery has a handful of working examples. cifar10_80_25214.py is an image classification model that gets 80% accuracy on CIFAR-10 using only 25.2k parameters. According to the CIFAR-10 leaderboard on Papers With Code, this is somewhere around one-tenth of the parameters in previous models in this accuracy range.

Reverse Chronology

Date Milestone
2022-12-06 Paper by Skyler Wu, Fred Lu, Edward Raff, James Holt in NeurIPS 2022 ICBINB Workshop
2022-04-23 Code by Steven Walton. SCS in Compact Transformers.
2022-03-28 Code by Raphael Pisoni. Jax implementation.
2022-03-11 Code by Phil Sodmann. PyTorch Lightning demo on the Fashion MNIST data.
2022-02-25 Experiments and analysis by Lucas Nestler. TPU implementation of SCS. Runtime performance comparison with and without the p parameter.
2022-02-24 Code by Dr. John Wagner. Head to head comparison with convnet on American Sign Language alphabet dataset.
2022-02-22 Code by Håkon Hukkelås. Reimplementation of SCS in PyTorch with a performance boost from using Conv2D. Achieved 91.3% CIFAR-10 accuracy with a model of 1.2M parameters.
2022-02-21 Code by Zimonitrome. An SCS-based GAN, the first of its kind.
2022-02-20 Code by Michał Tyszkiewicz. Reimplementation of SCS in PyTorch with a performance boost from using Conv2D.
2022-02-20 Code by Lucas Nestler. Reimplementation of SCS in PyTorch with a performance boost and CUDA optimizations.
2022-02-18 Blog post by Raphael Pisoni. SOTA parameter efficiency on MNIST. Intuitive feature interpretation.
2022-02-01 PyTorch code by Stephen Hogg. PyTorch implementation of SCS. MaxAbsPool implementation.
2022-02-01 PyTorch code by Oliver Batchelor. PyTorch implementation of SCS.
2022-01-31 PyTorch code by Ze Wang. PyTorch implementation of SCS.
2022-01-30 Keras code by Brandon Rohrer. Keras implementation of SCS running on Fashion MNIST.
2022-01-17 Code by Raphael Pisoni. Implementation of SCS in paired depthwise/pointwise configuration, the key element of the ConvMixer architecture.
2022-01-06 Keras code by Raphael Pisoni. Keras implementation of SCS.
2020-02-24 Twitter thread by Brandon Rohrer. Justification and introduction of SCS.
