• This repository has been archived on 11/May/2023
  • Stars
    star
    296
  • Rank 140,464 (Top 3 %)
  • Language
    Python
  • License
    MIT License
  • Created almost 7 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

THIS PROJECT IS SUPSERSEDED BY SCALE-SIM-V2

Systolic CNN AcceLErator Simulator (SCALE Sim)

SCALE sim is a CNN accelerator simulator, that provides cycle-accurate timing, power/energy, memory bandwidth and trace results for a specified accelerator configuration and neural network architecture.

Skip to Getting Started

Motivation

SCALE sim enables research into CNN accelerator architecture and is also suitable for system-level studies.

Since deep learning based solutions have become prevalent for computer vision over the recent few years, there has been a surge in accelerator design proposals both from academia and the industry.

It is natural to assume that we will see many more accelerators being proposed as new use cases are identified in the near future. However, it is important to keep in mind that CNN use cases will likely vary, which poses a large spectrum of efficiency and performance demands on the underlying CNN accelerator design. Therefore, it is important to quickly prototype architectural ideas and iterate over different designs.

In the present, designing a new CNN accelerator usually starts from scratch. Performing a simple first order analysis often involves a team of designers writing their own simulators, tackling bugs, and iterating over multiple times.

It is not hard to see that all this effort is mostly repeated work. This reinvention of the wheel could be avoided if there were a standard tool usable across various use cases and implementation goals. SCALE Sim is our effort to make a stride in this direction.

SCALE sim outputs

At the core of SCALE sim is a cycle-accurate architecture simulator for CNN accelerators. We build the accelerator based on the systolic array architecture, similar to the one used in Google's TPU.

Given a convolution neural network topology and certain architecture parameters, SCALE sim is capable of estimating the following:

  • Run time in cycles
  • Average utilization
  • On-chip memory requirements
  • Off-chip interface bandwidth requirements

With the help of external simulators such as CACTI or DRAMPower, users can also use SCALE sim to obtain:

  • Power consumption
  • Area estimates

Usecases

Primarily, we envision two major usecases of SCALE sim, which we describe below. It is our hope that by releasing ths simulator, the community will find interesting and ingenious ways of using the tool for other research tasks that expand the original usage scope of SCALE sim.

Accelerator Design Space Exploration

SCALE sim can help the designer quickly explore accelerator design space. In this mode, SCALE expects the designer to provide a list of limits for architecture features like SRAM size, array size, and interface bandwidth; relevant to the target use case. These limits help the tool in narrowing down the search space. SCALE will work if the limits are not specified by assuming the default values, but naturally, the run time will be large.

Usually, designers can translate the use case constraints into high-level architecture constraints. For instance, a power-optimized design will not have a large on-chip memory or a big computing array. You get the gist.

System Simulation

Usually in an ML-enable application, CNN is just one stage of the end-to-end processing pipeline. Therefore, it is important to understand the application characteristics in the context of the entire Systems-on-a-chip (SoC). Existing full-system simulators such as Gem5 and GemDroid lack modles for CNN accelerators (IP), and therefore presents a roadblock in studying system-level behavior of ML-enabled applications.

Due to the modular interface design, users could choose to integrate SCALE sim with Gem5 or GemDroid for full-system simulation. This is particular helpful for researchers who do not wish to perform in-depth investigation of the CNN accelerator microarchitecture, but wish to integrate a decent CNN IP to obtain meaningful system-level characterization results.

How does it work?

SCALE sim simulates a TPU-like systolic array for CNN computation. Due to the highly regular data-flow involved in CNNs, it is easy to estimate the storage, traffic, and computation metrics without actually performing the computation. SCALE uses this property to generate a cycle accurate traffic trace, into and out of the systolic array to the on-chip SRAM.

The on-chip SRAM is emulated as a double buffer storage unit, to amortize the high bandwidth requirement on the interface size. Using the traffic traces from SRAM to the array and taking into account the double buffered SRAM, traces and metrics for the accelerator to main memory transactions are computed. Given a CNN topology, the current implementation of SCALE computes the output metrics sequentially for each layer.

Getting Started

30 seconds to SCALE sim!

Getting started is simple! SCALE-Sim is completely written in python. At the moment, it has dependencies on the following python packages. Make sure you have them in your environment.

  • os
  • subprocess
  • math
  • configparser
  • tqdm
  • absl-py

NOTE: SCALE-Sim needs python3 to run correctly. If you are using python2, you might run into typecasting errors

Custom Experiment

This experiment will run the default MLPERF_AlphaGoZero_32x32_os architechture contained inside scale.cfg. It will also run alexnet as its network topology.

  • Run the command: python scale.py
  • Wait for the run to finish

The config file inside configs contain achitecture presets.
the csv files inside toologies contain different networks

In order to change a different arichtechture/network, create a new .cfg file inside cofigs and call a new network by running python scale.py -arch_config=configs/eyeriss.cfg -network=topologies/yolo.csv Here is sample of the config file.
sample config
Architecture presets are the variable parameters for SCALE-Sim, like array size, memory etc.

The Network Topoplogy csv file contains the network that we want to test in our architechture.
SCALE-Sim accepts topology csv in the format shown below.
yolo_tiny topology

Since SCALE-Sim is a CNN simulator please do not provide any layers other than convolutional or fully connected in the csv. You can take a look at yolo_tiny.csv for your reference.

Output

Here is an example output dumped to stdout when running Yolo tiny (whose configuration is in yolo_tiny.csv): screen_out

Also, the simulator generates read write traces and summary logs at ./outputs/<topology_name>. There are three summary logs:

  • Layer wise runtime and average utilization
  • Layer wise MAX DRAM bandwidth log
  • Layer wise AVG DRAM bandwidth log
  • Layer wise breakdown of data movement and compute cycles

In addition cycle accurate SRAM/DRAM access logs are also dumped and could be accesses at ./outputs/<topology_name>/layer_wise

Detailed Documentation

For detailed insights on working of SCALE-Sim, you can refer to this paper

Citing

If you find this tool useful for your research, please use the following bibtex to cite us,

@article{samajdar2018scale,
  title={SCALE-Sim: Systolic CNN Accelerator Simulator},
  author={Samajdar, Ananda and Zhu, Yuhao and Whatmough, Paul and Mattina, Matthew and Krishna, Tushar},
  journal={arXiv preprint arXiv:1811.02883},
  year={2018}
}

Contributing

Please send a pull request.

Authors

Ananda Samajdar, Georgia Institute of Technology

Yuhao Zhu, University of Rochester

Paul Whatmough, Arm Research, Boston, MA

License

This project is licensed under the MIT License - see the LICENSE file for details.

More Repositories

1

ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
C++
2,539
star
2

arm-trusted-firmware

Read-only mirror of Trusted Firmware-A
C
1,690
star
3

CMSIS_5

CMSIS Version 5 Development Repository
C
1,327
star
4

armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
C++
1,162
star
5

ML-KWS-for-MCU

Keyword spotting on Arm Cortex-M Microcontrollers
C
1,040
star
6

astc-encoder

The Arm ASTC Encoder, a compressor for the Adaptive Scalable Texture Compression data format.
C
880
star
7

abi-aa

Application Binary Interface for the Arm® Architecture
HTML
673
star
8

vulkan_best_practice_for_mobile_developers

Vulkan best practice for mobile developers
C++
564
star
9

CMSIS-FreeRTOS

FreeRTOS adaptation for CMSIS-RTOS Version 2
C
502
star
10

optimized-routines

Optimized implementations of various library functions for ARM architecture processors
C
486
star
11

CMSIS_4

Cortex Microcontroller Software Interface Standard (V4 no longer maintained)
C
451
star
12

mango

Parallel Hyperparameter Tuning in Python
Jupyter Notebook
396
star
13

ML-examples

Arm Machine Learning tutorials and examples
C++
371
star
14

LLVM-embedded-toolchain-for-Arm

A project dedicated to building LLVM toolchain for 32-bit Arm embedded targets.
CMake
331
star
15

opengl-es-sdk-for-android

OpenGL ES SDK for Android
CSS
325
star
16

Arm-2D

2D Graphic Library optimized for Cortex-M processors
C
295
star
17

CMSIS-DSP

CMSIS-DSP embedded compute library for Cortex-M and Cortex-A
C
277
star
18

Tool-Solutions

Tutorials & examples for Arm software development tools.
C
217
star
19

EndpointAI

C++
216
star
20

SCP-firmware

Read-only mirror of System Control Processor (SCP) firmware
C
205
star
21

vulkan-sdk

Github repository for the Vulkan SDK
C
199
star
22

lisa

Linux Integrated System Analysis
Jupyter Notebook
192
star
23

HWCPipe

Hardware counters interface
C++
188
star
24

u-boot

Clone of upstream U-Boot repo with patches for Arm development boards
C
177
star
25

CMSIS-NN

CMSIS-NN Library
C
173
star
26

CMSIS-Driver

Repository of microcontroller peripheral driver implementing the CMSIS-Driver API specification
C
165
star
27

android-nn-driver

C++
151
star
28

CMSIS_6

CMSIS version 6 (successor of CMSIS_5)
C
149
star
29

ML-zoo

Python
149
star
30

workload-automation

A framework for automating workload execution and measurement collection on ARM devices.
Python
138
star
31

gator

Sources for Arm Streamline's gator daemon
C++
121
star
32

keyword-transformer

Official implementation of the Keyword Transformer: https://arxiv.org/abs/2104.00769
Jupyter Notebook
116
star
33

ebbr

Embedded Base Boot Requirements Specification
PostScript
113
star
34

perfdoc

A cross-platform Vulkan layer which checks Vulkan applications for best practices on Arm Mali devices.
C++
112
star
35

linux

C
95
star
36

asl-interpreter

Example implementation of Arm's Architecture Specification Language (ASL)
OCaml
94
star
37

MDK-Middleware

MDK-Middleware (file system, network and USB components) source code for Arm Cortex-M using CMSIS-Drivers and CMSIS-RTOS2 APIs.
C
93
star
38

sbsa-acs

ARM Enterprise: SBSA Architecture Compliance Suite
C
88
star
39

sesr

Super-Efficient Super Resolution
Python
87
star
40

mobile-studio-integration-for-unity

Mobile Studio tool integration with C# scripting for the Unity game engine.
C
86
star
41

CSAL

Coresight Access Library
C
78
star
42

progress64

PROGRESS64 is a C library of scalable functions for concurrent programs, primarily focused on networking applications.
C
70
star
43

psa-arch-tests

Tests for verifying implementations of TBSA-v8M and the PSA Certified APIs
C
66
star
44

CMSIS-RTX

RTX5 real time kernel for Arm Cortex-based embedded systems (spin-off from CMSIS_5)
C
64
star
45

Cloud-IoT-Core-Kit-Examples

Example projects and code are supplied to support the Arm-based IoT Kit for Cloud IoT Core
Python
62
star
46

developer

GTM related documentation
C++
61
star
47

cmsis-pack-eclipse

CMSIS-Pack Eclipse Plug-ins
Java
60
star
48

trappy

This repository has moved to https://gitlab.arm.com/tooling/trappy
Python
60
star
49

ethos-n-driver-stack

Driver stack (including user space libraries, kernel module and firmware) for the Arm® Ethos™-N NPU
C++
59
star
50

AVH-GetStarted

DEPRECATED - use instead AVH_CI_Template
C
58
star
51

CMSIS-CV

Computer Vision library for IoT
C++
54
star
52

acle

Arm C Language Extensions (ACLE)
Python
52
star
53

arm-systemready

Arm SystemReady
Shell
52
star
54

patrace

C++
52
star
55

tarmac-trace-utilities

Tools for analyzing and browsing Tarmac instruction traces.
C++
47
star
56

devlib

Library for interaction with and instrumentation of remote devices.
Python
47
star
57

speculation-barrier

This project provides a header file which contains wrapper macros for the __builtin_load_no_speculate builtin function defined at https://www.arm.com/security-update This builtin function defines a speculation barrier, which can be used to limit the conditions under which a value which has been loaded can be used under speculative execution.
Objective-C
44
star
58

arm-enterprise-acs

ARM Enterprise ACS
C
42
star
59

DeepFreeze

SystemVerilog
38
star
60

tf-issues

Issue tracking for the ARM Trusted Firmware project
36
star
61

scalpel

This is a PyTorch implementation of the Scalpel. Node pruning for five benchmark networks and SIMD-aware weight pruning for LeNet-300-100 and LeNet-5 is included.
Python
35
star
62

psa-api

Documentation source and development of the PSA Certified API
C
34
star
63

TZ-TRNG

TrustZone True Number Generator
C
33
star
64

AVH

AVH-FVP: Arm Virtual Hardware - Fixed Virtual Platform
C
32
star
65

CMSIS-View

Repository of CMSIS Software Pack for software event generation and input/output handling.
Go
32
star
66

perf-libs-tools

C
31
star
67

bob-build

Meta-build system using Blueprint and ninja
Go
30
star
68

CMSIS-DAP

CoreSight Debug Access Port (DAP) debug probe protocol reference implementation (spin-off from CMSIS_5)
C
30
star
69

mram_simulation_framework

MRAM magnetization simulation framework. s-LLGS python and verilog-a solvers for transients simulation and Fokker-planck equation solver for stochastic analysis
Python
28
star
70

bento-linker

A light-weight alternative to processes for microcontrollers.
C
27
star
71

toolchain-gnu-bare-metal

A toolchain sub-project dedicated to build GNU toolchain for 32-bit bare-metal targets
Shell
26
star
72

data

Machine-readable data describing Arm architecture and implementations. Includes JSON descriptions of implemented PMU events.
26
star
73

synchronization-benchmarks

Collection of synchronization micro-benchmarks and traces from infrastructure applications
C
26
star
74

libGPUInfo

A utility library for application developers to query the configuration of the Arm Immortalis GPU or Arm Mali GPU present in their system.
C++
24
star
75

cryptocell-312-runtime

CryptoCell 312 runtime code
C
24
star
76

CMSIS-Compiler

CMSIS Compiler support for Arm Compiler
C
24
star
77

vscode-cmsis-csolution

Extension support for VS Code CMSIS Project Extension
24
star
78

libddssec

DDS Security library - Project moved to https://gitlab.arm.com/libraries/libddssec
C
23
star
79

NXP_LPC

CMSIS Driver Implementations for the NXP LPC Microcontroller Series
C
23
star
80

golang-utils

Helpers and utilities for Golang in order to do actions not available in the standard library.
Go
23
star
81

AArch64cryptolib

AArch64cryptolib is a from scratch implementation of cryptographic primitives aiming for optimal performance on Arm A-class cores
C
23
star
82

AVH-TFLmicrospeech

Example: Micro speech for TensorFlow Lite
C
22
star
83

Shackleton-Framework

A generic genetic programming framework that aims to make genetic programming easier for a myriad of uses. Currently, the main target is to use the framework for code optimization in tandem with the LLVM framework.
C
22
star
84

CMSIS-Stream

CMSIS-Stream software component
Python
21
star
85

bart

Behavioural Analysis and Regression Toolkit
Python
20
star
86

PAF

PAF (the Physical Attack Framework) is a framework for analyzing physical attacks: fault injection and side channels
C++
20
star
87

HPCG_for_Arm

C++
20
star
88

armnn-mlperf

Arm mlperf.org benchmark port
C++
20
star
89

coresight-wire-protocol

Coresight Wire Protocol (CSWP) Server/Client and streaming trace examples.
HTML
18
star
90

ATP-Engine

C++
18
star
91

bsa-acs

Arm SystemReady : BSA Architecture Compliance Suite
C
17
star
92

ATS-Keyword

Smart Home Total Solution - Keyword Recognition
C
17
star
93

open-iot-sdk

Open-IoT-SDK - Home of the Total Solution applications.
C
16
star
94

vscode-keil-studio-pack

Extension pack for all VS Code extensions
16
star
95

CMSIS-RTOS2_Validation

Validation test suite for CMSIS-RTOS2 API implementations using Arm Virtual Hardware (AVH).
C
16
star
96

vr-sdk-for-android

VR SDK for Android
CSS
16
star
97

meabo

Multi-purpose multi-phase micro-benchmark
C
15
star
98

avhclient

Arm Virtual Hardware Client
Python
15
star
99

CMSIS-Driver_Validation

Test suite for verifying CMSIS-Driver implementations.
C
15
star
100

Methodology_for_ArmIE_SVE

C++
15
star