

Repository Details

OpenCL Caffe (experimental branch by AMD Research)

This is an experimental OpenCL port of Caffe by AMD Research. No new development is happening on it. We now recommend the official OpenCL port of Caffe, maintained in the BVLC GitHub repository at https://github.com/BVLC/caffe/tree/opencl

This is an OpenCL implementation of Caffe, a mainstream DNN framework (https://github.com/BVLC/caffe). It includes a largely complete Caffe feature set as of August 2015. The project is under active development to improve performance and add new features. Contributions from the community are welcome.

OpenCL (https://en.wikipedia.org/wiki/OpenCL) is an open standard parallel programming language for heterogeneous platforms. OpenCL is supported by a variety of commercial chip manufacturers.

Branches

We have three branches in this repo.

-stable, the stable branch for users

-dev, the developer branch; we encourage people to contribute on this branch

-master, the original Caffe's master branch against which our code is synchronized.

Design features

-All Caffe layers ported to OpenCL

-Performance improvement by batched implementation for conv layer based on clBLAS

-The user can choose the optimal batch number depending on H/W properties, image size and minibatch size

-Supports OpenCL 2.0, 1.2

-Implemented in C++ and OpenCL, maintaining the same interfaces as the original Caffe

-Users can directly run DNN models: AlexNet, VGG-16 and VGG-19
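
The batching idea in the list above can be illustrated with a small sizing sketch. It assumes the usual im2col-plus-GEMM formulation of convolution, in which packing several images into one call widens the GEMM; this is an assumption made for illustration, not a description of the AMD code. `GemmShape`, `batched_conv_gemm` and `gemm_calls` are hypothetical names and do not appear in this repository or in clBLAS.

```cpp
#include <cassert>
#include <cstddef>

// Shape of one GEMM call: C (m x n) = A (m x k) * B (k x n).
struct GemmShape {
    std::size_t m;  // number of output channels (filters)
    std::size_t k;  // input channels * kernel height * kernel width
    std::size_t n;  // output pixels per image * images packed into the call
};

// Hypothetical helper: GEMM shape when `batch` images share one call.
// Assumes an im2col layout; not taken from the OpenCL Caffe sources.
GemmShape batched_conv_gemm(std::size_t out_channels, std::size_t in_channels,
                            std::size_t kernel_h, std::size_t kernel_w,
                            std::size_t out_h, std::size_t out_w,
                            std::size_t batch) {
    assert(batch > 0);
    return {out_channels, in_channels * kernel_h * kernel_w,
            out_h * out_w * batch};
}

// Number of GEMM calls needed to cover a minibatch; the last call may be
// only partially filled when `batch` does not divide `minibatch`.
std::size_t gemm_calls(std::size_t minibatch, std::size_t batch) {
    assert(batch > 0);
    return (minibatch + batch - 1) / batch;
}
```

For the first AlexNet convolution (96 filters of size 3x11x11, 55x55 output) with a minibatch of 128 and a batch number of 16, `batched_conv_gemm(96, 3, 11, 11, 55, 55, 16)` gives a 96 x 363 x 48400 GEMM and `gemm_calls(128, 16)` gives 8 calls. A larger batch number means fewer, wider calls at the cost of a larger temporary buffer, which is why the best value may depend on the hardware, image size and minibatch size.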

Note: More features are planned in the near future. Currently this implementation has been verified and tuned on AMD devices (CPUs/GPUs/APUs). Compatibility across different chip manufacturers will be considered for future addition.

Performance

We intend to keep updating the latest performance as we make optimizations. Fury results are preliminary and are actively being improved.

  • Training speed (Model: AlexNet, minibatch size 128)

    Platform                              Speed (images per second)
    AMD W9100 & A10-7850k                 255
    AMD R9 Fury & A10-7850k               261
    AMD R290X @1000MHz & A10-7850k        268
    AMD S9150 @900MHz & Xeon E5-2640      227

  • Recognition speed (Model: AlexNet, minibatch size 128)

    Platform                              Speed (images per second)
    AMD W9100 & A10-7850k                 590
    AMD R9 Fury & A10-7850k               699
    AMD R290X @1000MHz & A10-7850k        606
    AMD S9150 @900MHz & Xeon E5-2640      452
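
The speeds above are given in images per second. To compare them with per-iteration timings, they can be converted into seconds per minibatch. The following is a minimal sketch of that arithmetic; the function names are hypothetical and are not part of Caffe.

```cpp
#include <cassert>

// Seconds needed to process one minibatch at a given throughput.
double seconds_per_minibatch(double images_per_second, int minibatch_size) {
    assert(images_per_second > 0.0 && minibatch_size > 0);
    return static_cast<double>(minibatch_size) / images_per_second;
}

// Ratio of two throughputs, for example recognition speed over training
// speed on the same platform.
double throughput_ratio(double faster, double slower) {
    assert(slower > 0.0);
    return faster / slower;
}
```

For the R9 Fury row, `seconds_per_minibatch(261.0, 128)` is roughly 0.49 s per training minibatch, and `throughput_ratio(699.0, 261.0)` shows recognition running roughly 2.7 times faster than training.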

Wiki

For more information on how to install, use or contribute to this code base, please visit our wiki page: https://github.com/amd/OpenCL-caffe/wiki

Contributors

Junli Gu, Yibing Liu, Yuan Gao, Maohua Zhu

We thank Mauricio Breternitz, Hanjin Chu and Greg Stoner for their technical suggestions and support.

If you have any questions, please send an email to [email protected]

Support needed

As an open source project, we hope to maintain an open, collaborative culture. We encourage contributions and support from the community to improve the project together.

License

The original Caffe is provided under the BSD 2-Clause open source license. The OpenCL port written by AMD is covered by the AMD license. We encourage contributions and support from external developers; your contribution will be covered either by the BSD 2-Clause license or by whichever license you prefer.

Original Caffe information

Caffe

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details and step-by-step examples.

Join the chat at https://gitter.im/BVLC/caffe

Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}
