Stabilizing Transformer Training by Preventing Attention Entropy Collapse
This software project accompanies the research paper, Stabilizing Transformer Training by Preventing Attention Entropy Collapse, published at ICML 2023.
Introduction
Transformers are difficult to train. In this work, we study the training stability of Transformers through a novel lens named attention entropy collapse. Attention entropy is defined as the quantity

$$\mathrm{Ent}(A_i) = -\sum_{j} A_{i,j}\log A_{i,j}$$

for an attention matrix $A$, where $A_i$ denotes its $i$-th row.
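As a quick illustration, the snippet below is a minimal PyTorch sketch of how this quantity can be monitored during training; `attention_entropy` is an illustrative helper written for this README, not part of the repository's code.

```python
import torch

def attention_entropy(attn: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Ent(A_i) = -sum_j A_{i,j} log A_{i,j}, averaged over rows (and any leading dims)."""
    row_entropy = -(attn * torch.log(attn + eps)).sum(dim=-1)  # entropy of each row A_i
    return row_entropy.mean()

# Uniform attention over T=16 tokens has the maximum entropy, log(16) ≈ 2.77;
# entropy collapse corresponds to this value dropping sharply during training.
attn = torch.softmax(torch.zeros(1, 4, 16, 16), dim=-1)  # (batch, heads, T, T)
print(attention_entropy(attn))
```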
We provide both theoretical and empirical analyses of the entropy collapse phenomenon, and propose a simple fix named σReparam, which reparameterizes the weight matrices with spectral normalization and an additional learned scalar.
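For concreteness, here is a minimal sketch of the σReparam idea applied to a single linear layer: the weight is used as $\hat{W} = \frac{\gamma}{\sigma(W)} W$, with the spectral norm $\sigma(W)$ estimated by power iteration and $\gamma$ a learnable scalar (initialized to 1 here for simplicity). This is an illustrative PyTorch example under those assumptions, not the repository's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SigmaReparamLinear(nn.Module):
    """Illustrative linear layer with sigma-reparameterized weights."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None
        self.gamma = nn.Parameter(torch.ones(1))  # learnable scalar
        # Left singular vector estimate used by power iteration for sigma(W).
        self.register_buffer("u", F.normalize(torch.randn(out_features), dim=0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One power-iteration step to estimate sigma(W), the largest singular value.
        with torch.no_grad():
            v = F.normalize(self.weight.t() @ self.u, dim=0)
            u = F.normalize(self.weight @ v, dim=0)
            if self.training:
                self.u.copy_(u)
        sigma = torch.dot(u, self.weight @ v)      # approximate spectral norm
        w_hat = (self.gamma / sigma) * self.weight  # W_hat = (gamma / sigma(W)) * W
        return F.linear(x, w_hat, self.bias)
```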
Getting Started
We provide two reference implementations: one in PyTorch, applied to the Vision Transformer (ViT) setting, and another in JAX, applied to automatic speech recognition (ASR). Please refer to the vision and speech folders for details. The same PyTorch implementation was used for the language modeling (LM) and machine translation (MT) experiments.
BibTeX
@inproceedings{zhai2023stabilizing,
  title={Stabilizing Transformer Training by Preventing Attention Entropy Collapse},
  author={Zhai, Shuangfei and Likhomanenko, Tatiana and Littwin, Etai and Busbridge, Dan and Ramapuram, Jason and Zhang, Yizhe and Gu, Jiatao and Susskind, Joshua M},
  booktitle={International Conference on Machine Learning},
  pages={40770--40803},
  year={2023},
  organization={PMLR}
}