fkodom/fft-conv-pytorch

Stars: 472
Rank: 93,034 (Top 2%)
Language: Python
License: MIT License
Created: over 5 years ago
Updated: about 1 year ago



Implementation of 1D, 2D, and 3D FFT convolutions in PyTorch. Much faster than direct convolutions for large kernel sizes.

fft-conv-pytorch

Implementation of 1D, 2D, and 3D FFT convolutions in PyTorch.

- Faster than direct convolution for large kernels.
- Much slower than direct convolution for small kernels.
- In my local tests, FFT convolution is faster when the kernel has more than about 100 elements.
  - The exact threshold depends on the machine and PyTorch version.
  - Also see the benchmarks below.
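To illustrate why kernel size matters, here is a minimal sketch of the idea behind FFT convolution, written directly against `torch.fft`. It is not the code from this package: the function name `fft_conv1d_sketch` is hypothetical, and the sketch assumes stride 1, no padding, no dilation, and no groups. Direct convolution does work proportional to the kernel size for every output sample, while the FFT route costs roughly the same regardless of kernel size, which is why it only pays off for large kernels.

```python
import torch
import torch.nn.functional as F


def fft_conv1d_sketch(signal, kernel, bias=None):
    """Illustrative 1D FFT convolution (hypothetical helper, not the package API).

    signal: (batch, in_channels, length)
    kernel: (out_channels, in_channels, kernel_size)
    Returns (batch, out_channels, length - kernel_size + 1), like F.conv1d
    with stride=1, padding=0, dilation=1.
    """
    n = signal.shape[-1]
    k = kernel.shape[-1]

    # Transform both operands at the signal length; rfft zero-pads the kernel.
    signal_fr = torch.fft.rfft(signal, n=n)  # (batch, in_channels, n // 2 + 1)
    kernel_fr = torch.fft.rfft(kernel, n=n)  # (out_channels, in_channels, n // 2 + 1)

    # PyTorch "convolution" is really cross-correlation, so the kernel spectrum
    # is conjugated. Multiply per frequency and sum over input channels.
    out_fr = torch.einsum("bif,oif->bof", signal_fr, kernel_fr.conj())

    # The inverse transform gives a circular correlation of length n.
    out = torch.fft.irfft(out_fr, n=n)

    # Only the first n - k + 1 samples are free of wrap-around.
    out = out[..., : n - k + 1]

    if bias is not None:
        out = out + bias.view(1, -1, 1)
    return out


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(2, 3, 64)
    w = torch.randn(4, 3, 9)
    b = torch.randn(4)
    # Compare against the direct convolution from PyTorch.
    print(torch.allclose(fft_conv1d_sketch(x, w, b), F.conv1d(x, w, b), atol=1e-4))
```

The sketch trades generality for readability. A real implementation also has to handle padding, stride, dilation, and higher dimensions.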

Install

Using pip:

pip install fft-conv-pytorch

From source:

git clone https://github.com/fkodom/fft-conv-pytorch.git
cd fft-conv-pytorch
pip install .

Example Usage

import torch
from fft_conv_pytorch import fft_conv, FFTConv1d

# Create dummy data.
#     Data shape: (batch, channels, length)
#     Kernel shape: (out_channels, in_channels, kernel_size)
#     Bias shape: (out_channels,)
# For a single, unbatched 1D signal, simply set batch=1.
signal = torch.randn(3, 3, 1024 * 1024)
kernel = torch.randn(2, 3, 128)
bias = torch.randn(2)

# Functional execution.  (Easiest for generic use cases.)
out = fft_conv(signal, kernel, bias=bias)

# Object-oriented execution.  (Requires some extra work, since the
# defined classes were designed for use in neural networks.)
# Use a separate name so the imported `fft_conv` function is not shadowed.
conv_layer = FFTConv1d(3, 2, 128, bias=True)
conv_layer.weight = torch.nn.Parameter(kernel)
conv_layer.bias = torch.nn.Parameter(bias)
out = conv_layer(signal)

Benchmarks

These benchmarks compare FFT convolution against the direct convolution from PyTorch in 1D, 2D, and 3D. The exact times depend heavily on your local machine, but the relative scaling with kernel size should be consistent.

Dimensions	Input Size	Input Channels	Output Channels	Bias	Padding	Stride	Dilation
1	(4096)	4	4	True	0	1	1
2	(512, 512)	4	4	True	0	1	1
3	(64, 64, 64)	4	4	True	0	1	1
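A comparison like the 1D row above could be timed with a small harness along these lines. This is only a sketch under stated assumptions, not the benchmark script from this repository: `fft_conv1d`, `mean_seconds`, and `benchmark_1d` are hypothetical names, the FFT function is a simplified stand-in (stride 1, no padding, no dilation), the batch size of 1 is assumed because the table does not list one, and the timing is CPU-only.

```python
import time

import torch
import torch.nn.functional as F


def fft_conv1d(signal, kernel, bias):
    # Hypothetical stand-in for an FFT convolution (stride 1, no padding).
    n, k = signal.shape[-1], kernel.shape[-1]
    spectrum = torch.einsum(
        "bif,oif->bof",
        torch.fft.rfft(signal, n=n),
        torch.fft.rfft(kernel, n=n).conj(),  # conjugate: cross-correlation
    )
    return torch.fft.irfft(spectrum, n=n)[..., : n - k + 1] + bias.view(1, -1, 1)


def mean_seconds(fn, repeats=5):
    # Average wall-clock time per call, after one untimed warm-up call.
    fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def benchmark_1d(kernel_sizes, length=4096, channels=4):
    # Mirrors the 1D row of the table: 4 input and 4 output channels, bias on.
    signal = torch.randn(1, channels, length)  # batch size 1 is an assumption
    bias = torch.randn(channels)
    rows = []
    for k in kernel_sizes:
        kernel = torch.randn(channels, channels, k)
        direct = mean_seconds(lambda: F.conv1d(signal, kernel, bias))
        fft = mean_seconds(lambda: fft_conv1d(signal, kernel, bias))
        rows.append((k, direct, fft))
    return rows


if __name__ == "__main__":
    for k, direct, fft in benchmark_1d([4, 16, 64, 256, 1024]):
        print(f"kernel={k:5d}  direct={direct * 1e3:8.3f} ms  fft={fft * 1e3:8.3f} ms")
```

On a GPU, each timed call would also need `torch.cuda.synchronize()` before reading the clock, since CUDA kernels launch asynchronously.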

grouped-query-attention-pytorch

(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints" (https://arxiv.org/pdf/2305.13245.pdf)

yet-another-retnet

A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf)

clip-text-decoder

Generate text captions for images from their embeddings.

transformer-from-scratch

Code implementation from my blog post: https://fkodom.substack.com/p/transformers-from-scratch-in-pytorch

soft-mixture-of-experts

PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)

dilated-attention-pytorch

(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)

wnet-unsupervised-image-segmentation

lora-pytorch

Simple but robust implementation of LoRA for PyTorch. Compatible with NLP, CV, and other model types. Strongly typed and tested.

semantle

Fastest 'Semantle' solver this side of the Mississippi.

python-repo-template

Template repo for Python projects, especially those focusing on machine learning and/or deep learning.

document-rag

RAG application to answer questions about PDF documents using LLMs.

byol

PyTorch implementation of BYOL: a fantastically simple method for self-supervised image representation learning with SOTA performance.

simple-bert-pytorch

A very simple BERT implementation in PyTorch, which only depends on PyTorch itself.

blip-inference

Pretrained BLIP with a similar API to CLIP.

octconv-pytorch

metis

Minimalist library for training RL agents in PyTorch. Implements many common training algorithms, with a focus on actor-critic methods. Includes SAC, TD3, PPO, A2C, VPG.

simple-cocotools

A simple, modern alternative to 'pycocotools'.

unipipe

Build batch pipelines in Python that run anywhere.

ncsn

Personal repo recreating Noise Contrastive Score Networks (https://arxiv.org/abs/1907.05600v2) for learning purposes.

wordle

Fastest Wordle solver in the West.

docker-images

Repository for automatically building base Docker images with Github Actions.

flaxseed

(Personal) Training library built on top of Flax, making it easier to train deep learning models with JAX. Because Flax is great, but doesn't take things far enough.

ddpg-her-pytorch

Implementation of the Hindsight Experience Replay (HER) algorithm using PyTorch. Utilizes Deep Deterministic Policy Gradients for off-policy optimization of RL agents in continuous action spaces.