streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
bevfusion
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
once-for-all
[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment
llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
proxylessnas
[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
torchquantum
A PyTorch-based framework for Quantum Classical Simulation, Quantum Machine Learning, Quantum Neural Networks, Parameterized Quantum Circuits with support for easy deployments on real quantum computers.
data-efficient-gans
[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
efficientvit
EfficientViT is a new family of vision models for efficient high-resolution vision.
torchsparse
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
gan-compression
[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs
anycost-gan
[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing
tinyml
TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
fastcomposer
[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
pvcnn
[NeurIPS 2019, Spotlight] Point-Voxel CNN for Efficient 3D Deep Learning
lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
spvnas
[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
mcunet
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
tiny-training
On-Device Training Under 256KB Memory [NeurIPS'22]
amc
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
dlg
[NeurIPS 2019] Deep Leakage From Gradients
haq
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision
offsite-tuning
Offsite-Tuning: Transfer Learning without Full Model
hardware-aware-transformers
[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
litepose
[CVPR'22] Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
amc-models
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
apq
[CVPR 2020] APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
parallel-computing-tutorial
flatformer
[CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
patch_conv
Patch convolution to avoid large GPU memory usage of Conv2D (an illustrative sketch follows at the end of this listing)
6s965-fall2022
sparsevit
[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
bnn-icestick
Binary Neural Network on IceStick FPGA.
e3d
Efficient 3D Deep Learning
neurips-micronet
[JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion
spatten-llm
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
tinychat-tutorial
pruning-sparsity-publications
iccad-tinyml-open
[ICCAD'22 TinyML Contest] Efficient Heart Stroke Detection on Low-cost Microcontrollerscalo-cluster
ml-blood-pressure
gan-compression-dynamic
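The patch_conv entry above names a concrete memory-saving trick: run a Conv2D over spatial patches of the input instead of the whole feature map, so each call's activation and workspace footprint stays bounded. The snippet below is a minimal, independent PyTorch sketch of that idea, assuming a stride-1 convolution with an odd kernel size; `patched_conv2d` and `num_patches` are names made up for this sketch, not the repository's actual API.

```python
import torch
import torch.nn.functional as F

def patched_conv2d(x, weight, bias=None, num_patches=4):
    """'Same'-padded, stride-1 Conv2D computed over horizontal slabs of x.

    Each slab slice already contains every input row its output rows depend
    on, so concatenating the slab outputs matches a single full convolution
    while keeping the per-call memory small. Assumes an odd kernel size.
    (Illustrative sketch only, not the patch_conv repository's interface.)
    """
    kh, kw = weight.shape[2], weight.shape[3]
    ph, pw = kh // 2, kw // 2
    x_pad = F.pad(x, (pw, pw, ph, ph))          # zero-pad the input once up front
    H = x.shape[2]
    step = -(-H // num_patches)                 # ceil(H / num_patches) rows per slab
    outs = []
    for start in range(0, H, step):
        end = min(start + step, H)
        # Output rows [start, end) depend on padded input rows [start, end + kh - 1).
        slab = x_pad[:, :, start:end + kh - 1, :]
        outs.append(F.conv2d(slab, weight, bias))
    return torch.cat(outs, dim=2)

# Sanity check against the monolithic convolution (shapes chosen arbitrarily).
x = torch.randn(1, 3, 64, 64)
w = torch.randn(8, 3, 3, 3)
assert torch.allclose(patched_conv2d(x, w), F.conv2d(x, w, padding=1), atol=1e-5)
```

Splitting along one spatial axis like this trades a few redundant halo rows per slab for a much smaller peak footprint per convolution call; the same idea extends to full 2D tiling.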