Repository Details
A PyTorch implementation of MEGABYTE, a multi-scale transformer architecture that is tokenization-free (it operates directly on raw bytes) and uses sub-quadratic self-attention. Paper: https://arxiv.org/abs/2305.07185
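To make the two properties concrete, here is a minimal sketch in plain Python. It is not code from this repository. The helper names `bytes_to_patches` and `attention_cost`, the zero-byte padding, and the example sizes are all assumptions chosen for illustration. The sketch splits raw UTF-8 bytes into fixed-size patches (no tokenizer involved) and counts pairwise attention scores for a global model over patches plus a local model within each patch, compared with full attention over every byte.

```python
def bytes_to_patches(text: str, patch_size: int, pad_byte: int = 0) -> list[list[int]]:
    """Split the UTF-8 bytes of `text` into fixed-size patches.

    The zero pad byte is an assumption for this sketch; a real
    implementation may use a dedicated padding id instead.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    data = list(text.encode("utf-8"))  # byte values 0..255, no vocabulary needed
    remainder = len(data) % patch_size
    if remainder:
        data.extend([pad_byte] * (patch_size - remainder))
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


def attention_cost(seq_len: int, patch_size: int) -> dict[str, int]:
    """Count pairwise attention scores for a byte sequence of length seq_len.

    full:   one transformer attending over every byte, seq_len ** 2
    global: attention across patches, (seq_len / patch_size) ** 2
    local:  attention inside each patch, (seq_len / patch_size) * patch_size ** 2
    """
    if seq_len % patch_size != 0:
        raise ValueError("seq_len must be a multiple of patch_size")
    num_patches = seq_len // patch_size
    global_cost = num_patches ** 2
    local_cost = num_patches * patch_size ** 2
    return {
        "full": seq_len ** 2,
        "global": global_cost,
        "local": local_cost,
        "multiscale": global_cost + local_cost,
    }


if __name__ == "__main__":
    print(bytes_to_patches("hello world", 4))
    print(attention_cost(8192, 8))
```

With a sequence of 8192 bytes and a patch size of 8, the multi-scale count is 1,114,112 scores against 67,108,864 for full attention. This only counts attention scores; it ignores feed-forward layers and constant factors.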