• Stars: 2
• Language: Python
• Created: almost 5 years ago
• Updated: over 3 years ago

Repository Details

Scaling Transformer architectures has been critical for pushing the frontiers of Language Modelling (LM), a problem central to Natural Language Processing (NLP) and language understanding. Although Transformer capacity is directly and positively related to LM performance, practical limitations make training massive models infeasible. These limitations take the form of computation and memory costs that cannot be addressed solely by training on parallel devices. In this thesis, we investigate two approaches that make Transformers more computationally and memory efficient. First, we introduce the Mixture-of-Experts (MoE) Transformer, which can scale its capacity at a sub-linear computational cost. Second, we present a novel content-based sparse attention mechanism called Hierarchical Self-Attention (HSA). We demonstrate that the MoE Transformer achieves lower test perplexity than a vanilla Transformer with higher computational demands. Language Modelling experiments with a Transformer that uses HSA in place of conventional attention show that HSA can speed up attention computation by up to 330% at a negligible cost in model performance.
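
To make the sub-linear-cost idea concrete, here is a minimal sketch of a top-1 gated MoE feed-forward layer in PyTorch. The layer sizes, gating scheme, and routing loop are illustrative assumptions, not the thesis's implementation: each token is processed by only its highest-scoring expert, so adding experts grows parameter count (capacity) without growing per-token compute.

```python
# Minimal sketch of a top-1 gated Mixture-of-Experts feed-forward layer.
# Illustrative assumptions throughout: sizes, ReLU experts, top-1 routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # router

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its single best
        # expert, so per-token compute does not grow with n_experts.
        scores = F.softmax(self.gate(x), dim=-1)   # (tokens, n_experts)
        top_p, top_idx = scores.max(dim=-1)        # top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # Scale by the gate probability so routing stays differentiable.
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(8, 64)  # 8 tokens, d_model = 64
layer = MoEFeedForward(d_model=64, d_hidden=256, n_experts=4)
print(layer(tokens).shape)   # torch.Size([8, 64])
```

Real MoE layers typically also add a load-balancing loss so tokens spread across experts; that is omitted here for brevity.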

More Repositories

1. reinforcement_learning_oe (Python, 49 stars)

This work explores value-based deep reinforcement learning (Deep Q-Learning and Double Deep Q-Learning) for the problem of Optimal Trade Execution: finding the optimal "path" for executing a stock order, i.e. how many shares to execute at each step under a time constraint, so that the price impact on the market is minimised and the revenue from executing the order is maximised.
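
As a hedged illustration of the value-based approach, the snippet below sketches the Double DQN target computation; the state features, action count, and network shapes are assumptions for illustration, not the repository's actual code.

```python
# Sketch of the Double DQN target for an execution agent (all shapes
# and features are illustrative assumptions).
import torch
import torch.nn as nn

n_features, n_actions = 4, 11  # e.g. time left, inventory, ...; share buckets
online_net = nn.Linear(n_features, n_actions)
target_net = nn.Linear(n_features, n_actions)

def double_dqn_target(reward, next_state, done, gamma=0.99):
    # The online network picks the greedy next action; the target network
    # evaluates it, which reduces Q-value overestimation vs. vanilla DQN.
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=-1, keepdim=True)
        next_q = target_net(next_state).gather(-1, next_action).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_q

batch = torch.randn(32, n_features)
r, d = torch.zeros(32), torch.zeros(32)
print(double_dqn_target(r, batch, d).shape)  # torch.Size([32])
```
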
2. google_trends_consumption_prediction (Python, 40 stars)

This work investigates the forecasting relationship between a Google Trends indicator and real private consumption expenditure in the US. The indicator is constructed by applying Kernel Principal Component Analysis to consumption-related Google Trends search categories, and its predictive performance is evaluated against two conventional survey-based indicators: the Conference Board Consumer Confidence Index and the University of Michigan Consumer Sentiment Index. The findings suggest that in both in-sample and out-of-sample nowcasting estimations the Google indicator outperforms the survey-based predictors, while survey-augmented models perform no better than a baseline autoregressive model with macroeconomic controls. The results point to the substantial potential of Google Trends data as a forecasting tool for private consumption.
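
A minimal sketch of how such an indicator might be constructed with Kernel PCA is shown below; the category count, kernel choice, and placeholder data are assumptions, not the repository's exact pipeline.

```python
# Hedged sketch: building a one-dimensional consumption indicator by
# applying Kernel PCA to standardised Google Trends category series.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler

# rows = months, columns = consumption-related Trends categories
# (random placeholder for the downloaded series)
trends = np.random.rand(120, 8)

scaled = StandardScaler().fit_transform(trends)
kpca = KernelPCA(n_components=1, kernel="rbf")
indicator = kpca.fit_transform(scaled).ravel()  # one value per month
print(indicator.shape)  # (120,)
```
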
3. fractal_flutter (Dart, 26 stars)

Fractal is an ML-powered network of interconnected public chats that allows branching of chats into more focused “sub-chats”, thereby overcoming the problem of rapid conversation subject dilution and low engagement. Fractal aims to allow unacquainted individuals to spontaneously find and discuss niche topics of common interest in real time.
4. random_google_colabs (Jupyter Notebook, 1 star)
5. fractal (Python, 1 star)
6. PLAsTiCC-Astronomical-Classification-Solution (Python, 1 star)

PLAsTiCC is a large data challenge that aims to classify astronomical objects by analysing time-series measurements of their “light curves” (the intensity of photon flux) recorded in six astronomical passbands. The flux may increase or decrease over time, and the pattern of these brightness changes is a good indicator of the underlying object. Each object in the training data belongs to one of 14 classes; the test data contains an additional 15th class meant to capture “novelties” (objects that are hypothesised to exist).
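
As a rough illustration of one way to approach such a task, the sketch below extracts simple per-passband statistics from light curves and fits a multi-class classifier; the features and model are assumptions, not this repository's actual solution.

```python
# Hedged sketch: per-passband summary features, then a multi-class
# classifier over the 14 training classes (toy data throughout).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def light_curve_features(flux, passband, n_bands=6):
    # Simple statistics of the flux series within each of six passbands.
    feats = []
    for b in range(n_bands):
        f = flux[passband == b]
        feats += [f.mean(), f.std(), f.max() - f.min()] if f.size else [0.0, 0.0, 0.0]
    return np.array(feats)

rng = np.random.default_rng(0)
# toy data: 100 objects, 50 flux measurements each
X = np.stack([
    light_curve_features(rng.normal(size=50), rng.integers(0, 6, size=50))
    for _ in range(100)
])
y = rng.integers(0, 14, size=100)  # one of the 14 training classes
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.predict(X[:3]))
```
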
7. fractal-angular-prod (TypeScript, 1 star)

The Angular web front-end for Fractal; see fractal_flutter above for the project description.