• Stars: 2
• Language: Python
• Created: almost 5 years ago
• Updated: over 3 years ago

Repository Details

Scaling Transformer architectures has been critical for pushing the frontiers of Language Modelling (LM), a problem central to Natural Language Processing (NLP) and language understanding. Although Transformer capacity is directly and positively related to LM performance, practical limitations make training massive models infeasible. These limitations take the form of computation and memory costs that cannot be addressed solely by training on parallel devices. In this thesis, we investigate two approaches that make Transformers more computationally and memory efficient. First, we introduce the Mixture-of-Experts (MoE) Transformer, which can scale its capacity at a sub-linear computational cost. Second, we present a novel content-based sparse attention mechanism called Hierarchical Self-Attention (HSA). We demonstrate that the MoE Transformer achieves lower test perplexity than a vanilla Transformer with higher computational demands. Language Modelling experiments with a Transformer that uses HSA in place of conventional attention show that HSA can speed up attention computation by up to 330% at a negligible cost in model performance.
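
To make the sub-linear-cost idea concrete, here is a minimal sketch of a top-1 gated MoE feed-forward layer in PyTorch. The layer sizes, gating scheme, and routing loop are illustrative assumptions, not the thesis's implementation: each token is processed by only its highest-scoring expert, so adding experts grows parameter count (capacity) without growing per-token compute.

```python
# Minimal sketch of a top-1 gated Mixture-of-Experts feed-forward layer.
# Illustrative assumptions throughout: sizes, ReLU experts, top-1 routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # router

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its single best
        # expert, so per-token compute does not grow with n_experts.
        scores = F.softmax(self.gate(x), dim=-1)   # (tokens, n_experts)
        top_p, top_idx = scores.max(dim=-1)        # top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # Scale by the gate probability so routing stays differentiable.
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(8, 64)  # 8 tokens, d_model = 64
layer = MoEFeedForward(d_model=64, d_hidden=256, n_experts=4)
print(layer(tokens).shape)   # torch.Size([8, 64])
```

Real MoE layers typically also add a load-balancing loss so tokens spread across experts; that is omitted here for brevity.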

More Repositories

1. reinforcement_learning_oe (Python, 49 stars)

This work explores value-based deep reinforcement learning (Deep Q-Learning and Double Deep Q-Learning) for the problem of Optimal Trade Execution: finding the optimal "path" for executing a stock order, i.e. how many shares to execute at each step under a time constraint, so that the price impact on the market is minimised and the revenue from executing the order is maximised.
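
As a hedged illustration of the value-based approach, the snippet below sketches the Double DQN target computation; the state features, action count, and network shapes are assumptions for illustration, not the repository's actual code.

```python
# Sketch of the Double DQN target for an execution agent (all shapes
# and features are illustrative assumptions).
import torch
import torch.nn as nn

n_features, n_actions = 4, 11  # e.g. time left, inventory, ...; share buckets
online_net = nn.Linear(n_features, n_actions)
target_net = nn.Linear(n_features, n_actions)

def double_dqn_target(reward, next_state, done, gamma=0.99):
    # The online network picks the greedy next action; the target network
    # evaluates it, which reduces Q-value overestimation vs. vanilla DQN.
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=-1, keepdim=True)
        next_q = target_net(next_state).gather(-1, next_action).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_q

batch = torch.randn(32, n_features)
r, d = torch.zeros(32), torch.zeros(32)
print(double_dqn_target(r, batch, d).shape)  # torch.Size([32])
```
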
2. google_trends_consumption_prediction (Python, 40 stars)

This work investigates the forecasting relationship between a Google Trends indicator and real private consumption expenditure in the US. The indicator is constructed by applying Kernel Principal Component Analysis to consumption-related Google Trends search categories, and its predictive performance is evaluated against two conventional survey-based indicators: the Conference Board Consumer Confidence Index and the University of Michigan Consumer Sentiment Index. The findings suggest that in both in-sample and out-of-sample nowcasting estimations the Google indicator outperforms the survey-based predictors, while survey-augmented models perform no better than a baseline autoregressive model with macroeconomic controls. The results point to the substantial potential of Google Trends data as a forecasting tool for private consumption.
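
A minimal sketch of how such an indicator might be constructed with Kernel PCA is shown below; the category count, kernel choice, and placeholder data are assumptions, not the repository's exact pipeline.

```python
# Hedged sketch: building a one-dimensional consumption indicator by
# applying Kernel PCA to standardised Google Trends category series.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler

# rows = months, columns = consumption-related Trends categories
# (random placeholder for the downloaded series)
trends = np.random.rand(120, 8)

scaled = StandardScaler().fit_transform(trends)
kpca = KernelPCA(n_components=1, kernel="rbf")
indicator = kpca.fit_transform(scaled).ravel()  # one value per month
print(indicator.shape)  # (120,)
```
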
3. fractal_flutter (Dart, 26 stars)

Fractal is an ML-powered network of interconnected public chats that allows branching of chats into more focused “sub-chats”, thereby overcoming the problem of rapid conversation subject dilution and low engagement. Fractal aims to allow unacquainted individuals to spontaneously find and discuss niche topics of common interest in real time.
4. random_google_colabs (Jupyter Notebook, 1 star)
5. fractal (Python, 1 star)
6. PLAsTiCC-Astronomical-Classification-Solution (Python, 1 star)

PLAsTiCC is a large data challenge that aims to classify astronomical objects by analysing time-series measurements of their “light curves” (the intensity of photon flux) recorded in six astronomical passbands. The flux may increase or decrease over time, and the pattern of these brightness changes is a good indicator of the underlying object. Each object in the training data belongs to one of 14 classes; the test data contains an additional 15th class meant to capture “novelties” (objects that are hypothesised to exist).
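
As a rough illustration of one way to approach such a task, the sketch below extracts simple per-passband statistics from light curves and fits a multi-class classifier; the features and model are assumptions, not this repository's actual solution.

```python
# Hedged sketch: per-passband summary features, then a multi-class
# classifier over the 14 training classes (toy data throughout).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def light_curve_features(flux, passband, n_bands=6):
    # Simple statistics of the flux series within each of six passbands.
    feats = []
    for b in range(n_bands):
        f = flux[passband == b]
        feats += [f.mean(), f.std(), f.max() - f.min()] if f.size else [0.0, 0.0, 0.0]
    return np.array(feats)

rng = np.random.default_rng(0)
# toy data: 100 objects, 50 flux measurements each
X = np.stack([
    light_curve_features(rng.normal(size=50), rng.integers(0, 6, size=50))
    for _ in range(100)
])
y = rng.integers(0, 14, size=100)  # one of the 14 training classes
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.predict(X[:3]))
```
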
7. fractal-angular-prod (TypeScript, 1 star)

The Angular web front-end for Fractal; see fractal_flutter above for the project description.