  • Stars: 199
  • Rank: 194,915 (Top 4%)
  • Language: Python
  • License: Apache License 2.0
  • Created: almost 5 years ago
  • Updated: over 1 year ago


Repository Details

ALBERT model pre-training and fine-tuning using TF 2.0

ALBERT-TF2.0

This repository contains a TensorFlow 2.0 implementation of ALBERT.

Requirements

  • python3
  • pip install -r requirements.txt
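
Before running any of the scripts, it can help to confirm that a TensorFlow 2.x build is installed. A minimal check (not part of the repo):

import tensorflow as tf

# The fine-tuning scripts below assume a TF 2.x release.
print(tf.__version__)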

ALBERT Pre-training

ALBERT model pre-training from scratch and domain-specific fine-tuning. Instructions are available here.

Download ALBERT TF 2.0 weights

Version 1    Version 2
base         base
large        large
xlarge       xlarge
xxlarge      xxlarge

Unzip the model inside the repo.

The above weights do not contain the final layer of the original model, so they can only be used for fine-tuning downstream tasks.
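
For example, a minimal extraction sketch using the Python standard library (the archive name large.zip is only a placeholder; use whichever weight archive you downloaded):

import zipfile

# "large.zip" is a placeholder name, not the repo's actual file name.
with zipfile.ZipFile("large.zip") as archive:
    archive.extractall(".")  # e.g. creates large/ with config.json, tf2_model.h5 and vocab/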

For full weights conversion from TF-HUB to TF 2.0, see here.

Download glue data

Download the data using the command below:

python download_glue_data.py --data_dir glue_data --tasks all
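
As an optional sanity check (not part of the repo), you can confirm the GLUE data landed where the fine-tuning scripts expect it; the path below assumes --data_dir glue_data as in the command above:

import csv

# Peek at the CoLA training split downloaded above.
with open("glue_data/CoLA/train.tsv", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))
print(len(rows), "training rows; first row:", rows[0])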

Fine-tuning

To prepare the fine-tuning data for final model training, use the create_finetuning_data.py script. The resulting datasets (in tf_record format) and the training metadata should later be passed to the training or evaluation scripts. The task-specific arguments are described in the following sections:

Creating fine-tuning data

  • Example CoLA
export GLUE_DIR=glue_data/
export ALBERT_DIR=large/

export TASK_NAME=CoLA
export OUTPUT_DIR=cola_processed
mkdir $OUTPUT_DIR

python create_finetuning_data.py \
 --input_data_dir=${GLUE_DIR}/ \
 --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
 --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
 --eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
 --meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
 --fine_tuning_task_type=classification --max_seq_length=128 \
 --classification_task_name=${TASK_NAME}
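
The generated records can be inspected with standard TensorFlow APIs. A minimal sketch (not part of the repo; the output path is taken from the CoLA example above):

import tensorflow as tf

# Read one serialized example from the generated file and list its feature keys.
dataset = tf.data.TFRecordDataset("cola_processed/CoLA_train.tf_record")
for raw_record in dataset.take(1):
    example = tf.train.Example.FromString(raw_record.numpy())
    print(sorted(example.features.feature.keys()))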

Running classifier

export MODEL_DIR=CoLA_OUT
python run_classifer.py \
--train_data_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
--eval_data_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
--input_meta_data_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
--albert_config_file=${ALBERT_DIR}/config.json \
--task_name=${TASK_NAME} \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
--output_dir=${MODEL_DIR} \
--init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
--do_train \
--do_eval \
--train_batch_size=16 \
--learning_rate=1e-5 \
--custom_training_loop

By default, the classifier runs 3 epochs and evaluates on the development set.

The above command results in a dev-set accuracy of 76.22 on the CoLA task.

The above code was tested on a single TITAN RTX 24 GB GPU.

SQuAD

Data and Evaluation scripts

Training Data Preparation

export SQUAD_DIR=SQuAD
export SQUAD_VERSION=v1.1
export ALBERT_DIR=large
export OUTPUT_DIR=squad_out_${SQUAD_VERSION}
mkdir $OUTPUT_DIR

python create_finetuning_data.py \
--squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model  \
--train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record  \
--meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--fine_tuning_task_type=squad \
--max_seq_length=384
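
If you want to verify the raw SQuAD download before preparing records, a quick look with the standard SQuAD JSON layout (not repo-specific code):

import json

# Standard SQuAD layout: {"version": ..., "data": [{"title": ..., "paragraphs": [{"context": ..., "qas": [...]}]}]}
with open("SQuAD/train-v1.1.json") as f:
    squad = json.load(f)
num_questions = sum(len(p["qas"]) for article in squad["data"] for p in article["paragraphs"])
print(squad["version"], "-", num_questions, "questions")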

Running Model

python run_squad.py \
--mode=train_and_predict \
--input_meta_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--train_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--predict_file=${SQUAD_DIR}/dev-${SQUAD_VERSION}.json \
--albert_config_file=${ALBERT_DIR}/config.json \
--init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
--train_batch_size=48 \
--predict_batch_size=48 \
--learning_rate=1e-5 \
--num_train_epochs=3 \
--model_dir=${OUTPUT_DIR} \
--strategy_type=mirror

Running SQuAD v2.0

export SQUAD_DIR=SQuAD
export SQUAD_VERSION=v2.0
export ALBERT_DIR=xxlarge
export OUTPUT_DIR=squad_out_${SQUAD_VERSION}
mkdir $OUTPUT_DIR
python create_finetuning_data.py \
--squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model  \
--train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record  \
--meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--fine_tuning_task_type=squad \
--max_seq_length=384
python run_squad.py \
--mode=train_and_predict \
--input_meta_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--train_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--predict_file=${SQUAD_DIR}/dev-${SQUAD_VERSION}.json \
--albert_config_file=${ALBERT_DIR}/config.json \
--init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
--train_batch_size=24 \
--predict_batch_size=24 \
--learning_rate=1.5e-5 \
--num_train_epochs=3 \
--model_dir=${OUTPUT_DIR} \
--strategy_type=mirror \
--version_2_with_negative \
--max_seq_length=384

Experiments were done on 4 x NVIDIA TITAN RTX 24 GB GPUs.

Result

[SQuAD output image]

Multi-GPU training and XLA

  • Use the flag --strategy_type=mirror for multi-GPU training. Currently, all GPUs available in the environment will be used.
  • Use the flag --enable-xla to enable XLA. Model training start-up time will increase because of JIT compilation. (See the sketch after this list.)
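
As a rough illustration only (plain TensorFlow 2 APIs, not code from this repo), the two flags map to the following:

import tensorflow as tf

# --strategy_type=mirror corresponds to MirroredStrategy, which replicates the model on every visible GPU.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# --enable-xla corresponds to turning on XLA JIT compilation; the first steps compile, so start-up is slower.
tf.config.optimizer.set_jit(True)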

Ignore

The warning below will be displayed at the end of each epoch if you use the Keras model.fit method. It is caused by an issue with the training-steps calculation when tf.data is passed to model.fit(). It has no effect on model performance, so it can be ignored. It will most likely be fixed in the next TF2 release. Issue-link

2019-10-31 13:35:48.322897: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range:
End of sequence
         [[{{node IteratorGetNext}}]]
         [[model_1/albert_model/word_embeddings/Shape/_10]]
2019-10-31 13:36:03.302722: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range:
End of sequence
         [[{{node IteratorGetNext}}]]
         [[IteratorGetNext/_4]]

References

  1. TensorFlow official implementation of BERT in TF 2.0. Many parts of the code in this repo are adapted from it.
  2. LAMB optimizer from TensorFlow Addons.
  3. TF-HUB weights to TF 2.0 weights conversion: KPE

More Repositories

  1. BERT-NER: Pytorch-Named-Entity-Recognition-with-BERT (Python, 1,199 stars)
  2. BERT-SQuAD: SQuAD Question Answering Using BERT, PyTorch (Python, 396 stars)
  3. Named-Entity-Recognition-with-Bidirectional-LSTM-CNNs (Python, 357 stars)
  4. BERT-NER-TF: Named Entity Recognition with BERT using TensorFlow 2.0 (Python, 213 stars)
  5. stable-diffusion-tritonserver: Deploy stable diffusion model with onnx/tensorrt + tritonserver (Jupyter Notebook, 119 stars)
  6. Vision-Transformer: Vision Transformer using TensorFlow 2.0 (Python, 95 stars)
  7. DATA-SCIENCE-BOWL-2018: Find the nuclei in divergent images to advance medical discovery (Jupyter Notebook, 90 stars)
  8. e5-mistral-7b-instruct: Finetune mistral-7b-instruct for sentence embeddings (Python, 65 stars)
  9. minGPT-TF: A minimal TF2 re-implementation of the OpenAI GPT training (Jupyter Notebook, 55 stars)
  10. BioELECTRA (51 stars)
  11. Swin-Transformer-Serve: Deploy Swin Transformer using TorchServe (Python, 26 stars)
  12. TAPAS-TF2: End-to-end neural table-text understanding models (Python, 8 stars)
  13. Malayalam-News-Classifier (Python, 7 stars)
  14. BioGPT-HF (Jupyter Notebook, 5 stars)
  15. Tapas-Tutorial (Jupyter Notebook, 3 stars)
  16. Summarizer (Python, 2 stars)
  17. pytorch-tutorial (Jupyter Notebook, 2 stars)
  18. Redis-Stack-Bitnami-Helm-Chart: Redis Stack Server Helm Chart (Mustache, 1 star)
  19. librispeech_100_jax (Python, 1 star)
  20. Tensorflow-Paper-Implementation (Python, 1 star)
  21. BioNLP-Corpus (Python, 1 star)
  22. dlrm-jax (Python, 1 star)
  23. S4-Standalone (Python, 1 star)
  24. Multilingual-Complex-Named-Entity-Recognition (Python, 1 star)
  25. git-actions-python (Python, 1 star)
  26. NLI4CT (Jupyter Notebook, 1 star)
  27. BioSimCSE (1 star)