• Stars: 2,411
  • Rank: 18,941 (top 0.4%)
  • Language: Python
  • License: MIT License
  • Created: almost 6 years ago
  • Updated: about 1 year ago

Repository Details

Implementation of BERT that can load official pre-trained models for feature extraction and prediction

Keras BERT

Implementation of BERT. Official pre-trained models can be loaded for feature extraction and prediction.

Install

pip install keras-bert

Usage

Load Official Pre-trained Models

In the feature extraction demo, you should be able to get the same extraction results as the official model chinese_L-12_H-768_A-12. In the prediction demo, the missing word in a sentence can be predicted.
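
For reference, a minimal sketch of loading such a checkpoint for feature extraction with load_trained_model_from_checkpoint (the directory name below is a placeholder for wherever you unpacked the download):

import os
from keras_bert import load_trained_model_from_checkpoint

# Placeholder path: wherever you unpacked the official checkpoint.
checkpoint_dir = 'chinese_L-12_H-768_A-12'
config_path = os.path.join(checkpoint_dir, 'bert_config.json')
checkpoint_path = os.path.join(checkpoint_dir, 'bert_model.ckpt')

# `training=False` builds the model for feature extraction only.
model = load_trained_model_from_checkpoint(config_path, checkpoint_path, training=False)
model.summary()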

Run on TPU

The extraction demo shows how to convert the model to one that runs on a TPU.
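
The demo itself is external; as a rough sketch of the generic TF2 approach (not the library's own recipe, and the checkpoint paths are placeholders), the model can be built inside a TPU strategy scope:

import tensorflow as tf
from keras_bert import load_trained_model_from_checkpoint

# Placeholder paths to an unpacked official checkpoint.
config_path = 'uncased_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'uncased_L-12_H-768_A-12/bert_model.ckpt'

# Standard TF2 TPU initialization (e.g. on Colab, where `tpu=''`
# resolves the attached TPU), then build inside the strategy scope.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = load_trained_model_from_checkpoint(config_path, checkpoint_path)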

The classification demo shows how to apply the model to simple classification tasks.

Tokenizer

The Tokenizer class is used for splitting texts and generating indices:

from keras_bert import Tokenizer

token_dict = {
    '[CLS]': 0,
    '[SEP]': 1,
    'un': 2,
    '##aff': 3,
    '##able': 4,
    '[UNK]': 5,
}
tokenizer = Tokenizer(token_dict)
print(tokenizer.tokenize('unaffable'))  # The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]']`
indices, segments = tokenizer.encode('unaffable')
print(indices)  # Should be `[0, 2, 3, 4, 1]`
print(segments)  # Should be `[0, 0, 0, 0, 0]`

print(tokenizer.tokenize(first='unaffable', second='钢'))
# The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]', '钢', '[SEP]']`
indices, segments = tokenizer.encode(first='unaffable', second='钢', max_len=10)
print(indices)  # Should be `[0, 2, 3, 4, 1, 5, 1, 0, 0, 0]`
print(segments)  # Should be `[0, 0, 0, 0, 0, 1, 1, 0, 0, 0]`
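
With a real checkpoint, the token dictionary is normally built from the official vocab.txt rather than by hand; a sketch using the load_vocabulary helper (the path is a placeholder):

from keras_bert import load_vocabulary, Tokenizer

# Placeholder path to the vocab file of an unpacked official checkpoint.
token_dict = load_vocabulary('uncased_L-12_H-768_A-12/vocab.txt')
tokenizer = Tokenizer(token_dict)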

Train & Use

from tensorflow import keras
from keras_bert import get_base_dict, get_model, compile_model, gen_batch_inputs


# A toy input example
sentence_pairs = [
    [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
    [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
    [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
]


# Build token dictionary
token_dict = get_base_dict()  # A dict that contains some special tokens
for pairs in sentence_pairs:
    for token in pairs[0] + pairs[1]:
        if token not in token_dict:
            token_dict[token] = len(token_dict)
token_list = list(token_dict.keys())  # Used for selecting a random word


# Build & train the model
model = get_model(
    token_num=len(token_dict),
    head_num=5,
    transformer_num=12,
    embed_dim=25,
    feed_forward_dim=100,
    seq_len=20,
    pos_num=20,
    dropout_rate=0.05,
)
compile_model(model)
model.summary()

def _generator():
    while True:
        yield gen_batch_inputs(
            sentence_pairs,
            token_dict,
            token_list,
            seq_len=20,
            mask_rate=0.3,
            swap_sentence_rate=1.0,
        )

model.fit(
    _generator(),  # `Model.fit` accepts Python generators; `fit_generator` is deprecated
    steps_per_epoch=1000,
    epochs=100,
    validation_data=_generator(),
    validation_steps=100,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
    ],
)


# Use the trained model
inputs, output_layer = get_model(
    token_num=len(token_dict),
    head_num=5,
    transformer_num=12,
    embed_dim=25,
    feed_forward_dim=100,
    seq_len=20,
    pos_num=20,
    dropout_rate=0.05,
    training=False,      # The input layers and the output layer will be returned if `training` is `False`
    trainable=False,     # Whether the model is trainable. The default value is the same as `training`
    output_layer_num=4,  # The number of layers whose outputs will be concatenated into a single output.
                         # Only available when `training` is `False`.
)
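
The returned inputs and output_layer can be wrapped into an ordinary Keras model for inference. A minimal sketch (the zero arrays are stand-in inputs; the expected feature width assumes the outputs of the last 4 layers, each of width embed_dim=25, are concatenated):

import numpy as np
from tensorflow import keras

# Wrap the returned tensors into a model usable for inference.
feature_model = keras.models.Model(inputs=inputs, outputs=output_layer)

# Stand-in token and segment inputs at the configured sequence length (20).
token_input = np.zeros((1, 20))
segment_input = np.zeros((1, 20))
features = feature_model.predict([token_input, segment_input])
print(features.shape)  # Expected (1, 20, 100): 4 concatenated layers of width 25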

Use Warmup

The AdamWarmup optimizer is provided for warmup and decay. The learning rate will reach lr in warmup_steps steps and decay to min_lr in decay_steps steps. There is a helper function calc_train_steps for calculating the two step counts:

import numpy as np
from keras_bert import AdamWarmup, calc_train_steps

train_x = np.random.standard_normal((1024, 100))

total_steps, warmup_steps = calc_train_steps(
    num_example=train_x.shape[0],
    batch_size=32,
    epochs=10,
    warmup_proportion=0.1,
)

optimizer = AdamWarmup(total_steps, warmup_steps, lr=1e-3, min_lr=1e-5)
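
The result should drop in wherever a Keras optimizer is accepted; a sketch compiling a toy placeholder model (whether this runs against plain keras or tf.keras depends on your backend setup):

from tensorflow import keras

# Toy placeholder model, compiled purely to illustrate usage;
# the loss is an arbitrary stand-in.
toy_model = keras.models.Sequential([keras.layers.Dense(2, input_shape=(100,))])
toy_model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')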

Download Pretrained Checkpoints

Several download URLs have been added. You can get the downloaded and uncompressed path of a checkpoint with:

from keras_bert import get_pretrained, PretrainedList, get_checkpoint_paths

model_path = get_pretrained(PretrainedList.multi_cased_base)
paths = get_checkpoint_paths(model_path)
print(paths.config, paths.checkpoint, paths.vocab)
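
These paths line up with the loader and tokenizer helpers shown earlier, so a sketch of wiring the pieces together might look like:

from keras_bert import load_trained_model_from_checkpoint, load_vocabulary, Tokenizer

# Reuse the paths resolved above to build the model and the tokenizer.
model = load_trained_model_from_checkpoint(paths.config, paths.checkpoint)
token_dict = load_vocabulary(paths.vocab)
tokenizer = Tokenizer(token_dict)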

Extract Features

You can use the helper function extract_embeddings if you only need the features of tokens or sentences (without further tuning). To extract the features of all tokens:

from keras_bert import extract_embeddings

model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
texts = ['all work and no play', 'makes jack a dull boy~']

embeddings = extract_embeddings(model_path, texts)

The returned result is a list with the same length as texts. Each item in the list is a numpy array truncated to the length of the input. The shapes of the outputs in this example are (7, 768) and (8, 768).

When the inputs are sentence pairs and you need the NSP output plus max-pooling over the last 4 layers:

from keras_bert import extract_embeddings, POOL_NSP, POOL_MAX

model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
texts = [
    ('all work and no play', 'makes jack a dull boy'),
    ('makes jack a dull boy', 'all work and no play'),
]

embeddings = extract_embeddings(model_path, texts, output_layer_num=4, poolings=[POOL_NSP, POOL_MAX])

There are no token features in the results. The NSP and max-pooling outputs are concatenated, giving a final shape of (768 × 4 × 2,), i.e. (6144,).

The second argument of the helper function can also be a generator. To extract features from a file:

import codecs
from keras_bert import extract_embeddings

model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'

with codecs.open('xxx.txt', 'r', 'utf8') as reader:
    texts = map(lambda x: x.strip(), reader)
    embeddings = extract_embeddings(model_path, texts)

More Repositories

1. toolbox (JavaScript, 842 stars): Encoding and parsing tools. https://cyberzhg.github.io/toolbox/
2. keras-self-attention (Python, 641 stars): Attention mechanism for processing sequential data that considers the context for each timestamp.
3. CLRS (Jupyter Notebook, 392 stars): Some exercises and problems in Introduction to Algorithms, 3rd edition.
4. keras-transformer (Python, 356 stars): Transformer implemented in Keras.
5. keras-radam (Python, 326 stars): RAdam implemented in Keras & TensorFlow.
6. keras-multi-head (Python, 222 stars): A wrapper layer for stacking layers horizontally.
7. keras-xlnet (Python, 171 stars): Implementation of XLNet that can load pretrained checkpoints.
8. keras-gpt-2 (Python, 127 stars): Load GPT-2 checkpoint and generate texts.
9. torch-multi-head-attention (Python, 125 stars): Multi-head attention in PyTorch.
10. keras-transformer-xl (Python, 68 stars): Transformer-XL with checkpoint loader.
11. keras-pos-embd (Python, 62 stars): Position embedding layers in Keras.
12. keras-gcn (Python, 61 stars): Graph convolutional layers.
13. keras-layer-normalization (Python, 60 stars): Layer normalization implemented in Keras.
14. keras-adabound (Python, 57 stars): AdaBound optimizer in Keras.
15. keras-lookahead (Python, 50 stars): Lookahead mechanism for optimizers in Keras.
16. keras-word-char-embd (Python, 46 stars): Concatenate word and character embeddings in Keras.
17. keras-lr-multiplier (Python, 46 stars): Learning rate multiplier.
18. keras-octave-conv (Python, 36 stars): Octave convolution.
19. keras-gradient-accumulation (Python, 35 stars): Gradient accumulation for Keras.
20. keras-ordered-neurons (Python, 30 stars): Ordered Neurons LSTM.
21. keras-drop-block (Python, 25 stars): DropBlock implemented in Keras.
22. wiki-dump-reader (Python, 21 stars): Extract corpora from Wikipedia dumps.
23. torch-layer-normalization (Python, 18 stars): Layer normalization in PyTorch.
24. keras-adaptive-softmax (Python, 17 stars): Adaptive embedding and softmax.
25. tf-keras-kervolution-2d (Python, 16 stars): Kervolutional neural networks.
26. keras-trans-mask (Python, 16 stars): Remove and restore masks for layers that do not support masking.
27. keras-lamb (Python, 15 stars): Layer-wise Adaptive Moments optimizer for batch training.
28. torch-position-embedding (Python, 14 stars): Position embedding in PyTorch.
29. keras-losses (Python, 10 stars): Some loss functions in Keras.
30. keras-embed-sim (Python, 10 stars): Calculate similarity with embedding.
31. keras-position-wise-feed-forward (Python, 8 stars): Feed forward layer implemented in Keras.
32. keras-targeted-dropout (Python, 8 stars): Targeted dropout implemented in Keras.
33. EmojiView (Java, 8 stars): 😼 EmojiView for Android.
34. github-action-python-lint (Dockerfile, 7 stars): GitHub action that runs pycodestyle.
35. LaTeXGitHubMarkdown (JavaScript, 7 stars): Show LaTeX formulas for GitHub Markdown files.
36. keras-drop-connect (Python, 7 stars): Drop-connect wrapper.
37. torch-gpt-2 (Python, 6 stars): Load GPT-2 checkpoint and generate texts in PyTorch.
38. keras-conv-vis (Python, 6 stars): Convolution visualization.
39. MIXAL (C++, 6 stars): MIX Assembly Language Simulator.
40. Sketch-Based (MATLAB, 6 stars): Some implementations of sketch-based methods; no longer maintained.
41. MineForces (JavaScript, 5 stars): Codeforces problem filter.
42. github-action-cpp-lint (Dockerfile, 5 stars): GitHub action that runs cpplint.
43. keras-bi-lm (Python, 5 stars): Train the Bi-LM model and use it as a feature extraction method.
44. mxnet-octave-conv (Python, 3 stars): Octave convolution.
45. torch-same-pad (Python, 3 stars): Paddings used for converting TensorFlow conv/pool layers to PyTorch.
46. gitbook-plugin-meta (HTML, 3 stars): Add meta data to <head> for your gitbook.
47. toy-auto-diff (Python, 3 stars): Toy implementation of automatic differentiation.
48. keras-piecewise-pooling (Python, 2 stars): Piecewise pooling layer in Keras.
49. keras-piecewise (Python, 2 stars): A wrapper layer for splitting and accumulating sequential data.
50. keras-perturbation (Python, 2 stars): A demonstration of perturbation of data.
51. parse-toys (Python, 2 stars): Parsing toys.
52. swift-6502-core (Swift, 2 stars): Emulation of the 6502 CPU.
53. github-action-python-test (Dockerfile, 2 stars): GitHub action that runs nose tests.
54. CrimsonTomato (Java, 2 stars): Pomodoro timer, sync to calendar. https://goo.gl/JpF6eP
55. keras-succ-reg-wrapper (Python, 1 star): A wrapper that slows down the updates of trainable weights.
56. torch-embed-sim (Python, 1 star): Embedding similarity in PyTorch.
57. torch-transformer (Python, 1 star): Transformer in PyTorch.
58. UChar (C++, 1 star): Basic Unicode information about a character.
59. CppTesting (C++, 1 star): Personal C++ testing framework.