  • Language
    Jupyter Notebook
  • License
    MIT License

French sentiment analysis with BERT

How good is BERT? Comparing BERT to other state-of-the-art approaches on a large-scale French sentiment analysis dataset 📚

The contribution of this repository is threefold.

  • First, I introduce a new dataset for sentiment analysis, scraped from Allociné.fr user reviews. It contains 100k positive and 100k negative reviews divided into 3 balanced splits: train (160k reviews), val (20k) and test (20k). To my knowledge, no other French-language dataset of this size is available on the internet.

  • Second, I share my code for French sentiment analysis with BERT, based on CamemBERT and the 🤗Transformers library.

  • Lastly, I compare BERT results with other state-of-the-art approaches, such as TF-IDF and fastText, as well as other methods based on non-contextual word embeddings.

Installation

If you want to experiment with the training code, follow these steps:

# Download repo and its dependencies 
git clone https://github.com/TheophileBlard/french-sentiment-analysis-with-bert/
cd french-sentiment-analysis-with-bert
pipenv install

# Extract dataset
pushd allocine_dataset && tar xvjf data.tar.bz2 && popd

# Activate virtualenv and open-up BERT notebook
pipenv shell
jupyter notebook 03_bert.ipynb 

If you only need the model for inference, refer to the Hugging Face Integration section instead.

Dataset

The dataset is made available as .jsonl files, as well as a .pickle file. Some examples from the training set are presented in the following table:

Review | Polarity
Magnifique épopée, une belle histoire, touchante avec des acteurs qui interprètent très bien leur rôles (Mel Gibson, Heath Ledger, Jason Isaacs...), le genre de film qui se savoure en famille! | Positive
N'étant pas fan de SF, j'ai du mal à commenter ce film. Au moins, dirons nous, il n'y a pas d'effets spéciaux et le thème de ces 3 derniers survivants, un blanc, un maori, une blanche est assez bien traité. Mais c'est quand même bien longuet ! | Negative
Les scènes s'enchaînent de manière saccadée, les dialogues sont théâtraux, le jeu des acteurs ne transcende pas franchement le film. Seule la musique de Vivaldi sauve le tout. Belle déception. | Negative

For more information, please refer to the dedicated page.
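As a minimal sketch of how the .jsonl files can be consumed, the snippet below reads one JSON object per line and counts the labels. The field names `review` and `polarity`, the file name `sample.jsonl` and the two sample records are assumptions made for illustration; check the actual files for the real schema.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical records: the keys "review" and "polarity" are assumed, not confirmed.
sample = [
    {"review": "Magnifique épopée, une belle histoire.", "polarity": 1},
    {"review": "Belle déception.", "polarity": 0},
]

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny stand-in file so the example is self-contained.
    path = Path(tmp) / "sample.jsonl"
    lines = [json.dumps(record, ensure_ascii=False) for record in sample]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    records = list(read_jsonl(path))

# Count how many reviews fall into each class.
counts = Counter(record["polarity"] for record in records)
print(counts)
```

Reading line by line keeps memory usage flat, which matters for a split with 160k reviews.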

The dataset is also available in the 🤗Datasets library; see the Hugging Face Integration section.

Results

Full dataset

Model               | Validation Accuracy (%) | Validation F1-Score (%) | Test Accuracy (%) | Test F1-Score (%)
CamemBERT           | 97.39                   | 97.36                   | 97.44             | 97.34
RNN                 | 94.39                   | 94.34                   | 94.58             | 94.39
TF-IDF + LogReg     | 94.35                   | 94.29                   | 94.38             | 94.19
CNN                 | 93.69                   | 93.72                   | 94.10             | 93.98
fastText (unigrams) | 92.88                   | 92.75                   | 92.90             | 92.57

CamemBERT outperforms all other models by a large margin: almost 3 points of test accuracy over the next-best model (97.44 vs. 94.58 for the RNN).
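For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how accuracy and F1-score can be computed for binary labels. It assumes F1 is taken on the positive class; the notebooks may use a different averaging or a library implementation, and the labels below are made up.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two equal-length label lists."""
    tp = fp = fn = correct = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        if pred == positive and truth == positive:
            tp += 1  # true positive
        elif pred == positive:
            fp += 1  # false positive
        elif truth == positive:
            fn += 1  # false negative
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return accuracy, f1


# Made-up labels, for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```

On balanced splits such as these, accuracy and F1 tend to stay close to each other, which matches the table.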

Learning curves

Test accuracy as a function of training dataset size.

With only 500 training examples, CamemBERT already shows better results than any other model trained on the full dataset. This is the power of modern language models and self-supervised pre-training.

For this kind of task, RNNs need a lot of data (>100k examples) to perform well. The same result (for English) is empirically observed by Alec Radford in these slides.
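A learning curve of this kind can be produced by retraining on growing subsets of the training split and scoring each model on the same test set. The sketch below shows that loop only; `learning_curve`, the word-vote classifier and the four training sentences are hypothetical stand-ins, not the models or data used for the curves in this repository.

```python
import random
from collections import Counter


def fit_word_votes(examples):
    """Toy model: each word gets +1 per positive and -1 per negative review it appears in."""
    votes = Counter()
    for text, label in examples:
        for word in text.lower().split():
            votes[word] += 1 if label == 1 else -1
    return votes


def predict_word_votes(votes, text):
    """Predict 1 (positive) if the summed word votes are above zero, else 0."""
    score = sum(votes[word] for word in text.lower().split())
    return 1 if score > 0 else 0


def learning_curve(train, test, sizes, fit, predict, seed=0):
    """Train on the first n shuffled examples for each n and return (n, test accuracy) pairs."""
    shuffled = list(train)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for n in sizes:
        model = fit(shuffled[:n])
        correct = sum(predict(model, text) == label for text, label in test)
        curve.append((n, correct / len(test)))
    return curve


# Made-up data, for illustration only.
train = [("super film", 1), ("film nul", 0), ("très beau", 1), ("vraiment mauvais", 0)]
test = [("beau film super", 1), ("nul et mauvais", 0)]
curve = learning_curve(train, test, sizes=[2, 4], fit=fit_word_votes, predict=predict_word_votes)
print(curve)
```

Fixing the shuffle seed makes each smaller subset a prefix of the larger ones, so the points along the curve are comparable.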

Inference time

Time taken by a model to perform a single prediction (averaged over 1,000 predictions).

As one would expect, the slowest model is CamemBERT, followed by TF-IDF.

On the other hand, fastText performs the ... fastest, but is actually slow compared to the original implementation, because of the overhead of Python and Keras.
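A per-prediction latency like the one described above can be measured with a simple timing loop. This is a sketch under assumptions: `mean_latency` and `dummy_predict` are hypothetical names, the warm-up call is a design choice of this example, and the benchmark code in the repository may differ.

```python
import time


def mean_latency(predict, inputs, n_runs=1000):
    """Return the average wall-clock time in seconds of one predict() call."""
    predict(inputs[0])  # warm-up call, so one-time setup cost is not measured
    start = time.perf_counter()
    for i in range(n_runs):
        predict(inputs[i % len(inputs)])  # cycle through the inputs
    elapsed = time.perf_counter() - start
    return elapsed / n_runs


def dummy_predict(text):
    """Stand-in for a real model's single-example prediction function."""
    return len(text) % 2


latency = mean_latency(dummy_predict, ["Juste whoaaahouuu !", "Belle déception."])
print(f"{latency * 1e6:.2f} microseconds per prediction")
```

Predicting one example at a time, as here, includes the per-call Python overhead mentioned above; batched inference would give different numbers.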

Generalizability

I considered the text classification task from FLUE (French Language Understanding Evaluation) to evaluate the cross-domain generalization capabilities of the models. This is also a binary classification task, but on Amazon product reviews.

There is one train and test set for each product category (books, DVD and music). The train and test sets are balanced, including around 1000 positive and 1000 negative reviews, for a total of 2000 reviews in each dataset.

I didn't do any additional training, only inference on the test sets. The resulting accuracies are reported in the following table:

Model               | Books | DVD   | Music
CamemBERT           | 94.10 | 93.25 | 94.55
TF-IDF + LogReg     | 87.10 | 88.10 | 87.45
CNN                 | 85.80 | 88.75 | 87.25
RNN                 | 85.30 | 87.55 | 87.50
fastText (unigrams) | 85.25 | 87.10 | 86.65

Without additional training on domain-specific data, the CamemBERT model outperforms the fine-tuned CamemBERT and FlauBERT models reported in (He et al., 2020). Update: FlauBERT (Large), released in 03/20, gets better results, but it is a much heavier model.

TF-IDF + LogReg also performs better than specifically-trained mBERT (Eisenschlos et al., 2019).
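The evaluation protocol in this section (no further training, inference only on each category's test set) can be sketched as a loop over domains. `accuracy_by_domain`, `toy_predict` and the sample reviews are hypothetical and stand in for the real models and the FLUE data.

```python
def accuracy_by_domain(predict, test_sets):
    """Return accuracy (%) per domain for a model that is applied as-is."""
    results = {}
    for domain, examples in test_sets.items():
        correct = sum(predict(text) == label for text, label in examples)
        results[domain] = 100.0 * correct / len(examples)
    return results


def toy_predict(text):
    """Stand-in for a model trained on another domain: a single keyword rule."""
    return 1 if "excellent" in text.lower() else 0


# Made-up reviews, for illustration only (not taken from FLUE).
test_sets = {
    "books": [("Excellent roman", 1), ("Très ennuyeux", 0)],
    "dvd": [("Image excellente", 1), ("Décevant", 0)],
    "music": [("Album excellent", 1), ("Pas excellent du tout", 0)],
}
results = accuracy_by_domain(toy_predict, test_sets)
print(results)
```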

Hugging Face Integration

The CamemBERT model is now part of the 🤗Transformers library! You can retrieve it and perform inference with the following code:

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")

nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

print(nlp("Alad'2 est clairement le meilleur film de l'année 2018.")) # POSITIVE
print(nlp("Juste whoaaahouuu !")) # POSITIVE
print(nlp("NUL...A...CHIER ! FIN DE TRANSMISSION.")) # NEGATIVE
print(nlp("Je m'attendais à mieux de la part de Franck Dubosc !")) # NEGATIVE

The dataset is also available in 🤗Datasets. To download it and start training your own model, simply use:

from datasets import load_dataset

train_ds, val_ds, test_ds = load_dataset(
    'allocine', 
    split=['train', 'validation', 'test']
)

Online Demo

An online demo is available on Google Colab.

Release History

Task List

  • Dataset available
  • Models available
  • Results on full dataset
  • Learning curves
  • Inference time
  • Generalizability
  • Online demo
  • Hugging Face integration
  • Predicting usefulness

Author

Théophile Blard

If you use this work (code or dataset), please cite as:

Théophile Blard, French sentiment analysis with BERT, (2020), GitHub repository, https://github.com/TheophileBlard/french-sentiment-analysis-with-bert