French sentiment analysis with BERT
How good is BERT? Comparing BERT to other state-of-the-art approaches on a large-scale French sentiment analysis dataset
📚
The contribution of this repository is threefold.
- Firstly, I introduce a new dataset for sentiment analysis, scraped from Allociné.fr user reviews. It contains 100k positive and 100k negative reviews divided into 3 balanced splits: train (160k reviews), val (20k) and test (20k). To my knowledge, there is no other French-language dataset of this size available on the internet.
- Secondly, I share my code for French sentiment analysis with BERT, based on CamemBERT and the 🤗 Transformers library.
- Lastly, I compare BERT results with other state-of-the-art approaches, such as TF-IDF and fastText, as well as other methods based on non-contextual word embeddings.
Installation
If you want to experiment with the training code, follow these steps:
# Download repo and its dependencies
git clone https://github.com/TheophileBlard/french-sentiment-analysis-with-bert/
cd french-sentiment-analysis-with-bert
pipenv install
# Extract dataset
pushd allocine_dataset && tar xvjf data.tar.bz2 && popd
# Activate virtualenv and open-up BERT notebook
pipenv shell
jupyter notebook 03_bert.ipynb
If you only need the model for inference, refer to the Hugging Face Integration section instead.
Dataset
The dataset is made available as .jsonl files, as well as a .pickle file.
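As a minimal sketch, a .jsonl file can be read with the standard library alone, since each non-empty line holds one JSON object. The file path and the field names `review` and `polarity` used in the commented usage are assumptions for illustration; check the dataset page for the actual names.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Hypothetical usage; the path and field names are assumptions:
# train = read_jsonl("allocine_dataset/data/train.jsonl")
# print(train[0]["review"], train[0]["polarity"])
```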
Some examples from the training set are presented in the following table:
Review | Polarity |
---|---|
Magnifique épopée, une belle histoire, touchante avec des acteurs qui interprètent très bien leur rôles (Mel Gibson, Heath Ledger, Jason Isaacs...), le genre de film qui se savoure en famille! | Positive |
N'étant pas fan de SF, j'ai du mal à commenter ce film. Au moins, dirons nous, il n'y a pas d'effets spéciaux et le thème de ces 3 derniers survivants, un blanc, un maori, une blanche est assez bien traité. Mais c'est quand même bien longuet ! | Negative |
Les scènes s'enchaînent de manière saccadée, les dialogues sont théâtraux, le jeu des acteurs ne transcende pas franchement le film. Seule la musique de Vivaldi sauve le tout. Belle déception. | Negative |
For more information, please refer to the dedicated page.
The dataset is also available in the 🤗 Datasets library (see Hugging Face Integration).
Results
Full dataset
Model | Validation Accuracy | Validation F1-Score | Test Accuracy | Test F1-Score |
---|---|---|---|---|
CamemBERT | 97.39 | 97.36 | 97.44 | 97.34 |
RNN | 94.39 | 94.34 | 94.58 | 94.39 |
TF-IDF + LogReg | 94.35 | 94.29 | 94.38 | 94.19 |
CNN | 93.69 | 93.72 | 94.10 | 93.98 |
fastText (unigrams) | 92.88 | 92.75 | 92.90 | 92.57 |
CamemBERT outperforms all other models by a large margin: about 3 points in both accuracy and F1-score over the next-best model (RNN).
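For reference, the two metrics in the table can be computed from gold and predicted labels as sketched below. This is an illustration, not the repository's evaluation code: it assumes labels encoded as 1 (positive) and 0 (negative) and an F1-score computed for the positive class, neither of which is confirmed here. Multiply the results by 100 to get percentages like those in the table.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; F1 is for the `positive` class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```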
Learning curves
Test accuracy as a function of training dataset size.
With only 500 training examples, CamemBERT already achieves better results than any other model trained on the full dataset. This is the power of modern language models and self-supervised pre-training.
For this kind of task, RNNs need a lot of data (>100k examples) to perform well. The same result (for English) is empirically observed by Alec Radford in these slides.
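A learning curve of this kind requires training on subsets of increasing size. The sketch below shows one way to draw a class-balanced subset; the exact sampling procedure used for the curves is not described here, so the function name `balanced_subset`, the `polarity` field and the sizes in the commented usage are all assumptions.

```python
import random


def balanced_subset(examples, size, label_key="polarity", seed=0):
    """Draw `size` examples with an equal number from each label value."""
    by_label = {}
    for ex in examples:
        by_label.setdefault(ex[label_key], []).append(ex)
    if not by_label:
        raise ValueError("examples must not be empty")
    per_class, remainder = divmod(size, len(by_label))
    if remainder:
        raise ValueError("size must be divisible by the number of classes")
    rng = random.Random(seed)  # fixed seed so subsets are reproducible
    subset = []
    for label in sorted(by_label):
        subset.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(subset)
    return subset


# Hypothetical usage; the sizes are illustrative only:
# for n in (500, 5000, 50000):
#     train_subset = balanced_subset(train, n)
```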
Inference time
Time taken by a model to perform a single prediction (averaged over 1000 predictions).
As one would expect, the slowest model is CamemBERT, followed by TF-IDF.
On the other hand, fastText is the... fastest, although it is still slow compared to the original implementation because of the overhead of Python and Keras.
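A measurement of this kind can be sketched as follows. This is an assumption about the setup rather than the benchmark code itself: `predict` stands for any callable wrapping a model, and the warm-up call is a choice made here to keep one-time initialization out of the average.

```python
import time


def mean_prediction_time(predict, text, n_runs=1000):
    """Average wall-clock seconds per call of predict(text) over n_runs calls."""
    predict(text)  # warm-up call, excluded from the measurement
    start = time.perf_counter()
    for _ in range(n_runs):
        predict(text)
    return (time.perf_counter() - start) / n_runs
```

Timing single predictions one at a time, rather than in batches, matches the per-prediction latency described above but understates the throughput a batched setup could reach.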
Generalizability
I considered the text classification task from FLUE (French Language Understanding Evaluation) to evaluate the cross-domain generalization capabilities of the models. This is also a binary classification task, but on Amazon product reviews.
There is one train and test set for each product category (books, DVD and music). The train and test sets are balanced, including around 1000 positive and 1000 negative reviews, for a total of 2000 reviews in each dataset.
I didn't do any additional training, only inference on the test sets. The resulting accuracies are reported in the following table:
Model | Books | DVD | Music |
---|---|---|---|
CamemBERT | 94.10 | 93.25 | 94.55 |
TF-IDF + LogReg | 87.10 | 88.10 | 87.45 |
CNN | 85.80 | 88.75 | 87.25 |
RNN | 85.30 | 87.55 | 87.50 |
fastText (unigrams) | 85.25 | 87.10 | 86.65 |
Without additional training on domain-specific data, the CamemBERT model outperforms the fine-tuned CamemBERT and FlauBERT models reported in (He et al., 2020). Update: FlauBERT (Large), released 03/20, gets better results, but it is excessively heavy.
TF-IDF + LogReg also performs better than specifically-trained mBERT (Eisenschlos et al., 2019).
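The inference-only evaluation described above could be sketched as below for one product category. It assumes a classifier that takes a list of strings and returns one dict per input with a "label" key equal to "POSITIVE" or "NEGATIVE" (the shape of the 🤗 Transformers sentiment-analysis pipeline output), and gold labels encoded as 1 and 0; the function name `domain_accuracy` and the label encoding are hypothetical.

```python
def domain_accuracy(classify, texts, labels, positive_label="POSITIVE"):
    """Accuracy (in %) of a pipeline-like classifier on one labeled test set.

    `classify` takes a list of strings and returns one dict per input
    with a "label" key. Gold labels are truthy for positive reviews.
    """
    predictions = classify(list(texts))
    hits = sum(
        (pred["label"] == positive_label) == bool(gold)
        for pred, gold in zip(predictions, labels)
    )
    return 100.0 * hits / len(labels)
```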
Hugging Face Integration
The CamemBERT model is now part of the 🤗 Transformers library and can be loaded for inference as follows:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
print(nlp("Alad'2 est clairement le meilleur film de l'année 2018.")) # POSITIVE
print(nlp("Juste whoaaahouuu !")) # POSITIVE
print(nlp("NUL...A...CHIER ! FIN DE TRANSMISSION.")) # NEGATIVE
print(nlp("Je m'attendais à mieux de la part de Franck Dubosc !")) # NEGATIVE
The dataset is also available in the 🤗 Datasets library:
from datasets import load_dataset
train_ds, val_ds, test_ds = load_dataset(
    'allocine',
    split=['train', 'validation', 'test']
)
Online Demo
Open the online demo on Google Colab:
Release History
- 0.4.0
  - Uploaded model to https://huggingface.co/tblard/tf-allocine
  - Uploaded the dataset to https://huggingface.co/datasets/viewer/?dataset=allocine
- 0.3.0
  - Added Google Colab online demo
- 0.2.0
  - Added inference time + generalizability
- 0.1.0
  - First proper release
  - Learning curves & results for all models
- 0.0.1
  - Work in progress
Task List
- Dataset available
- Models available
- Results on full dataset
- Learning curves
- Inference time
- Generalizability
- Online demo
- Hugging Face integration
- Predicting usefulness
Author
Théophile Blard
If you use this work (code or dataset), please cite as:
Théophile Blard, French sentiment analysis with BERT, (2020), GitHub repository, https://github.com/TheophileBlard/french-sentiment-analysis-with-bert