KaoKore Dataset
Dataset History
We keep expanding the dataset. Apart from the added images, all other settings remain the same. The update history is:
- Version 1.3: Expanded to 9683 images. Most recent version.
- Version 1.2: Expanded to 8848 images.
- Version 1.1: Expanded to 8573 images.
- Version 1.0: 5552 images. The initial release.
Note that the classification and generative results here and in the paper correspond to version 1.0 of our dataset.
The Dataset
KaoKore is a novel dataset of face images from Japanese illustrations along with multiple labels for each face, derived from the Collection of Facial Expressions.
The KaoKore dataset is built on the Collection of Facial Expressions, which results from an effort by the ROIS-DS Center for Open Data in the Humanities (CODH) and has been publicly available since 2018. It provides cropped face images extracted from Japanese artworks, dating from the Late Muromachi Period (16th century) to the Early Edo Period (17th century), that are publicly available from the National Institute of Japanese Literature, the Kyoto University Rare Materials Digital Archive, and the Keio University Media Center. Its purpose is to facilitate research into art history, especially the study of artistic style. It also provides corresponding metadata annotated by researchers with domain expertise.
The KaoKore dataset contains image files, each a color (RGB) image of size 256 x 256, as well as two sets of labels: gender and social status. The most recent version (1.3) contains 9683 images.
Example of the KaoKore dataset, showing various faces in diverse yet coherent artistic styles.
Labels (labels.csv) available in the dataset, along with example images belonging to each label.
💾 Get the data
Running `python3 download.py` downloads the KaoKore dataset. By default it downloads the initial version 1.0 of the dataset. To try a newer version (e.g. 1.3), use `python3 download.py --dataset_version 1.3`. For version numbers, please refer to Dataset History above.
See the output of `download.py --help` for more details.
Some conda installations are known to have trouble locating SSL certificates. If that is the case, you can disable certificate verification with `download.py --ssl_unverified_context`, at your own risk and only if you know what you are doing. The default download concurrency, `--threads 4`, can be adjusted if needed (e.g. if it is too high for some networks or machines, try a lower value).
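To illustrate what the concurrency and SSL options control, here is a minimal sketch of a thread-limited downloader. It is not the code in download.py: the function names `make_fetcher` and `download_all` are hypothetical, and the use of an unverified SSL context is an assumption about how such a flag could be implemented.

```python
import ssl
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def make_fetcher(ssl_unverified_context=False):
    """Return a function that downloads one URL and returns its bytes."""
    # Assumption: verification is disabled by passing an unverified SSL
    # context. This is insecure, hence "at your own risk".
    context = ssl._create_unverified_context() if ssl_unverified_context else None

    def fetch(url):
        with urlopen(url, context=context, timeout=30) as response:
            return response.read()

    return fetch


def download_all(urls, fetch, threads=4):
    """Fetch every URL in the list with at most `threads` concurrent workers.

    Returns a dict mapping each URL to its bytes, or to the exception raised.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # keep going if a single download fails
            return exc

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Lowering `threads` reduces the number of simultaneous connections, which is the kind of adjustment the `--threads` option is meant for on slow or restrictive networks.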
Please note that we intentionally did not include image data in the dataset, so that image providers can check which images are used. We ask that you not create a derived dataset that bundles the image data for users' convenience.
The Data Loaders
Data loaders for PyTorch and TensorFlow are available in the code folder.
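For readers who want to inspect the labels without either framework, the following is a minimal, framework-free sketch of the first step a loader performs: pairing each label row with its image path. It assumes that labels.csv has the columns `image`, `gender`, and `status` with integer-coded labels, and the directory name `kaokore/images_256` is hypothetical; check the downloaded files and the loaders in the code folder for the actual layout.

```python
import csv
import io
from pathlib import Path


def load_labels(csv_file, image_dir):
    """Read label rows and pair each with the path of its image file."""
    samples = []
    for row in csv.DictReader(csv_file):
        samples.append({
            # Assumed column names: image, gender, status.
            "path": Path(image_dir) / row["image"],
            "gender": int(row["gender"]),
            "status": int(row["status"]),
        })
    return samples


# Tiny in-memory stand-in for labels.csv (file names and values are made up).
example_csv = io.StringIO(
    "image,gender,status\n"
    "00000001.jpg,0,2\n"
    "00000002.jpg,1,0\n"
)
samples = load_labels(example_csv, "kaokore/images_256")
print(len(samples), samples[0]["path"].name)
```

With a real download, you would pass an opened labels.csv (for example `open(path, newline="")`) in place of the in-memory example.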
📈 Benchmarks & Results
We provide quantitative results on the supervised machine learning tasks of gender and social status prediction from KaoKore images. (Keras classification code is available in the code folder.)
Have more results to add to the table? Feel free to submit an issue or pull request!
Model | Gender | Status | Credit |
---|---|---|---|
VGG11 | 92.03% | 78.74% | alantian |
AlexNet | 91.27% | 78.93% | alantian |
ResNet-18 | 92.98% | 82.16% | alantian |
ResNet-34 | 93.55% | 84.82% | alantian |
MobileNet-v2 | 95.06% | 82.35% | alantian |
DenseNet-121 | 94.31% | 79.70% | alantian |
Inception-v3 | 96.58% | 84.25% | alantian |
Generative Models Demo Videos
Please download the generative model demo videos from here.
Citing KaoKore dataset
If you use any of the KaoKore datasets in your work, we would appreciate a reference to our paper:
KaoKore: A Pre-modern Japanese Art Facial Expression Dataset. Yingtao Tian et al. arXiv:2002.08595
@inproceedings{tian2020kaokore,
title = "{KaoKore: A Pre-modern Japanese Art Facial Expression Dataset}",
author = {Yingtao Tian and Chikahiko Suzuki and Tarin Clanuwat and Mikel Bober-Irizar and Alex Lamb and Asanobu Kitamoto},
booktitle = "Proceedings of the International Conference on Computational Creativity",
year = "2020",
pages = "415--422"
}
License
Both the dataset itself and the contents of this repo are licensed under a permissive CC BY-SA 4.0 license, except where specified within some benchmark scripts. The CC BY-SA 4.0 license requires attribution, and we suggest using the following attribution for the KaoKore dataset:
"KaoKore Dataset" (collected by CODH from multiple organizations), doi:10.20676/00000353