JGLUE: Japanese General Language Understanding Evaluation

JGLUE, Japanese General Language Understanding Evaluation, is built to measure the general NLU ability in Japanese. JGLUE has been constructed from scratch without translation. We hope that JGLUE will facilitate NLU research in Japanese.

JGLUE has been constructed by a joint research project of Yahoo Japan Corporation and Kawahara Lab at Waseda University.

Tasks/Datasets

JGLUE covers three task types: text classification, sentence pair classification, and QA. Each task type comprises multiple datasets, which can be found under the datasets directory. Only the train/dev sets are available now; the test sets will be released after the leaderboard is made public. We used Yahoo! Crowdsourcing for all crowdsourcing tasks in constructing the datasets.

Task                          Dataset         Train    Dev    Test
Text Classification           MARC-ja         187,528  5,654  5,639
Text Classification           JCoLA†          -        -      -
Sentence Pair Classification  JSTS            12,451   1,457  1,589
Sentence Pair Classification  JNLI            20,073   2,434  2,508
QA                            JSQuAD          62,859   4,442  4,420
QA                            JCommonsenseQA  8,939    1,119  1,118

†JCoLA will be added soon.

Dataset Description

MARC-ja

MARC-ja is a dataset of the text classification task. This dataset is based on the Japanese portion of Multilingual Amazon Reviews Corpus (MARC) (Keung+, 2020).

We performed the following modifications to the original dataset:

  1. To make it easy for both humans and computers to judge a class label, we cast the text classification task as a binary classification task: 1- and 2-star ratings are converted to negative, and 4- and 5-star ratings are converted to positive. Reviews with a 3-star rating are not used.
  2. In some instances the rating diverges from the review text. To improve the quality of the dev/test instances, we crowdsourced a positive/negative judgment task, kept only the reviews on which seven or more of the 10 workers gave the same vote, and assigned the majority label to those reviews.
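The two modifications above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the logic of marc-ja.py itself: the function names and the representation of crowdsourced votes (a list of "positive"/"negative" strings per review) are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def rating_to_label(star_rating: int) -> Optional[str]:
    """Map a star rating to a binary label; 3-star reviews yield None (dropped)."""
    if star_rating in (1, 2):
        return "negative"
    if star_rating in (4, 5):
        return "positive"
    return None


def label_from_votes(votes: List[str], min_agreement: int = 7) -> Optional[str]:
    """Return the majority label if enough workers agree, otherwise None.

    `votes` is an assumed format: one "positive"/"negative" string per worker.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None


# Example: 8 of 10 workers agree, so the review is kept with the majority label.
print(label_from_votes(["positive"] * 8 + ["negative"] * 2))
```

A review for which either function returns None would be excluded under this sketch.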

We don't distribute the dataset itself. Please download the original dataset, and run a conversion script as follows:

  1. Download https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_multilingual_JP_v1_00.tsv.gz
  2. Run the following commands:
$ pip install -r preprocess/requirements.txt
$ cd preprocess/marc-ja/scripts
$ gzip -dc /somewhere/amazon_reviews_multilingual_JP_v1_00.tsv.gz | \
  python marc-ja.py \
         --positive-negative \
         --output-dir ../../../datasets/marc_ja-v1.1 \
         --max-char-length 500 \
         --filter-review-id-list-valid ../data/filter_review_id_list/valid.txt \
         --label-conv-review-id-list-valid ../data/label_conv_review_id_list/valid.txt

The train and valid sets will be generated under the datasets/marc_ja-v1.1 directory.

When you use this dataset, please follow the license of Multilingual Amazon Reviews Corpus (MARC).

JSTS

JSTS is a Japanese version of the STS (Semantic Textual Similarity) dataset. STS is a task to estimate the semantic similarity of a sentence pair. The sentences in JSTS and JNLI (described below) are extracted from the Japanese version of the MS COCO Caption Dataset, the YJ Captions Dataset (Miyazaki and Shimizu, 2016).

{"sentence_pair_id": "691",
 "yjcaptions_id": "127202-129817-129818",
 "sentence1": "街中の道路を大きなバスが走っています。 (A big bus is running on the road in the city.)", 
 "sentence2": "道路を大きなバスが走っています。 (There is a big bus running on the road.)", 
 "label": 4.4}

(Note that English translations are added in this example for those who do not understand Japanese, and are not included in the dataset.)

Name              Description
sentence_pair_id  id
yjcaptions_id     sentence ids in yjcaptions (explained below)
sentence1         first sentence
sentence2         second sentence
label             sentence similarity, from 0 (completely different meaning) to 5 (equivalent meaning)

Explanation for yjcaptions_id

There are the following two cases:

  1. sentence pairs in one image: (image id)-(sentence1 id)-(sentence2 id)
    • e.g., 723-844-847
    • a sentence id starting with "g" means a sentence generated by a crowdworker (e.g., 69501-75698-g103): only for JNLI
  2. sentence pairs in two images: (image id of sentence1)_(image id of sentence2)-(sentence1 id)-(sentence2 id)
    • e.g., 91337_217583-96105-91680
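The two yjcaptions_id layouts described above can be parsed with a few string splits. The following is a minimal sketch; YJCaptionsId, parse_yjcaptions_id, and is_crowdworker_generated are hypothetical names introduced for illustration and are not part of the JGLUE repository.

```python
from typing import NamedTuple


class YJCaptionsId(NamedTuple):
    image_id1: str     # image of sentence1
    image_id2: str     # image of sentence2 (equals image_id1 for one-image pairs)
    sentence1_id: str
    sentence2_id: str


def parse_yjcaptions_id(value: str) -> YJCaptionsId:
    """Split a yjcaptions_id into image ids and sentence ids."""
    # Exactly three hyphen-separated parts are expected; otherwise this raises ValueError.
    images, sentence1_id, sentence2_id = value.split("-")
    image_ids = images.split("_")
    if len(image_ids) == 1:
        image_ids = image_ids * 2  # both sentences come from the same image
    elif len(image_ids) != 2:
        raise ValueError(f"unexpected image id part: {images!r}")
    return YJCaptionsId(image_ids[0], image_ids[1], sentence1_id, sentence2_id)


def is_crowdworker_generated(sentence_id: str) -> bool:
    """A sentence id starting with "g" marks a crowdworker-generated sentence (JNLI only)."""
    return sentence_id.startswith("g")


print(parse_yjcaptions_id("723-844-847"))
print(parse_yjcaptions_id("91337_217583-96105-91680"))
```

Ids are kept as strings so that the "g" prefix survives parsing.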

JNLI

JNLI is a Japanese version of the NLI (Natural Language Inference) dataset. NLI is a task to recognize the inference relation that a premise sentence has to a hypothesis sentence. The inference relations are entailment, contradiction, and neutral.

{"sentence_pair_id": "1157",
 "yjcaptions_id": "127202-129817-129818",
 "sentence1": "街中の道路を大きなバスが走っています。 (A big bus is running on the road in the city.)", 
 "sentence2": "道路を大きなバスが走っています。 (There is a big bus running on the road.)", 
 "label": "entailment"}

Name              Description
sentence_pair_id  id
yjcaptions_id     sentence ids in yjcaptions
sentence1         premise sentence
sentence2         hypothesis sentence
label             inference relation

JSQuAD

JSQuAD is a Japanese version of SQuAD (Rajpurkar+, 2016), a reading comprehension dataset. Each instance in the dataset consists of a question regarding a given context (Wikipedia article) and its answer. JSQuAD is based on SQuAD 1.1 (there are no unanswerable questions). We used the Japanese Wikipedia dump as of 20211101.

The JSON format is the same as the original SQuAD.

    {
      "title": "東海道新幹線 (Tokaido Shinkansen)",
      "paragraphs": [
        {
          "qas": [
            {
              "question": "2020年(令和2年)3月現在、東京駅 - 新大阪駅間の最高速度はどのくらいか。 (What is the maximum speed between Tokyo Station and Shin-Osaka Station as of March 2020?)",
              "id": "a1531320p0q0",
              "answers": [
                {
                  "text": "285 km/h",
                  "answer_start": 182
                }
              ],
              "is_impossible": false
            },
            {
             .. 
            }
          ],
          "context": "東海道新幹線 [SEP] 1987年(昭和62年)4月1日の国鉄分割民営化により、JR東海が運営を継承した。西日本旅客鉄道(JR西日本)が継承した山陽新幹線とは相互乗り入れが行われており、東海道新幹線区間のみで運転される列車にもJR西日本所有の車両が使用されることがある。2020年(令和2年)3月現在、東京駅 - 新大阪駅間の所要時間は最速2時間21分、最高速度285 km/hで運行されている。"
        }
      ]
    }

Name           Description
title          title of a Wikipedia article
paragraphs     a set of paragraphs
qas            a set of pairs of a question and its answer
question       question
id             id of a question
answers        a set of answers
text           answer text
answer_start   start position (character index)
is_impossible  all the values are false
context        a concatenation of the title and paragraph
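Because answer_start is a character index into context, a quick consistency check is to slice the context at that offset and compare the result with the answer text. The sketch below assumes an article dict in the layout shown above; check_answer_spans and the toy article (shortened context, made-up question id) are hypothetical and only for illustration.

```python
from typing import Iterator, Tuple


def check_answer_spans(article: dict) -> Iterator[Tuple[str, str, bool]]:
    """Yield (question id, answer text, span_matches) for every answer in an article."""
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            for answer in qa["answers"]:
                start = answer["answer_start"]
                end = start + len(answer["text"])
                yield qa["id"], answer["text"], context[start:end] == answer["text"]


# Toy article in the JSQuAD layout; the context is shortened for illustration.
toy_context = "東海道新幹線 [SEP] 最高速度285 km/hで運行されている。"
toy_article = {
    "title": "東海道新幹線",
    "paragraphs": [{
        "context": toy_context,
        "qas": [{
            "id": "toy-q0",  # hypothetical id, not taken from the dataset
            "question": "最高速度はどのくらいか。",
            "answers": [{"text": "285 km/h",
                         "answer_start": toy_context.index("285 km/h")}],
            "is_impossible": False,
        }],
    }],
}

for question_id, text, ok in check_answer_spans(toy_article):
    print(question_id, text, ok)
```

Python string indices count Unicode code points, which is the assumption this check makes about how answer_start was computed.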

JCommonsenseQA

JCommonsenseQA is a Japanese version of CommonsenseQA (Talmor+, 2019), which is a multiple-choice question answering dataset that requires commonsense reasoning ability. It is built using crowdsourcing with seeds extracted from the knowledge base ConceptNet.

{"q_id": 3016,
 "question": "会社の最高責任者を何というか? (What do you call the chief executive officer of a company?)",
 "choice0": "社長 (president)",
 "choice1": "教師 (teacher)",
 "choice2": "部長 (manager)",
 "choice3": "バイト (part-time worker)",
 "choice4": "部下 (subordinate)",
 "label": 0}

Name         Description
q_id         id
question     question
choice{0..4} choice
label        correct choice id
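Since label is the index of the correct choice, the answer text is stored under the key "choice" followed by that index. A minimal sketch of looking up the answer and scoring predictions is shown below; correct_choice and accuracy are hypothetical helper names, and the record is the example above without the added English translations.

```python
import json
from typing import List, Sequence


def correct_choice(record: dict) -> str:
    """Return the text of the correct choice, e.g. record["choice0"] when label is 0."""
    return record[f"choice{record['label']}"]


def accuracy(records: Sequence[dict], predictions: List[int]) -> float:
    """Fraction of records whose predicted choice id equals the gold label."""
    if not records or len(records) != len(predictions):
        raise ValueError("records and predictions must be non-empty and of equal length")
    hits = sum(1 for rec, pred in zip(records, predictions) if rec["label"] == pred)
    return hits / len(records)


# One record in the JSON layout described above.
line = (
    '{"q_id": 3016, "question": "会社の最高責任者を何というか?", '
    '"choice0": "社長", "choice1": "教師", "choice2": "部長", '
    '"choice3": "バイト", "choice4": "部下", "label": 0}'
)
record = json.loads(line)
print(correct_choice(record))
```

The accuracy helper corresponds to the "acc" metric reported for this dataset, computed here over whatever predictions a model produces.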

Baseline Scores

The following foundation models are used for the evaluation.

Model                    Basic Unit                      Pretraining Texts
Tohoku BERT base         subword (MeCab + BPE)           Japanese Wikipedia
Tohoku BERT base (char)  character                       Japanese Wikipedia
NICT BERT base           subword (MeCab + BPE)           Japanese Wikipedia
Waseda RoBERTa base      subword (Juman++ + Unigram LM)  Japanese Wikipedia + CC
XLM RoBERTa base         subword (Unigram LM)            multi-lingual CC

Note that the large counterparts of Tohoku BERT base, Waseda RoBERTa base, and XLM RoBERTa base are also evaluated. For Waseda RoBERTa large, the following two versions with different maximum sequence lengths are used: Waseda RoBERTa large (s128) and Waseda RoBERTa large (s512).

When you use the NICT BERT base or Waseda RoBERTa base models, the dataset text must be segmented into words in advance with the corresponding morphological analyzer:

  • NICT BERT base: MeCab (0.996) with JUMAN dictionary
  • Waseda RoBERTa base: Juman++ (2.0.0-rc3)

Please refer to preprocess/morphological-analysis/README.md.

The fine-tuning was performed using the transformers library provided by Hugging Face. See fine-tuning/README.md for details.

The performance along with human scores on the JGLUE dev set is shown below.

Model                        MARC-ja (acc)  JSTS (Pearson/Spearman)  JNLI (acc)  JSQuAD (EM/F1)  JCommonsenseQA (acc)
Human                        0.989          0.899/0.861              0.925       0.871/0.944     0.986
Tohoku BERT base             0.958          0.909/0.868              0.899       0.871/0.941     0.808
Tohoku BERT base (char)      0.956          0.893/0.851              0.892       0.864/0.937     0.718
Tohoku BERT large            0.955          0.913/0.872              0.900       0.880/0.946     0.816
NICT BERT base               0.958          0.910/0.871              0.902       0.897/0.947     0.823
Waseda RoBERTa base          0.962          0.913/0.873              0.895       0.864/0.927     0.840
Waseda RoBERTa large (s128)  0.954          0.930/0.896              0.924       0.884/0.940     0.907
Waseda RoBERTa large (s512)  0.961          0.926/0.892              0.926       0.918/0.963     0.891
XLM RoBERTa base             0.961          0.877/0.831              0.893       -/-†            0.687
XLM RoBERTa large            0.964          0.918/0.884              0.919       -/-†            0.840

†The XLM RoBERTa base/large models use a unigram language model tokenizer. They are excluded from the JSQuAD evaluation because token boundaries often do not align with the start/end of the answer span, resulting in poor performance.

Leaderboard

A leaderboard will be made public soon. The test set will be released at that time.

Reference

@inproceedings{kurihara-etal-2022-jglue,
    title = "{JGLUE}: {J}apanese General Language Understanding Evaluation",
    author = "Kurihara, Kentaro  and
      Kawahara, Daisuke  and
      Shibata, Tomohide",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.317",
    pages = "2957--2966",
    abstract = "To develop high-performance natural language understanding (NLU) models, it is necessary to have a benchmark to evaluate and analyze NLU ability from various perspectives. While the English NLU benchmark, GLUE, has been the forerunner, benchmarks are now being released for languages other than English, such as CLUE for Chinese and FLUE for French; but there is no such benchmark for Japanese. We build a Japanese NLU benchmark, JGLUE, from scratch without translation to measure the general NLU ability in Japanese. We hope that JGLUE will facilitate NLU research in Japanese.",
}

@InProceedings{Kurihara_nlp2022,
  author = "栗原健太郎 and 河原大輔 and 柴田知秀",
  title = "JGLUE: 日本語言語理解ベンチマーク",
  booktitle = "言語処理学会第28回年次大会",
  year = "2022",
  url = "https://www.anlp.jp/proceedings/annual_meeting/2022/pdf_dir/E8-4.pdf",
  note = "in Japanese"
}

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Contributor License Agreement

This project requires contributors to accept the terms in the Contributor License Agreement (CLA).

Please note that contributors to the JGLUE repository on GitHub (https://github.com/yahoojapan/JGLUE) shall be deemed to have accepted the CLA without individual written agreements.
