CLOTH Dataset

The CLOTH dataset is a large-scale CLOze test dataset created by TeacHers. It is collected from English examinations that are created for middle school and high school students.

Report your results: if you have new results, please email Qizhe or Guokun with a link to your paper!


Model                            Report Time  Institute                          CLOTH  CLOTH-M  CLOTH-H
BERT-LARGE*                      Dec. 2018    CMU                                86.0   88.7     85.0
Amazon Mechanical Turker         Nov. 2017    CMU                                85.9   89.7     84.5
BERT-BASE*                       Dec. 2018    CMU                                82.0   85.0     80.9
One Billion Word Language Model  Nov. 2017    CMU                                70.7   74.5     69.3
SemiMPNet-ngram                  Aug. 2018    Yuanfudao Research & Peking Univ.  60.9   67.6     58.3
Representativeness Model         Nov. 2017    CMU                                58.3   67.3     54.9
Language Model                   Nov. 2017    CMU                                54.8   64.6     50.6
MPNet-ngram                      Aug. 2018    Yuanfudao Research & Peking Univ.  50.1   53.2     49.0
Stanford Attentive Reader*       Nov. 2017    CMU                                48.7   52.9     47.1
LSTM                             Nov. 2017    CMU                                48.4   51.8     47.1

*: The link points not to the paper that introduced the model, but to the paper that evaluates the model on CLOTH.
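For readers who want to compare their own numbers against the leaderboard, here is a minimal sketch of how a score of this kind could be computed. It assumes the reported figures are accuracy percentages over multiple-choice blanks, which this page does not state explicitly. The `ClozeQuestion` structure, the `accuracy` function, and the toy questions are all hypothetical illustrations, not the official data format or the evaluation script from the baseline code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ClozeQuestion:
    """One blank in a passage (hypothetical structure, not the official format)."""
    passage: str        # passage text with the blank written as "_"
    options: List[str]  # candidate words for the blank
    answer: int         # index of the correct option


def accuracy(questions: Sequence[ClozeQuestion],
             predict: Callable[[ClozeQuestion], int]) -> float:
    """Return the percentage of questions for which predict() picks the answer."""
    if not questions:
        raise ValueError("no questions to score")
    correct = sum(1 for q in questions if predict(q) == q.answer)
    return 100.0 * correct / len(questions)


def first_option_baseline(q: ClozeQuestion) -> int:
    """Trivial baseline that always picks the first option."""
    return 0


# Toy questions made up for this sketch; they are not taken from CLOTH.
toy_questions = [
    ClozeQuestion("She _ to school every day.", ["goes", "go", "going", "gone"], 0),
    ClozeQuestion("They have _ finished.", ["yet", "already", "still", "ever"], 1),
]

print(accuracy(toy_questions, first_option_baseline))  # 50.0
```

Under the same assumption, a real model would replace `first_option_baseline` with a function that scores each option in context and returns the index of the highest-scoring one.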

Why is CLOTH more challenging and interesting?

On a personal note: please think about the college and high school entrance tests that you or your children have taken, e.g., the SAT or the Chinese college entrance exam (高考), which are purposely designed to differentiate smart and hard-working students from others. The questions in CLOTH were created to prepare Chinese students for college and high school entrance tests :)

Useful resources

Baseline code
Dataset paper