A Large-scale Chinese News Summarization Dataset with Human-annotated Adequacy and Deducibility Level

ByteDance AI Lab, †UC Santa Barbara
NLPCC 2021


Automatic text summarization aims to produce a brief summary that captures the crucial content of the input documents. Both extractive and abstractive methods have achieved great success on English datasets in recent years. However, there has been minimal exploration of text summarization in other languages, limited by the lack of large-scale datasets. In this paper, we present a large-scale Chinese news summarization dataset, CNewSum, which consists of 304,307 documents and human-written summaries for the news feed. It has long documents with highly abstractive summaries, which encourages document-level understanding and generation in current summarization models. An additional distinguishing feature of CNewSum is that its test set includes adequacy and deducibility annotations for the summaries. The adequacy level measures how much of the summary's information is covered by the document, and the deducibility level indicates the reasoning ability the model needs to generate the summary. These annotations help researchers identify the performance bottlenecks of their models. We examine recent methods on CNewSum and will release our dataset after the anonymous period to provide a solid testbed for automatic Chinese summarization research.


CNewSum is a large-scale Chinese news summarization dataset, which consists of 304,307 documents and human-written summaries from Toutiao. It is an extended version of TTNews for NLPCC2017 and NLPCC2018; it is much larger and has several distinguishing features:

  • The news articles are collected from hundreds of thousands of news publishers. A team of expert editors is hired to provide human-written summaries for the daily news feed.
  • Human-annotated labels are provided for each example in the test set to determine how much knowledge the model needs to generate a human-like summary.
    • Adequacy Level: Has the necessary information of the summary been included in the document?
    • Deducibility Level: Can the information of the summary be easily inferred from the document?
Dataset Information

We list the statistics of common English and Chinese summarization datasets. The 'Article' and 'Summary' columns give the average lengths of articles and summaries in each dataset, measured in words for English and in characters for Chinese.

Dataset    Train    Dev      Test     Total    Article  Summary  Source
NYT        589.2k   32.7k    32.7k    654.8k   552.1    42.8     New York Times
CNNDM      287.2k   13.4k    11.4k    312.1k   791.7    55.2     CNN & Daily Mail
Newsroom   995.0k   108.8k   108.8k   1.2m     765.6    30.2     38 publishers
LCSTS      2.4m     8.7k     0.7k     2.4m     103.7    17.9     Weibo
RASG       863.8k   -        -        863.8k   67.1     16.6     Weibo
TTNews     50.0k    -        4.0k     54.0k    747.2    36.9     Toutiao
CLTS       148.3k   20.3k    16.7k    185.3k   1363.7   58.1     ThePaper
CNewSum    275.6k   14.4k    14.4k    304.3k   730.4    35.1     Toutiao

We characterize our summarization dataset with Coverage, Density and Compression, the measures introduced by Grusky et al. We also compare n-gram novelty computed over Chinese characters.
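As a rough illustration of these measures, the sketch below computes Coverage, Density, Compression and n-gram novelty at the character level. It follows the greedy extractive-fragment procedure described by Grusky et al.; the function names and the toy strings are assumptions made for illustration and are not taken from the CNewSum release. The summary is assumed to be non-empty.

```python
def extractive_fragments(article, summary):
    """Greedily collect the longest token runs shared by summary and article."""
    fragments = []
    i = 0
    while i < len(summary):
        best = 0  # longest article match that starts at summary[i]
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                k = 0
                while (i + k < len(summary) and j + k < len(article)
                       and summary[i + k] == article[j + k]):
                    k += 1
                best = max(best, k)
                j += k
            else:
                j += 1
        if best > 0:
            fragments.append(summary[i:i + best])
            i += best
        else:
            i += 1
    return fragments


def fragment_stats(article, summary):
    """Return (coverage, density, compression) for one article/summary pair."""
    frags = extractive_fragments(article, summary)
    n = len(summary)
    coverage = sum(len(f) for f in frags) / n       # share of copied tokens
    density = sum(len(f) ** 2 for f in frags) / n   # rewards long copied runs
    compression = len(article) / n                  # article/summary length ratio
    return coverage, density, compression


def ngram_novelty(article, summary, n):
    """Fraction of distinct summary n-grams that never occur in the article."""
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        return 0.0
    return len(summary_ngrams - ngrams(article)) / len(summary_ngrams)


# Toy character sequences (hypothetical, not from the dataset).
article = list("abcdefgh")
summary = list("abcxgh")
coverage, density, compression = fragment_stats(article, summary)
novelty_2 = ngram_novelty(article, summary, 2)
```

Low Coverage and Density together with high n-gram novelty indicate a more abstractive summary, which is the property the comparison above is meant to expose.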

We show one example for each Chinese summarization dataset. The overlaps between the article and the summary are underlined.

Human Label
Model Performance

We provide some abstractive and extractive baselines for CNewSum.
Since the original ROUGE metric is designed only for English, we follow the method of Hu et al. and map the Chinese words to numbers.
Specifically, Chinese text is split into characters, while English words and numbers are split on spaces. For example, “Surface Phone将装载Windows 10” will be transformed to “Surface/phone/将/装/载/windows/10” and then mapped to numeric IDs.
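The preprocessing described above can be sketched as follows. This is a minimal illustration, not the evaluation script used for the reported scores: the function names `tokenize_mixed` and `to_ids` are hypothetical, and lower-casing every English token is an assumption.

```python
import re

# Runs of ASCII letters/digits stay together; any other non-space
# character (for example a Chinese character) becomes its own token.
_TOKEN = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def tokenize_mixed(text):
    """Split Chinese by character and English words/numbers by whitespace."""
    # Lower-casing is an assumption of this sketch.
    return [tok.lower() for tok in _TOKEN.findall(text)]


def to_ids(tokens, vocab):
    """Map tokens to numeric IDs (as strings), extending vocab when needed."""
    return [str(vocab.setdefault(tok, len(vocab))) for tok in tokens]


vocab = {}
tokens = tokenize_mixed("Surface Phone将装载Windows 10")
ids = to_ids(tokens, vocab)
id_line = " ".join(ids)  # space-separated IDs that an English ROUGE tool can read
```

The same `vocab` dictionary has to be shared between the reference and the generated summary so that identical tokens receive identical IDs.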

Model              ROUGE-1  ROUGE-2  ROUGE-L
Lead               30.43    17.26    25.33
Oracle             46.84    30.54    40.08
TextRank           24.04    13.07    20.08
NeuSum             30.61    17.36    25.66
Transformer-ext    32.87    18.85    27.59
BERT-ext           34.78    20.33    29.34
Pointer Generator  25.70    11.05    19.62
Transformer-abs    37.36    18.62    30.62
BERT-abs           44.18    27.37    38.32
Adequacy & Deducibility

Analyzing our dataset, we find that the expert editors often perform some reasoning or add external knowledge to make the summary more reader-friendly.

Thus, we define two metrics to evaluate how much knowledge the model needs to generate a human-like summary.


Adequacy Level: Has the necessary information of the summary been included in the document?
For example, if all words in the summary can be found directly in the document, or have synonyms or detailed descriptions in the original text, the summary is labeled as 1. Otherwise, the summary is labeled as 0.


Deducibility Level: Can the information of the summary be easily inferred from the document?
Unit conversion, number calculation, exchange rates, and name abbreviations that can be inferred are labeled as 1. In contrast, complex conclusions with no direct mentions in the original document are labeled as 0. Here are some cases:
  • Unit conversion: 4 kg -> 4000 g
  • Number Calculation: 300+1500 -> 1800
  • Exchange Rates: 300 dollars -> 1938 RMB
  • Name Abbreviation: 武汉大学 -> 武大

We carefully rechecked our dataset before the release, producing CNewSum version 2 with updated annotations.

About 91.08% of the examples are adequate and deducible (A = 1 and D = 1), while the rest lack essential information. For the 4.11% of examples with A = 0 and D = 1, the missing information can be inferred from the document.
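Percentages like these can be recomputed from the annotated test set with a few lines of code. The sketch below assumes each test example is a dictionary with integer fields named `adequacy` and `deducibility`; these field names and the toy records are hypothetical and may not match the released file format.

```python
from collections import Counter


def label_distribution(examples):
    """Return the percentage of examples for each (A, D) label pair."""
    counts = Counter((ex["adequacy"], ex["deducibility"]) for ex in examples)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}


# Toy records (hypothetical, not real annotations).
toy_examples = (
    [{"adequacy": 1, "deducibility": 1}] * 8
    + [{"adequacy": 0, "deducibility": 1}]
    + [{"adequacy": 0, "deducibility": 0}]
)
distribution = label_distribution(toy_examples)
```

Grouping model outputs by the same (A, D) key is one way to obtain per-category scores of the kind reported for the baselines.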

Model            Category  ROUGE-1  ROUGE-2  ROUGE-L
Transformer-ext  A=1&D=1   33.16    19.19    27.88
Transformer-ext  A=0&D=1   30.89    15.60    25.38
Transformer-ext  A=0&D=0   28.92    14.88    23.74
Transformer-abs  A=1&D=1   37.54    18.85    30.83
Transformer-abs  A=0&D=1   36.36    16.70    29.63
Transformer-abs  A=0&D=0   34.73    15.95    27.52
BERT-ext         A=1&D=1   35.05    20.67    29.62
BERT-ext         A=0&D=1   32.81    16.90    27.05
BERT-ext         A=0&D=0   31.07    16.57    25.72
BERT-abs         A=1&D=1   44.51    27.76    38.70
BERT-abs         A=0&D=1   41.75    23.64    35.34
BERT-abs         A=0&D=0   40.18    23.34    33.60


      author="Wang, Danqing and Chen, Jiaze and Wu, Xianze and Zhou, Hao and Li, Lei",
      editor="Wang, Lu and Feng, Yansong and Hong, Yu and He, Ruifang",
      title="CNewSum: A Large-Scale Summarization Dataset with Human-Annotated Adequacy and Deducibility Level",
      booktitle="Natural Language Processing and Chinese Computing",
      publisher="Springer International Publishing",