A collection of papers and resources across three principal components of long document summarization: Datasets, Models and Metrics.
Updates
Large language models (LLMs) can seemingly perform a limitless range of tasks, including long document summarization. Nonetheless, we believe significant performance gaps remain when dealing with long texts.
We suggest the following two papers for potential insights:
- Lost in the Middle: How Language Models Use Long Contexts [Paper]. Despite buzzy headlines boasting 100K or even 1B-token input limits, one must question the models' actual performance. A model that accepts unlimited input (e.g., an RNN) but cannot effectively reason over it is of limited utility. The paper provides in-depth experiments that yield numerous insights into the actual capabilities of these models.
- Unifying Large Language Models and Knowledge Graphs: A Roadmap [Paper]. Although it does not specifically discuss long document summarization, we believe it is highly relevant for addressing the limitations of LLMs across a broad range of tasks.
We maintain that the work conducted on long document summarization thus far remains highly relevant. Practitioners can draw lessons from this research to overcome the challenges arising from long texts.
Going forward, we intend to periodically update this repository for its ongoing relevance in the current era of LLMs. Stay tuned!
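One practical lesson from the "lost in the middle" findings is to avoid asking a model to reason over more text than it can reliably use: split the document, summarize each piece, then summarize the summaries (the Summarize-then-Summarize pattern catalogued in the model tables below). Here is a minimal sketch of that idea, assuming a generic `summarize` callable (hypothetical; wrap any LLM API or fine-tuned summarizer behind it):

```python
# A minimal map-reduce summarization sketch. `summarize` is a stand-in for
# any model call (an LLM API or a fine-tuned summarizer); it is an
# assumption of this sketch, not a real library function.
from typing import Callable, List

def chunk_by_words(text: str, max_words: int = 800, overlap: int = 50) -> List[str]:
    """Split a long document into overlapping word-level chunks."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        start += max_words - overlap
    return chunks

def map_reduce_summarize(text: str, summarize: Callable[[str], str],
                         max_words: int = 800) -> str:
    """Summarize each chunk (map), then summarize the concatenation of the
    chunk summaries (reduce), recursing until the input fits in one call."""
    chunks = chunk_by_words(text, max_words=max_words)
    partial = [summarize(c) for c in chunks]      # map step
    combined = " ".join(partial)
    if len(combined.split()) > max_words:         # still too long, recurse
        return map_reduce_summarize(combined, summarize, max_words)
    return summarize(combined)                    # reduce step
```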
- An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics. Huan Yee Koh, Jiaxin Ju, Ming Liu, Shirui Pan. 2022. ACM Comput. Surv. [Paper]
- Automatic summarization of scientific articles: A survey. Nouf Ibrahim Altmami, Mohamed El Bachir Menai. 2020. Journal of King Saud University - Computer and Information Sciences. [Paper]
- From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information. Shen Gao, Xiuying Chen, Zhaochun Ren, Dongyan Zhao, Rui Yan. 2020. IJCAI. [Paper]
- Neural Abstractive Text Summarization with Sequence-to-Sequence Models. Tian Shi, Yaser Keneshloo, Naren Ramakrishnan, Chandan K. Reddy. 2019. ACM/IMS Transactions on Data Science. [Paper]
- Neural Text Summarization: A Critical Evaluation. Wojciech Kryściński, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher. 2019. EMNLP. [Paper]
- Text Summarization Techniques: A Brief Survey. Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut. 2017. IJACSA. [Paper]
- Recent automatic text summarization techniques: a survey. Mahak Gambhir, Vishal Gupta. 2017. Artif. Intell. Rev. [Paper]
Dataset | Year | Title | tl;dr
---|---|---|---
arXiv | 2018 | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents NAACL [Paper] | Scientific
PubMed | 2018 | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents NAACL [Paper] | Scientific
BigPatent | 2019 | BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization ACL [Paper] | Business/Legal
BillSum | 2019 | BillSum: A Corpus for Automatic Summarization of US Legislation [Paper] | Legislative
TLDR | 2020 | TLDR: Extreme Summarization of Scientific Documents ACL Findings [Paper] | Scientific
CORD-19 | 2020 | CORD-19: The Covid-19 Open Research Dataset ACL NLP-COVID Workshop [Paper] | Scientific
FacetSum | 2021 | Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents [Paper] | Scientific
GovReport | 2021 | Efficient Attentions for Long Document Summarization NAACL [Paper] | Legislative
BookSum | 2021 | BookSum: A Collection of Datasets for Long-form Narrative Summarization [Paper] | General Literature
SCROLLS | 2022 | SCROLLS: Standardized CompaRison Over Long Language Sequences [Paper] | Leaderboard
SQuALITY | 2022 | SQuALITY: Building a Long-Document Summarization Dataset the Hard Way EMNLP [Paper] | Question-focused Summarization
Model | Year | Title | tl;dr
---|---|---|---
Discourse-RNN | 2018 | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents NAACL [Paper] | Hierarchical RNN + Sectional Bias
Longformer | 2020 | Longformer: The Long-Document Transformer [Paper] | Transformer + Efficient Attention
BigBird | 2020 | Big Bird: Transformers for Longer Sequences NeurIPS [Paper] | Transformer + Efficient Attention
FacetSum | 2021 | Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents [Paper] | Transformer + Prompt Engineering
GSum | 2021 | GSum: A General Framework for Guided Neural Abstractive Summarization [Paper] | Transformer + Signal Guidance
CTRLsum | 2021 | CTRLsum: Towards Generic Controllable Text Summarization [Paper] | Transformer + Prompt Engineering
HAT-BART | 2021 | Hierarchical Learning for Generation with Long Source Sequences [Paper] | Transformer + Hierarchical Attention
HEPOS | 2021 | Efficient Attentions for Long Document Summarization NAACL [Paper] | Transformer + Efficient Attention
DeepPyramidion | 2022 | Sparsifying Transformer Models with Trainable Representation Pooling ACL [Paper] | Transformer + Efficient Attention
PRIMERA | 2022 | PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization ACL [Paper] | Transformer + Multi-document Pre-training + Efficient Attention
HIBRIDS | 2022 | HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization ACL [Paper] | Transformer + Discourse Bias Attention
LongT5 | 2022 | LongT5: Efficient Text-To-Text Transformer for Long Sequences NAACL [Paper] | Transformer + Long Document Pre-training + Efficient Attention
ECC | 2022 | Improving the Faithfulness of Abstractive Summarization via Entity Coverage Control NAACL Findings [Paper] | Transformer + Factuality-Aware Fine-tuning
PEGASUS-X | 2022 | Investigating Efficiently Extending Transformers for Long Input Summarization [Paper] | Transformer + Efficient Attention
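Several of the models above (Longformer, BigBird, LongT5, PEGASUS-X) rely on sparse/efficient attention to fit long inputs. As a usage illustration only, here is a minimal sketch of running an LED (Longformer Encoder-Decoder) checkpoint via Hugging Face `transformers`; the checkpoint name and generation settings are illustrative defaults, not values recommended by the papers:

```python
# Sketch: summarizing a long document with LED (Longformer Encoder-Decoder)
# via Hugging Face transformers. Checkpoint and generation settings are
# illustrative defaults, not tuned values from the papers above.
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

document = "..."  # put a long input document here

inputs = tokenizer(document, max_length=16384, truncation=True,
                   return_tensors="pt")
# LED uses sparse local attention; global attention is typically enabled
# on at least the first token so it can attend to the entire sequence.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(inputs["input_ids"],
                             attention_mask=inputs["attention_mask"],
                             global_attention_mask=global_attention_mask,
                             max_length=256, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```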
Model | Year | Title | tl;dr
---|---|---|---
GL-LSTM | 2019 | Extractive Summarization of Long Documents by Combining Global and Local Context EMNLP-IJCNLP [Paper] | Hierarchical RNN + Sectional Bias
Sent-CLF/PTR | 2019 | On Extractive and Abstractive Neural Document Summarization with Transformer Language Models EMNLP [Paper] | Hierarchical RNN
Topic-GraphSum | 2020 | Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks COLING [Paper] | Graph Attention Network + Topic Modelling
SSN-DM | 2021 | Sliding Selector Network with Dynamic Memory for Extractive Summarization of Long Documents NAACL [Paper] | Memory Network
MemSum | 2022 | MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes ACL [Paper] | RL-based Extractor via Multi-step Episodic MDP
HiStruct+ | 2022 | HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information ACL Findings [Paper] | Transformer + Discourse Bias Embeddings
TSTR | 2022 | TSTR: Too Short to Represent, Summarize with Details! Intro-Guided Extended Summary Generation NAACL [Paper] | Transformer + Signal Guidance
Model | Year | Title | tl;dr
---|---|---|---
TLM+Ext | 2019 | On Extractive and Abstractive Neural Document Summarization with Transformer Language Models EMNLP [Paper] | Extract-then-Summarize
DANCER | 2019 | A Divide-and-Conquer Approach to the Summarization of Long Documents IEEE/ACM Transactions on Audio, Speech, and Language Processing [Paper] | Summarize-then-Combine
SEAL | 2020 | SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization [Paper] | Extract-then-Summarize
LoBART | 2021 | Long-Span Summarization via Local Attention and Content Selection ACL [Paper] | Extract-then-Summarize
DYLE | 2022 | DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization ACL [Paper] | Extract-then-Summarize
Summ^N | 2022 | Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents ACL [Paper] | Summarize-then-Summarize
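The Extract-then-Summarize rows above share one pattern: a content selector shrinks the document, and an abstractive model summarizes the selection. Below is a deliberately simplified sketch; the published systems (e.g., LoBART, DYLE) learn the selector, whereas here we substitute a TF-IDF heuristic of our own, so treat it as a toy baseline:

```python
# Toy extract-then-summarize pipeline. The selector is a crude TF-IDF
# heuristic (our assumption for illustration only). Requires scikit-learn,
# nltk (with the 'punkt' tokenizer data), and transformers.
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

# Abstractive back-end; the checkpoint is illustrative.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def extract_then_summarize(document: str, top_k: int = 20) -> str:
    sentences = sent_tokenize(document)
    # Score each sentence by TF-IDF cosine similarity to the full document.
    tfidf = TfidfVectorizer().fit(sentences + [document])
    scores = cosine_similarity(tfidf.transform(sentences),
                               tfidf.transform([document])).ravel()
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])[:top_k]
    condensed = " ".join(sentences[i] for i in sorted(ranked))  # keep order
    # Abstractive step on the shortened input.
    return summarizer(condensed, max_length=200,
                      truncation=True)[0]["summary_text"]
```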
Model | Year | Title | tl;dr
---|---|---|---
SciSummPip | 2020 | Monash-Summ@LongSumm 20 SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline Proceedings of the First Workshop on Scholarly Document Processing [Paper] | Multi-sentence Compression
Model | Year | Title | tl;dr
---|---|---|---
PacSum | 2019 | Sentence Centrality Revisited for Unsupervised Summarization ACL [Paper] | Graph Centrality Scoring
HipoRank | 2020 | Discourse-Aware Unsupervised Summarization of Long Scientific Documents EACL [Paper] | Graph Centrality Scoring
FAR | 2021 | Facet-Aware Evaluation for Extractive Summarization ACL-IJCNLP Findings [Paper] | Graph Centrality Scoring
IBsumm | 2021 | Leveraging Information Bottleneck for Scientific Document Summarization EMNLP Findings [Paper] | Pipeline Approach
OTExtSum | 2022 | OTExtSum: Extractive Text Summarisation with Optimal Transport NAACL Findings [Paper] | Optimal Transport Extraction
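Graph-centrality scoring, as used by PacSum and HipoRank, is simple enough to sketch end-to-end. The following is a bare degree-centrality baseline under our own simplifying assumptions; the actual papers go further (PacSum weights forward and backward edges differently, and HipoRank injects section-level discourse bias):

```python
# Bare degree-centrality extractive baseline in the spirit of PacSum and
# HipoRank, heavily simplified for illustration. Requires scikit-learn and
# nltk (with the 'punkt' tokenizer data).
import numpy as np
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def centrality_extract(document: str, budget: int = 5) -> str:
    sentences = sent_tokenize(document)
    # Pairwise sentence similarity graph over TF-IDF vectors.
    sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    np.fill_diagonal(sim, 0.0)                 # ignore self-similarity
    centrality = sim.sum(axis=1)               # degree centrality per sentence
    keep = sorted(np.argsort(-centrality)[:budget])
    return " ".join(sentences[i] for i in keep)  # preserve document order
```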
The metrics listed below are not specific to long document summarization (most summarization metrics are studied in a short/normal-length setting). Our EMNLP 2022 work investigated these metrics under the long document setting: How Far are We from Robust Long Abstractive Summarization?
Metric | Year | Title | tl;dr
---|---|---|---
ROUGE | 2004 | ROUGE: A Package for Automatic Evaluation of Summaries ACL [Paper] | Hard Lexical Matching
BERTScore | 2019 | BERTScore: Evaluating Text Generation with BERT ICLR [Paper] | Soft Lexical Matching
BARTScore | 2021 | BARTScore: Evaluating Generated Text as Text Generation NeurIPS [Paper] | Conditional Text Generation
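For reference, the first two metrics above are straightforward to compute with widely used packages. A minimal sketch, assuming the `rouge-score` and `bert-score` packages are installed (the example texts are made up):

```python
# Computing relevance metrics with commonly used packages:
#   pip install rouge-score bert-score
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "The bill expands federal support for rural broadband access."
candidate = "The legislation broadens federal backing of broadband in rural areas."

# ROUGE: hard n-gram overlap between candidate and reference.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
print(scorer.score(reference, candidate))

# BERTScore: soft token matching via contextual embeddings.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(f"BERTScore F1: {F1.item():.3f}")
```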
Metric | Year | Title | tl;dr
---|---|---|---
OpenIE | 2019 | Assessing The Factual Accuracy of Generated Text KDD [Paper] | Semantic Matching
FactCC | 2019 | Evaluating the Factual Consistency of Abstractive Text Summarization EMNLP [Paper] | Data Augmentation + Textual Entailment
DAE | 2020 | Evaluating Factuality in Generation with Dependency-level Entailment EMNLP Findings [Paper] | Textual Entailment
FEQA | 2020 | FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization ACL [Paper] | Question-Answering
QAGS | 2020 | Asking and Answering Questions to Evaluate the Factual Consistency of Summaries ACL [Paper] | Question-Answering
TE-MNLI | 2020 | On Faithfulness and Factuality in Abstractive Summarization ACL [Paper] | Textual Entailment
BARTScore | 2021 | BARTScore: Evaluating Generated Text as Text Generation NeurIPS [Paper] | Conditional Text Generation
CoCo | 2021 | Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation EMNLP Findings [Paper] | Causal Inference
Q^2 | 2021 | Q2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering EMNLP [Paper] | Question-Answering
QUALS | 2021 | Improving Factual Consistency of Abstractive Summarization via Question Answering ACL-IJCNLP [Paper] | Question-Answering
SummaC | 2021 | SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization TACL [Paper] | Textual Entailment
FactGraph | 2022 | FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations NAACL [Paper] | Knowledge Graph
Falsesum | 2022 | Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization NAACL [Paper] | Data Augmentation + Textual Entailment
QAFactEval | 2022 | QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization NAACL [Paper] | Question-Answering
MFMA | 2022 | Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking NAACL Findings [Paper] | Data Augmentation + Textual Entailment
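Many of the entailment-based metrics above reduce to the same core operation: score each summary sentence by how well the source entails it. Below is a zero-shot, heavily simplified sketch in the spirit of SummaC (our simplification, not the paper's implementation), assuming `transformers`, `torch`, and `nltk`:

```python
# Zero-shot NLI-based consistency sketch: score each summary sentence by its
# maximum entailment probability over source sentences, then average.
# Requires transformers, torch, and nltk (with the 'punkt' tokenizer data).
import torch
from nltk.tokenize import sent_tokenize
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # labels: CONTRADICTION / NEUTRAL / ENTAILMENT
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
ENTAILMENT = model.config.label2id["ENTAILMENT"]

@torch.no_grad()
def entailment_prob(premise: str, hypothesis: str) -> float:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt",
                       truncation=True)
    return model(**inputs).logits.softmax(dim=-1)[0, ENTAILMENT].item()

def consistency_score(source: str, summary: str) -> float:
    """Mean over summary sentences of the best entailment probability
    against any single source sentence."""
    src = sent_tokenize(source)
    per_sentence = [max(entailment_prob(p, h) for p in src)
                    for h in sent_tokenize(summary)]
    return sum(per_sentence) / len(per_sentence)
```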
Papers that provide insightful discussions related to long document summarization.
Topic | Year | Title | tl;dr
---|---|---|---
Metrics | 2020 | Re-evaluating Evaluation in Text Summarization EMNLP [Paper] | On Effectiveness of Summarization Models and Metrics
Models | 2022 | DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization ACL [Paper] | Insightful Approach + Discussion of Extract-then-Summarize Models
Models | 2022 | Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization ACL [Paper] | Findings on Abstractiveness vs. Factuality of Abstractive Models
Models | 2022 | Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization ACL [Paper] | Hallucinated Text (i.e., Facts Not in the Source) Is Often Factual
Models | 2022 | Training Dynamics for Text Summarization Models ACL [Paper] | Fine-tuning Affects Generation Strategies
Models | 2022 | Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data ACL [Paper] | REtrieving from the traINing datA (REINA) Leads to Performance Gains
Models | 2022 | Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models ACL [Paper] | Model Efficiency (Training Cost; Train/Inference Speed) vs. Performance
Models | 2022 | FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization NAACL [Paper] | Factuality-Aware PEGASUS Pre-training
Models | 2022 | Are Abstractive Summarization Models truly ‘Abstractive’? [Paper] | On the Abstractiveness of Abstractive Models
Metrics | 2022 | How Far are We from Robust Long Abstractive Summarization? [Paper] | On Metrics under the Long Document Summarization Setting
Topic | Year | Title | tl;dr
---|---|---|---
Efficient Attention | 2020 | Efficient Transformers: A Survey ACM Comput. Surv. [Paper] | Practicality of Efficient Attention (Section 4.4)
Fine-tuning | 2022 | A Closer Look at How Fine-tuning Changes BERT ACL [Paper] | How Representations Change after Fine-tuning
Text Generation | 2022 | Language modeling via stochastic processes ICLR [Paper] | Text Generation via a Latent Stochastic Process
Efficient Attention | 2022 | Simple Local Attentions Remain Competitive for Long-Context Tasks NAACL [Paper] | Local Window Attention Remains Competitive
Scaling Language Models | 2022 | Improving language models by retrieving from trillions of tokens ICML [Paper] | Retrieve from a Two-Trillion-Token Database, then Generate (See REINA above)
Our survey work has been accepted by ACM Computing Surveys. For more information, please visit An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics.
Folders containing the metrics used to analyze the intrinsic characteristics of the datasets, the metrics used to analyze model outputs, and the human-annotated arXiv data will be uploaded soon. In the meantime, please do not hesitate to contact us.
```bibtex
@article{10.1145/3545176,
  author = {Koh, Huan Yee and Ju, Jiaxin and Liu, Ming and Pan, Shirui},
  title = {An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics},
  year = {2022},
  month = {jun},
  publisher = {Association for Computing Machinery},
  journal = {ACM Comput. Surv.},
  address = {New York, NY, USA},
  issn = {0360-0300},
  url = {https://doi.org/10.1145/3545176},
  doi = {10.1145/3545176},
  keywords = {datasets, neural networks, document summarization, language models, Transformer}
}
```