LexiSem: A re-ranker balancing lexical and semantic quality for enhanced abstractive summarization
Sequence-to-sequence neural networks have recently achieved significant success in abstractive summarization, especially through fine-tuning large pre-trained language models on downstream datasets. However, these models frequently suffer from exposure bias, which can impair their performan...
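| Main Author: | Eman Aloraini |
|---|---|
| Other Authors: | Hozaifa Kassab, Ali Hamdi, Khaled Shaban |
| Published: | 2025 |
| Subjects: | Information and computing sciences; Artificial intelligence; Distributed computing and systems software; Machine learning; Abstractive summarization; Re-ranking; Lexical quality; Semantic quality; Deep learning |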
| _version_ | 1864513543151288320 |
|---|---|
| author | Eman Aloraini (21797867) |
| author2 | Hozaifa Kassab (21797870) Ali Hamdi (13432680) Khaled Shaban (20074425) |
| author2_role | author author author |
| author_facet | Eman Aloraini (21797867) Hozaifa Kassab (21797870) Ali Hamdi (13432680) Khaled Shaban (20074425) |
| author_role | author |
| dc.creator.none.fl_str_mv | Eman Aloraini (21797867) Hozaifa Kassab (21797870) Ali Hamdi (13432680) Khaled Shaban (20074425) |
| dc.date.none.fl_str_mv | 2025-07-05T09:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1016/j.neucom.2025.130816 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/LexiSem_A_re-ranker_balancing_lexical_and_semantic_quality_for_enhanced_abstractive_summarization/29655764 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Information and computing sciences Artificial intelligence Distributed computing and systems software Machine learning Abstractive summarization Re-ranking Lexical quality Semantic quality Deep learning |
| dc.title.none.fl_str_mv | LexiSem: A re-ranker balancing lexical and semantic quality for enhanced abstractive summarization |
| dc.type.none.fl_str_mv | Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal |
| description | <p>Sequence-to-sequence neural networks have recently achieved significant success in abstractive summarization, especially through fine-tuning large pre-trained language models on downstream datasets. However, these models frequently suffer from exposure bias, which can impair their performance. To address this, re-ranking systems have been introduced, but their potential remains underexplored despite some demonstrated performance gains. Most prior work relies on ROUGE scores and aligned candidate summaries for ranking, exposing a substantial gap between semantic similarity and lexical overlap metrics. In this study, we demonstrate that a second-stage model can be trained to re-rank a set of summary candidates, significantly enhancing performance. Our novel approach leverages a re-ranker that balances lexical and semantic quality. Additionally, we introduce a new strategy for defining negative samples in ranking models. Through experiments on the CNN/DailyMail, XSum, and Reddit TIFU datasets, we show that our method effectively estimates the semantic content of summaries without compromising lexical quality. In particular, our method sets a new performance benchmark on the CNN/DailyMail dataset (48.18 R1, 24.46 R2, 45.05 RL) and on Reddit TIFU (30.37 R1, 23.87 RL).</p><h2>Other Information</h2> <p> Published in: Neurocomputing<br> License: <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">http://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1016/j.neucom.2025.130816" target="_blank">https://dx.doi.org/10.1016/j.neucom.2025.130816</a></p> |
| eu_rights_str_mv | openAccess |
| id | Manara2_0333aa3c465b3c9dad28b58b9cd0b59e |
| identifier_str_mv | 10.1016/j.neucom.2025.130816 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/29655764 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| spellingShingle | LexiSem: A re-ranker balancing lexical and semantic quality for enhanced abstractive summarization Eman Aloraini (21797867) Information and computing sciences Artificial intelligence Distributed computing and systems software Machine learning Abstractive summarization Re-ranking Lexical quality Semantic quality Deep learning |
| status_str | publishedVersion |
| title | LexiSem: A re-ranker balancing lexical and semantic quality for enhanced abstractive summarization |
| title_full | LexiSem: A re-ranker balancing lexical and semantic quality for enhanced abstractive summarization |
| title_fullStr | LexiSem: A re-ranker balancing lexical and semantic quality for enhanced abstractive summarization |
| title_full_unstemmed | LexiSem: A re-ranker balancing lexical and semantic quality for enhanced abstractive summarization |
| title_short | LexiSem: A re-ranker balancing lexical and semantic quality for enhanced abstractive summarization |
| title_sort | LexiSem: A re-ranker balancing lexical and semantic quality for enhanced abstractive summarization |
| topic | Information and computing sciences Artificial intelligence Distributed computing and systems software Machine learning Abstractive summarization Re-ranking Lexical quality Semantic quality Deep learning |