Distilling Wisdom: A Review on Optimizing Learning From Massive Language Models

<p dir="ltr">In the era of Large Language Models (LLMs), Knowledge Distillation (KD) enables the transfer of capabilities from proprietary LLMs to open-source models. This survey provides a detailed discussion of the basic principles, algorithms, and implementation methods of knowled...

Full description

Saved in:
Bibliographic Details
Main Author: Dingzong Zhang (23275066) (author)
Other Authors: Devi Listiyani (23275069) (author), Priyanka Singh (256412) (author), Manoranjan Mohanty (23275072) (author)
Published: 2025
Subjects: Information and computing sciences; Artificial intelligence; Machine learning; Artificial intelligence (AI); large language model (LLM); knowledge distillation (KD); optimization; Transformers; Computational modeling; Surveys; Natural language processing; Predictive models; Technological innovation; Encoding; Context modeling
dc.creator.none.fl_str_mv Dingzong Zhang (23275066)
Devi Listiyani (23275069)
Priyanka Singh (256412)
Manoranjan Mohanty (23275072)
dc.date.none.fl_str_mv 2025-04-04T06:00:00Z
dc.identifier.none.fl_str_mv 10.1109/access.2025.3554586
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Distilling_Wisdom_A_Review_on_Optimizing_Learning_From_Massive_Language_Models/31443841
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Artificial intelligence
Machine learning
Artificial intelligence (AI)
large language model (LLM)
knowledge distillation (KD)
optimization
Transformers
Computational modeling
Surveys
Natural language processing
Predictive models
Technological innovation
Encoding
Context modeling
dc.title.none.fl_str_mv Distilling Wisdom: A Review on Optimizing Learning From Massive Language Models
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description In the era of Large Language Models (LLMs), Knowledge Distillation (KD) enables the transfer of capabilities from proprietary LLMs to open-source models. This survey provides a detailed discussion of the basic principles, algorithms, and implementation methods of knowledge distillation. It explores KD's impact on LLMs, emphasizing its utility in model compression, performance enhancement, and self-improvement. Through the analysis of practical examples such as DistilBERT, TinyBERT, and MobileBERT, the paper demonstrates how knowledge distillation can markedly enhance the efficiency and applicability of large language models in real-world scenarios. The discussion encompasses the varied applications of KD across multiple domains, including industrial systems, embedded systems, Natural Language Processing (NLP), multi-modal processing, and vertical domains such as medicine, law, science, finance, and materials science. This survey outlines current KD methodologies and future research directions, highlighting its role in advancing AI technologies and fostering innovation across different sectors.
Other Information
Published in: IEEE Access
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1109/access.2025.3554586
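As a rough illustration of the "basic principles" the abstract refers to (not taken from the article itself), the sketch below shows the classic soft-label distillation objective in the style of Hinton et al.: the student is trained against temperature-softened teacher logits in addition to the usual hard-label cross-entropy. The function name, temperature, and weighting factor are illustrative assumptions, and PyTorch is used only for concreteness.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-scaled
    # student and teacher distributions. The T**2 factor keeps gradient
    # magnitudes comparable to the hard-label term.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Hard-label term: ordinary cross-entropy against ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Toy usage with random logits for a 4-class problem.
student_logits = torch.randn(8, 4, requires_grad=True)
teacher_logits = torch.randn(8, 4)
labels = torch.randint(0, 4, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```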
id Manara2_decd8fbd7cb40f33be2a07afc7368fe1
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/31443841
publishDate 2025