Distilling Wisdom: A Review on Optimizing Learning From Massive Language Models
<p dir="ltr">In the era of Large Language Models (LLMs), Knowledge Distillation (KD) enables the transfer of capabilities from proprietary LLMs to open-source models. This survey provides a detailed discussion of the basic principles, algorithms, and implementation methods of knowled...
| Main Author: | Dingzong Zhang |
|---|---|
| Other Authors: | Devi Listiyani, Priyanka Singh, Manoranjan Mohanty |
| Published: | IEEE Access, 2025 |
| Subjects: | Information and computing sciences; Artificial intelligence; Machine learning; Artificial intelligence (AI); large language model (LLM); knowledge distillation (KD); optimization; Transformers; Computational modeling; Surveys; Natural language processing; Predictive models; Technological innovation; Encoding; Context modeling |
| Field | Value |
|---|---|
| _version_ | 1864513522576130048 |
| author | Dingzong Zhang (23275066) |
| author2 | Devi Listiyani (23275069); Priyanka Singh (256412); Manoranjan Mohanty (23275072) |
| author2_role | author; author; author |
| author_role | author |
| dc.date.none.fl_str_mv | 2025-04-04T06:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1109/access.2025.3554586 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/Distilling_Wisdom_A_Review_on_Optimizing_Learning_From_Massive_Language_Models/31443841 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Information and computing sciences; Artificial intelligence; Machine learning; Artificial intelligence (AI); large language model (LLM); knowledge distillation (KD); optimization; Transformers; Computational modeling; Surveys; Natural language processing; Predictive models; Technological innovation; Encoding; Context modeling |
| dc.title.none.fl_str_mv | Distilling Wisdom: A Review on Optimizing Learning From Massive Language Models |
| dc.type.none.fl_str_mv | Text; Journal contribution; info:eu-repo/semantics/publishedVersion; text; contribution to journal |
| description | <p dir="ltr">In the era of Large Language Models (LLMs), Knowledge Distillation (KD) enables the transfer of capabilities from proprietary LLMs to open-source models. This survey provides a detailed discussion of the basic principles, algorithms, and implementation methods of knowledge distillation. It explores KD’s impact on LLMs, emphasizing its utility in model compression, performance enhancement, and self-improvement. Through the analysis of practical examples such as DistilBERT, TinyBERT, and MobileBERT, the paper demonstrates how knowledge distillation can markedly enhance the efficiency and applicability of large language models in real-world scenarios. The discussion encompasses the varied applications of KD across multiple domains, including industrial systems, embedded systems, Natural Language Processing (NLP), multi-modal processing, and vertical domains, such as medicine, law, science, finance, and materials science. This survey outlines current KD methodologies and future research directions, highlighting its role in advancing AI technologies and fostering innovation across different sectors.</p><h2 dir="ltr">Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/deed.en" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2025.3554586" target="_blank">https://dx.doi.org/10.1109/access.2025.3554586</a></p> |
| eu_rights_str_mv | openAccess |
| id | Manara2_decd8fbd7cb40f33be2a07afc7368fe1 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/31443841 |
| publishDate | 2025 |
| rights_invalid_str_mv | CC BY 4.0 |
| spelling | (Search-index field; concatenates the title, authors, subjects, abstract, identifiers, and dates listed above.) |
| spellingShingle | (Search-index field; concatenates the title, first author, and subjects listed above.) |
| status_str | publishedVersion |
| title | Distilling Wisdom: A Review on Optimizing Learning From Massive Language Models |