DemoCraft: Using In-Context Learning to Improve Code Generation in LLMs

Bibliographic Details
Main Author: Nirmal Joshua Kapu
Other Authors: Mihit Sreejith
Published: 2024-10-27
Subjects: Artificial intelligence not elsewhere classified; Machine learning not elsewhere classified; large language models; in-context learning; machine learning; latent concept learning; nl2code; code generation
DOI: 10.6084/m9.figshare.27310776.v1
Online Access: https://figshare.com/articles/poster/DemoCraft_Using_In-Context_Learning_to_Improve_Code_Generation_in_LLMs/27310776
License: CC BY 4.0 (open access)
Format: Poster (image), published version
OAI Identifier: oai:figshare.com:article/27310776
Description:
Generating executable code from natural language instructions with Large Language Models (LLMs) poses challenges such as semantic ambiguity and understanding task-specific context. To address these issues, we propose DemoCraft, a system that enhances code generation by leveraging in-context learning and demonstration selection, combined with latent concept learning. Latent concept learning introduces additional concept tokens: trainable embeddings that capture task-specific knowledge. These tokens are integrated into the model's input space, enabling the model to identify and select optimal demonstrations for a given task. The approach is grounded in latent variable models, where task-specific latent parameters d encapsulate complex contextual information; the concept tokens refine the model's prediction process, ensuring that task-specific knowledge is applied during code generation. We evaluate the impact of these techniques on the SantaCoder model, tested on the MBPP and HumanEval datasets. Our methodology is structured into four phases: latent concept learning, demonstration selection, output formatting, and code evaluation. Demonstration selection, a critical step, improves the model's generalization by identifying the examples from which the task concept is best inferred; we investigate two methods, latent concept selection (demonstrations chosen using the learned embeddings) and random selection. Output-formatting procedures ensure that the model's outputs are syntactically and semantically well formed, and the generated code is rigorously evaluated with the correctness@k, similarity@k, and pass@k metrics. Our experiments demonstrate a nearly 2x improvement across these metrics, underscoring the role of latent concept learning and demonstration selection in improving the efficiency, accuracy, and adaptability of SantaCoder in real-world code generation tasks.
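
The concept tokens described above behave like a soft prompt: a small set of trainable embedding vectors prepended to the frozen model's input and optimized on task data. The poster does not publish code, so the following is only a minimal sketch of that mechanism under our own assumptions; the model handle, token count, and hyperparameters are illustrative, not DemoCraft's actual implementation.

```python
# Sketch of latent concept learning as soft-prompt tuning
# (our assumption of the mechanism; not DemoCraft's published code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "bigcode/santacoder"  # the model named in the abstract
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, trust_remote_code=True)
model.requires_grad_(False)   # freeze the base model; only concept tokens train

NUM_CONCEPT_TOKENS = 10       # illustrative hyperparameter
emb_dim = model.get_input_embeddings().embedding_dim
concept = torch.nn.Parameter(0.02 * torch.randn(NUM_CONCEPT_TOKENS, emb_dim))

def loss_with_concepts(input_ids: torch.Tensor, labels: torch.Tensor):
    """Prepend the trainable concept embeddings and return the LM loss."""
    tok_emb = model.get_input_embeddings()(input_ids)             # (B, T, D)
    prefix = concept.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prefix, tok_emb], dim=1)           # (B, K+T, D)
    # Mask the loss at the concept positions themselves.
    pad = labels.new_full((labels.size(0), NUM_CONCEPT_TOKENS), -100)
    return model(inputs_embeds=inputs_embeds,
                 labels=torch.cat([pad, labels], dim=1)).loss

# Only the concept embeddings receive gradient updates.
opt = torch.optim.AdamW([concept], lr=1e-3)
```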
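
Given trained concept tokens, one plausible way to realize the demonstration-selection phase is to score each candidate demonstration by how well it aligns with the learned concept embeddings and keep the top-k. The scoring rule below (best cosine similarity between demonstration token embeddings and concept embeddings, averaged over tokens) is our assumption for illustration only; it reuses `model` and `concept` from the sketch above.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def demo_score(demo_ids: torch.Tensor) -> float:
    """Score one tokenized demonstration: for each of its tokens, take the
    best cosine similarity to any concept embedding, then average."""
    demo_emb = model.get_input_embeddings()(demo_ids)             # (T, D)
    sims = F.cosine_similarity(demo_emb.unsqueeze(1),             # (T, 1, D)
                               concept.unsqueeze(0), dim=-1)      # -> (T, K)
    return sims.max(dim=1).values.mean().item()

def select_demonstrations(pool: list[torch.Tensor], k: int = 4):
    """Latent concept selection: keep the k highest-scoring candidates.
    The random-selection baseline from the abstract would instead be
    random.sample(pool, k)."""
    return sorted(pool, key=demo_score, reverse=True)[:k]
```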
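
Of the three reported metrics, pass@k is the standard one from Chen et al. (2021), while correctness@k and similarity@k we leave as defined by the authors. The usual unbiased pass@k estimator generates n samples per problem, counts the c that pass the unit tests, and computes 1 - C(n-c, k)/C(n, k); a numerically stable version:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability that
    at least one of k samples drawn from n passes, given c passing samples."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples per problem, 37 pass -> pass@1 = 37/200 = 0.185
print(pass_at_k(200, 37, 1))
```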