DemoCraft: Using In-Context Learning to Improve Code Generation in LLMs
Generating executable code from natural language instructions with Large Language Models (LLMs) poses challenges such as semantic ambiguity and understanding task-specific contexts. To address these issues, we propose **DemoCraft**, a system that enhances code generation by **leveraging in-context learning** and demonstration selection, combined with latent concept learning. Latent concept learning introduces additional concept tokens: trainable embeddings that capture task-specific knowledge. These tokens are integrated into the model's input space, enabling the model to identify and select the most informative demonstrations for a given task. The approach is grounded in the principles of latent variable models, where task-specific latent parameters *d* encapsulate complex contextual information; the concept tokens refine the model's prediction process so that task-specific knowledge is applied during code generation. This study evaluates the impact of these techniques on code generation with the SantaCoder model, tested on the MBPP and HumanEval datasets. Our methodology is structured into four phases: latent concept learning, demonstration selection, output formatting, and code evaluation. Demonstration selection, a critical step, improves the model's generalization by identifying the examples that best convey the task concepts; we investigate two methods, latent concept selection (demonstrations chosen based on the learned embeddings) and random selection. Output formatting ensures that the model's outputs are syntactically and semantically well-formed. The generated code is evaluated with metrics such as *correctness@k*, *similarity@k*, and *pass@k*. Our experiments demonstrate a nearly **2x improvement** across these metrics, underscoring the role of latent concept learning and demonstration selection in improving the efficiency, accuracy, and adaptability of SantaCoder in real-world code generation tasks.
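The latent-variable framing mentioned in the abstract can be written compactly. The formulation below is a standard one and an assumption on our part, not a quotation of the poster's own notation:

```latex
% Standard latent-variable decomposition (assumed formulation): code y for
% a prompt x is generated via a task-specific latent concept d, which the
% concept tokens approximate in the model's embedding space.
\[
  p(y \mid x) = \sum_{d} p(y \mid x, d)\, p(d \mid x)
\]
% Demonstrations e are then ranked by how strongly they support the
% target concept, e.g. by p(d \mid e, x).
```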
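The "concept tokens" described above are easiest to picture as a prompt-tuning-style prefix. The sketch below is a minimal illustration under that assumption (the class and parameter names are ours, not the poster's): trainable embeddings are prepended to the frozen token embeddings of the prompt, so only the concept vectors are learned per task.

```python
import torch
import torch.nn as nn

class ConceptTokenPrompt(nn.Module):
    """Hypothetical sketch: prepend trainable concept-token embeddings to a frozen prompt embedding."""

    def __init__(self, base_embedding: nn.Embedding, num_concept_tokens: int = 10):
        super().__init__()
        dim = base_embedding.embedding_dim
        self.base_embedding = base_embedding                # frozen token embedding table
        self.concept_tokens = nn.Parameter(                 # trainable task-specific embeddings
            torch.randn(num_concept_tokens, dim) * 0.02
        )

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        tok_emb = self.base_embedding(input_ids)            # (batch, seq, dim)
        batch = input_ids.size(0)
        # Broadcast the concept tokens across the batch and prepend them to the prompt
        concept = self.concept_tokens.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([concept, tok_emb], dim=1)


if __name__ == "__main__":
    vocab, dim = 1000, 64
    frozen = nn.Embedding(vocab, dim)
    frozen.weight.requires_grad_(False)                     # base model stays frozen
    prompt = ConceptTokenPrompt(frozen, num_concept_tokens=4)
    ids = torch.randint(0, vocab, (2, 16))
    print(prompt(ids).shape)                                # (2, 20, 64): 4 concept + 16 prompt tokens
```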
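For the latent concept selection step, one plausible reading (an assumption about the mechanism rather than the poster's stated scoring rule) is to rank candidate demonstrations by how likely the learned concept tokens are when conditioned on each demonstration, then keep the top-k. A sketch, assuming a HuggingFace-style causal LM whose forward call returns an object with a `.logits` tensor:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_demonstration(model, demo_ids: torch.LongTensor,
                        concept_token_ids: torch.LongTensor) -> float:
    """Average log p(concept tokens | demonstration) under the model (hypothetical scoring rule)."""
    input_ids = torch.cat([demo_ids, concept_token_ids]).unsqueeze(0)
    logits = model(input_ids).logits                 # (1, seq, vocab)
    start = demo_ids.numel()
    # Logits at position t predict token t+1, so the concept tokens are
    # scored by the logits from start-1 up to the second-to-last position.
    log_probs = F.log_softmax(logits[0, start - 1:-1], dim=-1)
    token_ll = log_probs.gather(-1, concept_token_ids.unsqueeze(-1)).squeeze(-1)
    return token_ll.mean().item()


def select_demonstrations(model, candidates, concept_token_ids, k: int = 4):
    """Return the k highest-scoring demonstrations (latent concept selection)."""
    scored = [(score_demonstration(model, d, concept_token_ids), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]
```

Random selection, the baseline mentioned in the abstract, would simply sample k demonstrations uniformly from the same candidate pool.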
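Of the three reported metrics, *pass@k* has a standard unbiased estimator (Chen et al., 2021). Whether the poster computes it exactly this way, and how *correctness@k* and *similarity@k* are defined, is not stated here, so treat this as a reference sketch of the pass@k part only:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: with n samples per problem and c passing, 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # e.g. 20 samples per task, 5 of which pass the unit tests
    print(round(pass_at_k(n=20, c=5, k=1), 3))   # 0.25
    print(round(pass_at_k(n=20, c=5, k=10), 3))  # close to 1.0
```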
Saved in:
| Main Author: | Nirmal Joshua Kapu |
|---|---|
| Other Authors: | Mihit Sreejith |
| Published: | 2024 |
| Subjects: | large language models; in context learning; machine learning; latent concept learning; nl2code; code generation |
| _version_ | 1852025645005012992 |
|---|---|
| author | Nirmal Joshua Kapu (18846481) |
| author2 | Mihit Sreejith (19953017) |
| author2_role | author |
| author_facet | Nirmal Joshua Kapu (18846481) Mihit Sreejith (19953017) |
| author_role | author |
| dc.creator.none.fl_str_mv | Nirmal Joshua Kapu (18846481) Mihit Sreejith (19953017) |
| dc.date.none.fl_str_mv | 2024-10-27T04:13:18Z |
| dc.identifier.none.fl_str_mv | 10.6084/m9.figshare.27310776.v1 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/poster/DemoCraft_Using_In-Context_Learning_to_Improve_Code_Generation_in_LLMs/27310776 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Artificial intelligence not elsewhere classified; Machine learning not elsewhere classified; large language models; in context learning; machine learning; latent concept learning; nl2code; code generation |
| dc.title.none.fl_str_mv | DemoCraft: Using In-Context Learning to Improve Code Generation in LLMs |
| dc.type.none.fl_str_mv | Image; Poster; info:eu-repo/semantics/publishedVersion; image |
| description | Generating executable code from natural language instructions with Large Language Models (LLMs) poses challenges such as semantic ambiguity and understanding task-specific contexts. To address these issues, we propose **DemoCraft**, a system that enhances code generation by **leveraging in-context learning** and demonstration selection, combined with latent concept learning. Latent concept learning introduces additional concept tokens: trainable embeddings that capture task-specific knowledge. These tokens are integrated into the model's input space, enabling the model to identify and select the most informative demonstrations for a given task. The approach is grounded in the principles of latent variable models, where task-specific latent parameters *d* encapsulate complex contextual information; the concept tokens refine the model's prediction process so that task-specific knowledge is applied during code generation. This study evaluates the impact of these techniques on code generation with the SantaCoder model, tested on the MBPP and HumanEval datasets. Our methodology is structured into four phases: latent concept learning, demonstration selection, output formatting, and code evaluation. Demonstration selection, a critical step, improves the model's generalization by identifying the examples that best convey the task concepts; we investigate two methods, latent concept selection (demonstrations chosen based on the learned embeddings) and random selection. Output formatting ensures that the model's outputs are syntactically and semantically well-formed. The generated code is evaluated with metrics such as *correctness@k*, *similarity@k*, and *pass@k*. Our experiments demonstrate a nearly **2x improvement** across these metrics, underscoring the role of latent concept learning and demonstration selection in improving the efficiency, accuracy, and adaptability of SantaCoder in real-world code generation tasks. |
| eu_rights_str_mv | openAccess |
| id | Manara_2b4dca522e1045efa6de5effecf005f7 |
| identifier_str_mv | 10.6084/m9.figshare.27310776.v1 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/27310776 |
| publishDate | 2024 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | DemoCraft: Using In-Context Learning to Improve Code Generation in LLMs |
| title_full | DemoCraft: Using In-Context Learning to Improve Code Generation in LLMs |
| title_fullStr | DemoCraft: Using In-Context Learning to Improve Code Generation in LLMs |
| title_full_unstemmed | DemoCraft: Using In-Context Learning to Improve Code Generation in LLMs |
| title_short | DemoCraft: Using In-Context Learning to Improve Code Generation in LLMs |
| title_sort | DemoCraft: Using In-Context Learning to Improve Code Generation in LLMs |
| topic | Artificial intelligence not elsewhere classified; Machine learning not elsewhere classified; large language models; in context learning; machine learning; latent concept learning; nl2code; code generation |