Automated Logical Graph Representation of Machine Learning Pipelines Using Collaborative Language Models - Replication Package

Bibliographic Details
Main author: Anonymous Anonymous (20002335) (author)
Published: 2024
Description
Abstract: Machine Learning (ML) is increasingly used across diverse application domains, offering powerful capabilities for intelligent services. As ML is integrated into larger software systems, it adds complexity that makes maintenance and evolution challenging. ML pipelines manage the lifecycle of these models, but they are often difficult to understand, maintain, and adapt because existing orchestration tools offer limited support. This paper presents a novel approach, called CLAIRE (Collaborative LLMs Approach for Intelligent Representation Extraction of ML Pipelines), which generates Directed Acyclic Graph (DAG) representations that capture the logical structure and purpose of ML pipelines rather than their syntactic constructs. By focusing on logical segments, CLAIRE improves on existing Abstract Syntax Tree (AST)-based representations of source code, which emphasize syntax instead. We leverage a collaborative ensemble of Large Language Models (LLMs) to improve accuracy and to support the evolutionary activities of ML-enabled systems. A qualitative evaluation with 18 experienced ML engineers indicates that our approach effectively captures pipeline structure and highlights critical stages, aiding in understanding, diagnosing, and refining ML workflows. Practitioners found it useful for supporting evolutionary activities, although the representation of complex data flows needs improvement. The results demonstrate the potential of our approach to enhance ML pipeline maintainability and scalability, making it a potentially valuable means of promoting the reliability of ML-enabled systems.
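The abstract contrasts syntax-oriented AST representations with a DAG of logical pipeline stages produced by a collaborative ensemble of LLMs. The following is a minimal sketch of that contrast on a toy script, under several assumptions: the pipeline source, the stage names, and the three `MODEL_LABELS` lists are hypothetical stubs standing in for real LLM calls; simple majority voting (`majority_vote`) is a swapped-in aggregation strategy, not CLAIRE's documented method; and edges between stages are inferred by a basic variable def/use analysis (`build_logical_dag`, also hypothetical) built on Python's standard `ast` and `graphlib` modules.

```python
import ast
from collections import Counter
from graphlib import TopologicalSorter

# Toy pipeline script (hypothetical, not from the replication package).
# It is only parsed, never executed, so pandas/sklearn need not be installed.
PIPELINE_SRC = '''
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("data.csv")
df = df.dropna()
X, y = df.drop(columns="label"), df["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
model = LogisticRegression().fit(X_tr, y_tr)
print(model.score(X_te, y_te))
'''

# Syntactic view: the AST records language constructs (Assign, Call, Name, ...).
tree = ast.parse(PIPELINE_SRC)
syntax_view = Counter(type(node).__name__ for node in ast.walk(tree))
statements = [s for s in tree.body if not isinstance(s, (ast.Import, ast.ImportFrom))]

# Hypothetical per-statement stage labels from three models.
# These lists are stubs standing in for real LLM calls.
MODEL_LABELS = {
    "model_a": ["load", "clean", "feature_prep", "split", "train", "evaluate"],
    "model_b": ["load", "clean", "clean", "split", "train", "evaluate"],
    "model_c": ["load", "clean", "feature_prep", "split", "train", "evaluate"],
}


def majority_vote(label_lists):
    """Pick the most common label per statement (ties: first label seen wins)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*label_lists)]


def build_logical_dag(stmts, stage_labels):
    """Map each stage to the earlier stages whose variables it reads (def/use edges)."""
    last_writer = {}  # variable name -> stage that most recently assigned it
    dag = {stage: set() for stage in stage_labels}
    for stmt, stage in zip(stmts, stage_labels):
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Reads are resolved before writes, so `df = df.dropna()` links to the previous writer of df.
        for n in names:
            if isinstance(n.ctx, ast.Load):
                src = last_writer.get(n.id)
                if src is not None and src != stage:
                    dag[stage].add(src)
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_writer[n.id] = stage
    return dag


labels = majority_vote(MODEL_LABELS.values())
dag = build_logical_dag(statements, labels)
# TopologicalSorter raises graphlib.CycleError if the inferred graph is not a DAG.
order = list(TopologicalSorter(dag).static_order())

if __name__ == "__main__":
    print("Most common AST node types:", syntax_view.most_common(3))
    print("Logical stages in order:", order)
    print("Stage dependencies:", {k: sorted(v) for k, v in dag.items()})
```

Because the nodes come from (stubbed) model labels rather than from syntax, the resulting graph has six logical stages instead of the many `Name`, `Call`, and `Assign` nodes of the AST. If one label recurred in non-adjacent segments, the inferred graph could contain a cycle and `static_order()` would raise `graphlib.CycleError`; a fuller tool would need to split such segments into distinct nodes.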