AI-Based Multiclass Grading of Hepatic Steatosis From B-Mode Ultrasound: Generalization Across Modalities and Clinical Comparison With Radiologists
| Published: | 2025 |
|---|---|
| Abstract: | <p dir="ltr">Non-alcoholic fatty liver disease (NAFLD) is a growing public health challenge, underscoring the need for scalable, non-invasive tools to grade hepatic steatosis. Although B-mode ultrasound is accessible and safe, its reliability is limited by operator and scanner variability. We present the Deep Domain Adaptation Neural Network (DDANN), a deep learning system for multiclass steatosis classification (Normal, Mild, Moderate, Severe) from ultrasound that emphasizes cross-device generalizability. To mitigate distribution shifts across scanners (LOGIQ, iU22, EPIQ), DDANN combines a MobileNetV2 backbone with triplet loss, entropy-based domain adaptation, and preprocessing that includes speckle suppression, percentile normalization, and LOGIQ-specific harmonization. Trained on a biopsy-confirmed, multi-institutional cohort (primarily LOGIQ and iU22), the model was externally validated on an unseen EPIQ test set of 1,083 images from 47 patients, achieving 98.71% accuracy, a macro <i>F</i><sub><em>1</em></sub>-score of 0.9872, and an AUC-ROC of 0.9998, outperforming baselines. In a separate radiologist–AI comparison on 224 biopsy-confirmed images not used for training or validation, the AI reached 91.96% accuracy, significantly exceeding the radiologists' 19.64%–31.70% (McNemar's test, <i>p</i> &lt; 0.001), with strong agreement with ground truth (<i>κ</i> = 0.893) versus the radiologists' poor-to-slight agreement (<i>κ</i> = 0.006–0.194). The AI maintained balanced class-wise <i>F</i><sub><em>1</em></sub>-scores (0.90–0.94), while radiologists struggled, particularly with Mild and Moderate cases, and exhibited substantial inter-reader variability (<i>κ</i> = 0.068–0.648). These results demonstrate robust cross-device performance and support integrating AI as a reliable second reader or primary screening tool to reduce subjectivity in steatosis assessment.</p><h2 dir="ltr">Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/deed.en" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2025.3617778" target="_blank">https://dx.doi.org/10.1109/access.2025.3617778</a></p> |
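
The abstract reports agreement with biopsy ground truth as Cohen's κ and compares reader accuracy with McNemar's test. The sketch below shows one way such statistics could be computed for a four-grade labeling task. It is illustrative only: the function names, the toy labels, and the choice of the exact (binomial) form of McNemar's test are assumptions, since the record does not state which variant the authors used, and none of the numbers come from the paper.

```python
from collections import Counter
from math import comb


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar p-value on the correctness of two readers."""
    # Discordant pairs: exactly one of the two readers is correct.
    b = sum(pa == t and pb != t for t, pa, pb in zip(truth, pred_a, pred_b))
    c = sum(pa != t and pb == t for t, pa, pb in zip(truth, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical toy labels (not data from the study).
truth = ["Normal", "Mild", "Moderate", "Severe"] * 3
model_pred = list(truth)
model_pred[1] = "Moderate"  # one model error
reader_pred = ["Normal", "Moderate", "Mild", "Severe"] * 3  # confuses Mild/Moderate

print("kappa (model vs truth): ", round(cohen_kappa(truth, model_pred), 3))
print("kappa (reader vs truth):", round(cohen_kappa(truth, reader_pred), 3))
print("McNemar exact p-value:  ", mcnemar_exact(truth, model_pred, reader_pred))
```

With real data, `truth` would hold the biopsy-confirmed grades and each prediction list would hold one reader's grades for the same images. For large numbers of discordant pairs, the chi-square approximation to McNemar's test is a common alternative to the exact form used here.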