AI-Based Multiclass Grading of Hepatic Steatosis From B-Mode Ultrasound: Generalization Across Modalities and Clinical Comparison With Radiologists

Bibliographic Details
Main Author: Fahad Muflih Alshagathrh (18427950) (author)
Other Authors: Haider Dhia Zubaydi (18519360) (author), Mahmood Alzubaidi (15740693) (author), Abdulaziz Alosaimi (11475559) (author), Raneem Mohammed Al Saqer (23073700) (author), Abdullah Mutlaq Alzahrani (23073703) (author), Mei Khalid Alfaqiri (23073706) (author), Mohamed Rajab Elzahrani (23073709) (author), Khalid Alswat (13047418) (author), Ali Aldhebaib (23073712) (author), Bushra Alahmadi (23073715) (author), Meteb Alkubeyyer (23073718) (author), Amani Alsadoon (23073721) (author), Maram Alkhamash (23073724) (author), Jawad Ahmad Alraimi (23073727) (author), Jens Schneider (16885948) (author), Mowafa Househ (9154124) (author)
Published: 2025
Description
Abstract:<p dir="ltr">Non-alcoholic fatty liver disease (NAFLD) is a growing public health challenge, underscoring the need for scalable, non-invasive tools to grade hepatic steatosis. Although B-mode ultrasound is accessible and safe, its reliability is limited by operator and scanner variability. We present the Deep Domain Adaptation Neural Network (DDANN), a deep learning system for multiclass steatosis classification (Normal, Mild, Moderate, Severe) from ultrasound that emphasizes cross-device generalizability. To mitigate distribution shifts across scanners (LOGIQ, iU22, EPIQ), DDANN combines a MobileNetV2 backbone with triplet loss, entropy-based domain adaptation, and preprocessing that includes speckle suppression, percentile normalization, and LOGIQ-specific harmonization. Trained on a biopsy-confirmed, multi-institutional cohort (primarily LOGIQ and iU22), the model was externally validated on an unseen EPIQ test set of 1,083 images from 47 patients, achieving 98.71% accuracy, 0.9872 macro <i>F</i><sub><em>1</em></sub>-score, and 0.9998 AUC-ROC, outperforming baselines. In a separate radiologist–AI comparison on 224 biopsy-confirmed images not used for training or validation, the AI reached 91.96% accuracy, significantly exceeding radiologists’ 19.64%–31.70% (McNemar’s test, <i>p</i>&lt;0.001), with strong agreement with ground truth (<i>κ</i>=0.893) versus radiologists’ poor-to-slight agreement (<i>κ</i>=0.006–0.194). The AI maintained balanced class-wise <i>F</i><sub><em>1</em></sub>-scores (0.90–0.94), while radiologists struggled, particularly with Mild and Moderate cases, and exhibited substantial inter-reader variability (<i>κ</i>=0.068–0.648).
These results demonstrate robust cross-device performance and support integrating AI as a reliable second reader or primary screening tool to reduce subjectivity in steatosis assessment.</p><h2 dir="ltr">Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/deed.en" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2025.3617778" target="_blank">https://dx.doi.org/10.1109/access.2025.3617778</a></p>