CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation

<p dir="ltr">Automatic segmentation of two-dimensional (2D) echocardiograms is beneficial for heart disease diagnosis and assessment. Convolutional Neural Network (CNN) based U-shaped architectures such as UNet have shown remarkable success in medical image segmentation. UNet genera...


Bibliographic Details
Main Author:	Md Rabiul Islam (6424796) (author)
Other Authors:	Marwa Qaraqe (10135172) (author), Erchin Serpedin (3706543) (author)
Published:	2024
Subjects:	Engineering Biomedical engineering Health sciences Health services and systems Information and computing sciences Artificial intelligence Machine learning Segmentation Echocardiogram Vision transformer CNN-transformer Local–global

_version_	1864513541449449472
author	Md Rabiul Islam (6424796)
author2	Marwa Qaraqe (10135172) Erchin Serpedin (3706543)
author2_role	author author
author_facet	Md Rabiul Islam (6424796) Marwa Qaraqe (10135172) Erchin Serpedin (3706543)
author_role	author
dc.creator.none.fl_str_mv	Md Rabiul Islam (6424796) Marwa Qaraqe (10135172) Erchin Serpedin (3706543)
dc.date.none.fl_str_mv	2024-07-13T15:00:00Z
dc.identifier.none.fl_str_mv	10.1016/j.bspc.2024.106633
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/CoST-UNet_Convolution_and_swin_transformer_based_deep_learning_architecture_for_cardiac_segmentation/29900453
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Engineering Biomedical engineering Health sciences Health services and systems Information and computing sciences Artificial intelligence Machine learning Segmentation Echocardiogram Vision transformer CNN-transformer Local–global
dc.title.none.fl_str_mv	CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	<p dir="ltr">Automatic segmentation of two-dimensional (2D) echocardiograms is beneficial for heart disease diagnosis and assessment. Convolutional Neural Network (CNN) based U-shaped architectures such as UNet have shown remarkable success in medical image segmentation. However, UNet is generally limited in capturing long-range dependencies because of the intrinsic locality of the convolution operation. In contrast, transformer models can capture global-level information using the multi-head attention mechanism. Taken separately, these models exhibit limited localization abilities due to insufficient low-level details. To overcome these limitations, this paper proposes CoST-UNet (Convolution and Swin Transformer-based U-shaped Network), a novel vision transformer architecture that uses CNN layers to leverage spatial information from images in the upper layers and transformer blocks to emphasize global context in the deeper levels. Unlike existing hybrid models such as TransUNet and UNETR, the transformer block of the proposed model employs a Swin Transformer backbone, which ensures linear computational complexity relative to image size. Furthermore, the scarcity of medical images, which is the primary barrier to improving transformer performance, is addressed by incorporating two convolution layers at the network’s uppermost level.
The experimental results demonstrate that the model achieved state-of-the-art performance on the ultrasound-based CAMUS dataset (mean Dice Similarity Coefficients of 0.925, 0.851, and 0.895 for segmenting LV<sub>endo</sub>, LV<sub>epi</sub>, and LA, respectively, from apical 4CH echocardiograms), as well as competitive results on the MRI-based ACDC dataset, due to its effective capture of local and global context.</p><h2>Other Information</h2><p dir="ltr">Published in: Biomedical Signal Processing and Control<br>License: <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">http://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1016/j.bspc.2024.106633" target="_blank">https://dx.doi.org/10.1016/j.bspc.2024.106633</a></p>
eu_rights_str_mv	openAccess
id	Manara2_2fbbce77f4989964eee326f12b6778d3
identifier_str_mv	10.1016/j.bspc.2024.106633
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/29900453
publishDate	2024
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv	CC BY 4.0
spellingShingle	CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation Md Rabiul Islam (6424796) Engineering Biomedical engineering Health sciences Health services and systems Information and computing sciences Artificial intelligence Machine learning Segmentation Echocardiogram Vision transformer CNN-transformer Local–global
status_str	publishedVersion
title	CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation
title_full	CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation
title_fullStr	CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation
title_full_unstemmed	CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation
title_short	CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation
title_sort	CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation
topic	Engineering Biomedical engineering Health sciences Health services and systems Information and computing sciences Artificial intelligence Machine learning Segmentation Echocardiogram Vision transformer CNN-transformer Local–global
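
The description field reports results as mean Dice Similarity Coefficients. As a reading aid, the following is a minimal sketch of how that metric is commonly computed for binary masks, Dice = 2|P ∩ T| / (|P| + |T|). It is not taken from the paper: the function names `dice_coefficient` and `mean_dice` are hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice similarity between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    # Number of pixels labelled foreground in both masks.
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()

    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return float(2.0 * intersection / total)


def mean_dice(preds, targets):
    """Average Dice over paired lists of predicted and reference masks."""
    scores = [dice_coefficient(p, t) for p, t in zip(preds, targets)]
    return float(np.mean(scores))


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    print(dice_coefficient(pred, target))
```

In practice the score would be computed per structure (for example LV endocardium, LV epicardium, and LA) and then averaged over the test images, which is what a "mean Dice" figure usually denotes.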

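The description states that a Swin Transformer backbone gives linear computational complexity relative to image size. The sketch below illustrates that claim using the attention cost estimates published with the Swin Transformer, 4hwC² + 2(hw)²C for global self-attention and 4hwC² + 2M²hwC for window-based self-attention. The function names and the default window size M = 7 are assumptions for illustration, and the numbers are rough operation counts rather than measurements of CoST-UNet.

```python
def global_msa_cost(h, w, c):
    """Approximate cost of global multi-head self-attention on an h x w
    grid of patches with c channels: 4*h*w*c^2 + 2*(h*w)^2*c."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_cost(h, w, c, m=7):
    """Approximate cost of window-based self-attention with m x m windows:
    4*h*w*c^2 + 2*m^2*h*w*c. Both terms are linear in the patch count h*w."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    c = 96  # assumed channel width, for illustration only
    for side in (56, 112, 224):
        g = global_msa_cost(side, side, c)
        wmsa = window_msa_cost(side, side, c)
        print(f"{side}x{side} patches: global={g:.3e}, windowed={wmsa:.3e}")
```

Doubling each side of the patch grid multiplies the windowed cost by exactly 4 (the growth in patch count), while the quadratic term of the global cost grows by 16.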

Similar Items