The detection performance for different dataset.
The detection of defects on steel surfaces constitutes a vital area of research in computer vision, characterized by its complexity and variety, which pose significant difficulties for accurate identification. In this context, we introduce a deep learning framework that combines...
| Main Author: | Shouluan Wu |
|---|---|
| Other Authors: | Hui Yang, Liefa Liao, Chao Song, Yating Fang, Yang Yang |
| Published: | 2025 |
| Subjects: | |
| _version_ | 1852014954633232384 |
|---|---|
| author | Shouluan Wu (22601074) |
| author2 | Hui Yang (91136); Liefa Liao (12098633); Chao Song (379006); Yating Fang (5483327); Yang Yang (45629) |
| author2_role | author; author; author; author; author |
| author_facet | Shouluan Wu (22601074); Hui Yang (91136); Liefa Liao (12098633); Chao Song (379006); Yating Fang (5483327); Yang Yang (45629) |
| author_role | author |
| dc.creator.none.fl_str_mv | Shouluan Wu (22601074); Hui Yang (91136); Liefa Liao (12098633); Chao Song (379006); Yating Fang (5483327); Yang Yang (45629) |
| dc.date.none.fl_str_mv | 2025-11-11T18:37:14Z |
| dc.identifier.none.fl_str_mv | 10.1371/journal.pone.0334048.t003 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/dataset/The_detection_performance_for_different_dataset_/30592498 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Medicine; Neuroscience; Cancer; Space Science; Biological Sciences not elsewhere classified; Information Systems not elsewhere classified; steel surfaces constitutes; pose significant difficulties; incorporating resnet18 along; feature extraction functionality; different convolutional groups; detailed ablation studies; deep learning framework; convolutional neural networks; channel random coding; scale feature fusion; low recognition accuracy; de datasets demonstrate; attention mechanism associated; scale features; de dataset; model registers; model capitalizes; classification accuracy; vital area; transformer architecture; thus minimizing; parameter redundancy; novel improvement; conventional models; considerable potential; computer vision; combined improvement; cnns; bring forth; backbone network; accurate identification |
| dc.title.none.fl_str_mv | The detection performance for different dataset. |
| dc.type.none.fl_str_mv | Dataset info:eu-repo/semantics/publishedVersion dataset |
| description | The detection of defects on steel surfaces constitutes a vital area of research in computer vision, characterized by its complexity and variety, which pose significant difficulties for accurate identification. In this context, we introduce a deep learning framework that combines multi-channel random coding with modules for multi-scale feature fusion to tackle the challenges of low recognition accuracy and insufficient classification power prevalent in conventional models. Our model capitalizes on the self-attention mechanism associated with the Transformer architecture, alongside the strong feature extraction capabilities of Convolutional Neural Networks (CNNs), to facilitate a combined improvement in performance. To start, we enhance the model’s feature extraction functionality by incorporating ResNet18 along with global self-attention. Next, we bring forth a novel improvement to the backbone network by adding a multi-channel shuffled encoding module, which effectively encodes various features through the interactions of different convolutional groups, thus minimizing the number of parameters. Additionally, we introduce a multi-feature fusion module UPC-SimAM (upsample concatenated Simple Parameter-Free Attention Module), which is free from parameter redundancy to bolster the model’s capacity to merge multi-scale features. Our experiments on the NEU-DET and GC10-DE datasets demonstrate that our model outperforms existing state-of-the-art techniques regarding detection efficiency. Specifically, the model registers a classification accuracy of 91.72%, an mAP@0.5 of 83.03%, and an mAP@0.5:0.95 of 45.55% on the NEU-DET dataset. On the GC10-DE dataset, it achieves a classification precision of 76.73%, an mAP@0.5 of 65.03%, and an mAP@0.5:0.95 of 32.46%. Through detailed ablation studies and visualization experiments, we affirm the considerable potential and benefits of the proposed SH-DETR model in the field of detecting defects on steel surfaces. |
| eu_rights_str_mv | openAccess |
| id | Manara_efa588e99bf862138db6cfe29fbdfd52 |
| identifier_str_mv | 10.1371/journal.pone.0334048.t003 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/30592498 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | The detection performance for different dataset. |
| title_full | The detection performance for different dataset. |
| title_fullStr | The detection performance for different dataset. |
| title_full_unstemmed | The detection performance for different dataset. |
| title_short | The detection performance for different dataset. |
| title_sort | The detection performance for different dataset. |
| topic | Medicine; Neuroscience; Cancer; Space Science; Biological Sciences not elsewhere classified; Information Systems not elsewhere classified; steel surfaces constitutes; pose significant difficulties; incorporating resnet18 along; feature extraction functionality; different convolutional groups; detailed ablation studies; deep learning framework; convolutional neural networks; channel random coding; scale feature fusion; low recognition accuracy; de datasets demonstrate; attention mechanism associated; scale features; de dataset; model registers; model capitalizes; classification accuracy; vital area; transformer architecture; thus minimizing; parameter redundancy; novel improvement; conventional models; considerable potential; computer vision; combined improvement; cnns; bring forth; backbone network; accurate identification |
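
The description field in this record names SimAM (Simple Parameter-Free Attention Module) as the attention component of the UPC-SimAM fusion module. As a rough illustration only, the sketch below shows the commonly published SimAM weighting in NumPy. It is not the authors' code: the function name `simam`, the default `e_lambda` value, and the `(batch, channels, height, width)` layout are assumptions, and the upsampling and concatenation steps of UPC-SimAM are not shown because the record does not describe them.

```python
import numpy as np


def simam(x, e_lambda=1e-4):
    """Apply SimAM-style, parameter-free attention to a feature map.

    x is assumed to have shape (batch, channels, height, width),
    with height * width > 1. e_lambda is a small regularizer
    (value assumed here, not taken from the record).
    """
    _, _, h, w = x.shape
    n = h * w - 1
    # Squared deviation of each activation from its channel mean.
    d = (x - x.mean(axis=(2, 3), keepdims=True)) ** 2
    # Per-channel variance estimate over the spatial positions.
    v = d.sum(axis=(2, 3), keepdims=True) / n
    # Inverse energy: activations far from the channel mean score higher.
    e_inv = d / (4.0 * (v + e_lambda)) + 0.5
    # A sigmoid turns the score into a weight that rescales the input.
    return x * (1.0 / (1.0 + np.exp(-e_inv)))


if __name__ == "__main__":
    features = np.random.default_rng(0).normal(size=(2, 8, 16, 16))
    print(simam(features).shape)
```

The weighting introduces no learnable parameters, which is consistent with the record's remark that the fusion module is free from parameter redundancy.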