Single-channel speech denoising by masking the colored spectrograms

Speech denoising (SD) covers the algorithms that remove the background noise from the target speech and thus improve its quality and intelligibility. In this paper, a novel SD technique is proposed that masks the colored spectrogram. U-Net (a deep neural network fundamentally developed for...

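Bibliographic Details
Main Author: Sania Gul (18272227) (author)
Other Authors: Muhammad Salman Khan (7202543) (author)
Published: 2025
Subjects: Engineering; Communications engineering; Information and computing sciences; Machine learning; Colors; Masking; Colored spectrograms; U-Net; Speech denoising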
_version_ 1864513524523335680
author Sania Gul (18272227)
author2 Muhammad Salman Khan (7202543)
author2_role author
author_facet Sania Gul (18272227)
Muhammad Salman Khan (7202543)
author_role author
dc.creator.none.fl_str_mv Sania Gul (18272227)
Muhammad Salman Khan (7202543)
dc.date.none.fl_str_mv 2025-09-06T09:00:00Z
dc.identifier.none.fl_str_mv 10.1016/j.compeleceng.2025.110656
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Single-channel_speech_denoising_by_masking_the_colored_spectrograms/31056898
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Engineering
Communications engineering
Information and computing sciences
Machine learning
Colors
Masking
Colored spectrograms
U-Net
Speech denoising
dc.title.none.fl_str_mv Single-channel speech denoising by masking the colored spectrograms
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p>Speech denoising (SD) covers algorithms that remove background noise from target speech and thus improve its quality and intelligibility. This paper proposes a novel SD technique that masks the colored spectrogram. U-Net (a deep neural network originally developed for image segmentation) is trained on noisy log-powered colored spectrograms (LPcS), using binarized Mel spectrograms as ground truth (GT). After training, the colored spectrogram of the noisy speech is passed through U-Net, which generates a soft mask at its output. This mask is applied to the magnitude matrix of the short-time Fourier transform (STFT) of the noisy speech to obtain the magnitude matrix of the estimated speech, which is then combined with the noisy phase matrix to recover the target speech. The results show that, with masking-based targets, the colored spectrograms provide an improvement of 0.12 points in the perceptual evaluation of speech quality (PESQ) score and 4 % in short-time objective intelligibility (STOI), together with a 163-fold reduction in learnable network parameters, compared with processing them by a mapping-based model that uses a pix2pix generative adversarial network (GAN) followed by a feedforward regression neural network. With a slightly lower PESQ score (by 0.58 points), the proposed model offers a 2 % improvement in STOI and reductions of 4375 and 1135 times, respectively, in the required number of training epochs and network parameters when compared with a GAN-based model augmented by WavLM, a large-scale self-supervised learning model. Similarly, it offers a 1 % improvement in STOI and reductions of 33 and 200 times, respectively, in network size and training epochs when compared with a complex variational U-Net-based model. 
Also, with comparable PESQ, the proposed system offers an almost 2 % improvement in STOI, a 2-fold reduction in network size, and a 100-fold reduction in training epochs when compared with a lightweight system that uses automatic dimension reduction of network layers by a structured pruning method.</p><h2>Other Information</h2> <p> Published in: Computers and Electrical Engineering<br> License: <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">http://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1016/j.compeleceng.2025.110656" target="_blank">https://dx.doi.org/10.1016/j.compeleceng.2025.110656</a></p>
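The reconstruction step described in the abstract (apply a soft mask to the noisy STFT magnitude, reuse the noisy phase, invert the STFT) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the frame length `n_fft=512`, the hop size `hop=128`, the Hann window, and the clipping of the mask to [0, 1] are all assumptions, and the mask is assumed to already have the same shape as the STFT matrix. The U-Net that would produce the mask from the colored spectrogram is not modeled here.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT, shape (n_fft // 2 + 1, n_frames). Parameters are assumed."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)

def istft(X, n_fft=512, hop=128):
    """Weighted overlap-add inverse of stft() above."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = X.shape[1]
    frames = np.fft.irfft(X, n=n_fft, axis=0)
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop : i * hop + n_fft] += frames[:, i] * window
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Guard against division by ~0 at the signal edges, where the window vanishes.
    return out / np.maximum(norm, 1e-8)

def apply_soft_mask(noisy_stft, mask):
    """Scale the noisy magnitude by the mask and keep the noisy phase."""
    mask = np.clip(mask, 0.0, 1.0)  # assumption: a soft mask lies in [0, 1]
    est_mag = mask * np.abs(noisy_stft)
    return est_mag * np.exp(1j * np.angle(noisy_stft))

def denoise_with_mask(noisy, mask, n_fft=512, hop=128):
    """Hypothetical end-to-end helper: STFT -> mask -> inverse STFT."""
    X = stft(noisy, n_fft, hop)
    return istft(apply_soft_mask(X, mask), n_fft, hop)
```

In use, `mask` would be the network output for the noisy utterance. With an all-ones mask the helper returns the noisy signal unchanged away from the edges, and with an all-zeros mask it returns silence, which gives a quick sanity check of the magnitude and phase recombination.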
eu_rights_str_mv openAccess
id Manara2_7d4aadc362131b53ade11276c995fccf
identifier_str_mv 10.1016/j.compeleceng.2025.110656
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/31056898
publishDate 2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Single-channel speech denoising by masking the colored spectrograms