Single-channel speech denoising by masking the colored spectrograms

Speech denoising (SD) covers the algorithms that remove the background noise from the target speech and thus improve its quality and intelligibility. In this paper, a novel SD technique is proposed that masks the colored spectrogram. U-Net (a deep neural network fundamentally developed for...

Bibliographic Details
Main Author:	Sania Gul (18272227) (author)
Other Authors:	Muhammad Salman Khan (7202543) (author)
Published:	2025
Subjects:	Engineering; Communications engineering; Information and computing sciences; Machine learning; Colors; Masking; Colored spectrograms; U-Net; Speech denoising

_version_	1864513524523335680
author	Sania Gul (18272227)
author2	Muhammad Salman Khan (7202543)
author2_role	author
author_facet	Sania Gul (18272227); Muhammad Salman Khan (7202543)
author_role	author
dc.creator.none.fl_str_mv	Sania Gul (18272227); Muhammad Salman Khan (7202543)
dc.date.none.fl_str_mv	2025-09-06T09:00:00Z
dc.identifier.none.fl_str_mv	10.1016/j.compeleceng.2025.110656
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/Single-channel_speech_denoising_by_masking_the_colored_spectrograms/31056898
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Engineering; Communications engineering; Information and computing sciences; Machine learning; Colors; Masking; Colored spectrograms; U-Net; Speech denoising
dc.title.none.fl_str_mv	Single-channel speech denoising by masking the colored spectrograms
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	<p>Speech denoising (SD) covers the algorithms that remove background noise from the target speech and thus improve its quality and intelligibility. In this paper, a novel SD technique is proposed that masks the colored spectrogram. U-Net (a deep neural network originally developed for image segmentation) is trained on noisy log-powered colored spectrograms (LPcS), using binarized Mel spectrograms as ground truth (GT). After training, the colored spectrogram of the noisy speech is passed through U-Net, which generates a soft mask at its output. This mask is applied to the magnitude matrix of the short-time Fourier transform (STFT) of the noisy speech to obtain the magnitude matrix of the estimated speech. This matrix is then combined with the noisy phase matrix to recover the target speech. The results show that, with masking-based targets, the colored spectrograms provide an improvement of 0.12 points in the perceptual evaluation of speech quality (PESQ) score, 4 % in short-time objective intelligibility (STOI), and a 163-fold reduction in learnable network parameters, compared with processing them by a mapping-based model that uses a pix2pix generative adversarial network (GAN) followed by a feedforward regression neural network. With a slightly reduced PESQ score (by 0.58 points), the proposed model offers a 2 % improvement in STOI, and 4375-fold and 1135-fold reductions, respectively, in the required number of training epochs and network parameters when compared to a GAN-based model augmented by WavLM, a large-scale self-supervised learning model. Similarly, it offers a 1 % improvement in STOI and 33-fold and 200-fold reductions, respectively, in network size and training epochs when compared to a complex variational U-Net-based model. Also, with comparable PESQ, the proposed system offers an almost 2 % improvement in STOI, a 2-fold reduction in network size, and a 100-fold reduction in training epochs when compared to a lightweight system that uses automatic dimension reduction of network layers by a structured pruning method.</p><h2>Other Information</h2> <p> Published in: Computers and Electrical Engineering<br> License: <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">http://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1016/j.compeleceng.2025.110656" target="_blank">https://dx.doi.org/10.1016/j.compeleceng.2025.110656</a></p>
eu_rights_str_mv	openAccess
id	Manara2_7d4aadc362131b53ade11276c995fccf
identifier_str_mv	10.1016/j.compeleceng.2025.110656
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/31056898
publishDate	2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv	CC BY 4.0
status_str	publishedVersion
title	Single-channel speech denoising by masking the colored spectrograms
title_full	Single-channel speech denoising by masking the colored spectrograms
title_fullStr	Single-channel speech denoising by masking the colored spectrograms
title_full_unstemmed	Single-channel speech denoising by masking the colored spectrograms
title_short	Single-channel speech denoising by masking the colored spectrograms
title_sort	Single-channel speech denoising by masking the colored spectrograms
topic	Engineering; Communications engineering; Information and computing sciences; Machine learning; Colors; Masking; Colored spectrograms; U-Net; Speech denoising
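
The reconstruction step described in the abstract (apply a soft mask to the noisy STFT magnitude, then reuse the noisy phase) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `apply_soft_mask`, the toy 2x2 STFT matrix, and the hand-written mask values are all assumptions made for illustration. In the paper the mask comes from the trained U-Net, and an inverse STFT (omitted here) would follow to return to the time domain.

```python
import cmath


def apply_soft_mask(noisy_stft, mask):
    """Scale each noisy STFT magnitude by its mask value and keep the noisy phase.

    noisy_stft: list of rows of complex STFT coefficients (bins x frames).
    mask: list of rows of floats in [0, 1] with the same shape.
    Both arguments are hypothetical stand-ins for real STFT and U-Net outputs.
    """
    if len(noisy_stft) != len(mask):
        raise ValueError("STFT and mask must have the same number of rows")
    estimated = []
    for stft_row, mask_row in zip(noisy_stft, mask):
        if len(stft_row) != len(mask_row):
            raise ValueError("STFT and mask rows must have the same length")
        out_row = []
        for coeff, m in zip(stft_row, mask_row):
            # Split the noisy coefficient into magnitude and phase.
            magnitude, phase = cmath.polar(coeff)
            # Estimated magnitude = mask * noisy magnitude; phase is unchanged.
            out_row.append(cmath.rect(m * magnitude, phase))
        estimated.append(out_row)
    return estimated


# Toy example with made-up values (2 frequency bins x 2 frames).
noisy = [[3 + 4j, 1j], [-2 + 0j, 1 - 1j]]
mask = [[1.0, 0.0], [0.5, 0.25]]
estimated = apply_soft_mask(noisy, mask)
```

A mask value of 1.0 leaves a time-frequency cell untouched, 0.0 suppresses it entirely, and intermediate values attenuate the magnitude while the phase of the noisy signal is carried over as is.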

Similar Items