STEM: spatial speech separation using twin-delayed DDPG reinforcement learning and expectation maximization

Although many high-performing speech separation models have been proposed recently, little attention has been paid to making them lightweight. In this paper, a novel speech separation algorithm is proposed that integrates the twin-delayed deep deterministic (TD3) policy gradient reinforceme...

Bibliographic Details
Main Author:	Muhammad Salman Khan (7202543) (author)
Other Authors:	Sania Gul (18272227) (author)
Published:	2025
Subjects:	Information and computing sciences; Artificial intelligence; Machine learning; Speech separation; Reinforcement learning; Continuous action space; Spatial cues; Reward function; Time–frequency masking

_version_	1864513539910139904
author	Muhammad Salman Khan (7202543)
author2	Sania Gul (18272227)
author2_role	author
author_facet	Muhammad Salman Khan (7202543); Sania Gul (18272227)
author_role	author
dc.creator.none.fl_str_mv	Muhammad Salman Khan (7202543); Sania Gul (18272227)
dc.date.none.fl_str_mv	2025-08-23T15:00:00Z
dc.identifier.none.fl_str_mv	10.1016/j.apacoust.2025.111022
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/STEM_spatial_speech_separation_using_twin-delayed_DDPG_reinforcement_learning_and_expectation_maximization/30135205
dc.rights.none.fl_str_mv	CC BY 4.0; info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Information and computing sciences; Artificial intelligence; Machine learning; Speech separation; Reinforcement learning; Continuous action space; Spatial cues; Reward function; Time–frequency masking
dc.title.none.fl_str_mv	STEM: spatial speech separation using twin-delayed DDPG reinforcement learning and expectation maximization
dc.type.none.fl_str_mv	Text; Journal contribution; info:eu-repo/semantics/publishedVersion; text; contribution to journal
description	<p>Although many high-performing speech separation models have been proposed recently, little attention has been paid to making them lightweight. In this paper, a novel speech separation algorithm is proposed that integrates the twin-delayed deep deterministic (TD3) policy gradient reinforcement learning (RL) agent with the expectation maximization (EM) algorithm for clustering the spatial cues of individual sources separated in azimuth. For stationary sources, the proposed system gives satisfactory performance in terms of quality, intelligibility, and separation speed, and generalizes well to test data from a mismatched speech corpus. Its perceptual evaluation of speech quality (PESQ) score is 0.55 points higher than that of a self-supervised learning (SSL) model and almost equivalent to that of diffusion models, at a computational cost and with training data many times smaller than these algorithms require. Additionally, it reduces the required training data by a factor of 39, training time by a factor of 36, model size by a factor of 6, real-time factor (RTF) by 1 point, and multiply-accumulate operations (MACs) by a factor of 9 compared to a recently proposed lightweight transformer-based encoder-decoder framework, at the cost of a slight decrease in PESQ score (0.45 points).</p><h2>Other Information</h2> <p> Published in: Applied Acoustics<br> License: <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">http://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1016/j.apacoust.2025.111022" target="_blank">https://dx.doi.org/10.1016/j.apacoust.2025.111022</a></p>
eu_rights_str_mv	openAccess
id	Manara2_035a33b8adb3c0d2149359e752f33647
identifier_str_mv	10.1016/j.apacoust.2025.111022
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/30135205
publishDate	2025
rights_invalid_str_mv	CC BY 4.0
status_str	publishedVersion
title	STEM: spatial speech separation using twin-delayed DDPG reinforcement learning and expectation maximization
topic	Information and computing sciences; Artificial intelligence; Machine learning; Speech separation; Reinforcement learning; Continuous action space; Spatial cues; Reward function; Time–frequency masking
