Deep Learning-Based Coding Strategy for Improved Cochlear Implant Speech Perception in Noisy Environments


Bibliographic Details
Main Author: Billel Essaid (22047578) (author)
Other Authors: Hamza Kheddar (17337712) (author), Noureddine Batel (22047581) (author), Muhammad E. H. Chowdhury (14150526) (author)
Published: 2025
Description
Summary:<p dir="ltr">Automatic speech recognition (ASR) and speech enhancement are essential tools in modern life, aiding not only in machine interaction but also in supporting individuals with hearing impairments. These processes begin with capturing speech in analog form and applying signal processing algorithms to ensure compatibility with devices like cochlear implants (CIs). However, CIs, with their limited number of electrodes, often cause speech distortion, and despite advancements in state-of-the-art signal processing techniques, challenges persist, particularly in noisy environments with multiple speech sources. The rise of artificial intelligence (AI) has introduced innovative strategies to address these limitations. This paper presents a novel deep learning (DL)-based technique that leverages attention mechanisms to improve speech intelligibility through noise suppression. The proposed approach includes two strategies: the first integrates temporal convolutional networks (TCNs) and multi-head attention (MHA) layers to capture both local and global dependencies within the speech signal, enabling precise noise filtering and improved clarity. The second strategy builds on this framework by additionally incorporating bidirectional gated recurrent units (Bi-GRU) alongside the TCN and MHA layers, further refining sequence modeling and enhancing noise reduction. The optimal model configuration, using TCN-MHA-Bi-GRU with a kernel size of 16, achieved a compact model size of 788K parameters and recorded training and validation losses of 0.0350 and 0.0446, respectively. 
Experimental results on the TIMIT and Harvard Sentences datasets, enriched with diverse noise sources from the DEMAND database, yielded high intelligibility scores, with a short-time objective intelligibility (STOI) of 0.8345, a word recognition score (WRS) of 99.2636, and a linear correlation coefficient (LCC) of 0.9607, underscoring the model’s capability to enhance speech perception in noisy CI environments while balancing model size and speech quality, and surpassing existing state-of-the-art techniques.</p><h2>Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/deed.en" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2025.3542953" target="_blank">https://dx.doi.org/10.1109/access.2025.3542953</a></p>
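To illustrate how the components named in the abstract fit together, the following is a minimal NumPy sketch of a TCN-MHA-Bi-GRU forward pass. All dimensions, weight initializations, the dilation factor, the single-layer depth, and the final sigmoid mask are illustrative assumptions, not the authors' actual architecture; only the kernel size of 16 and the layer ordering come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def tcn_layer(x, w, dilation):
    """Dilated causal 1-D convolution: x is (T, C_in), w is (K, C_in, C_out)."""
    K, C_in, C_out = w.shape
    T = x.shape[0]
    xp = np.vstack([np.zeros(((K - 1) * dilation, C_in)), x])  # left-pad: causal
    out = np.zeros((T, C_out))
    for t in range(T):
        taps = xp[t : t + (K - 1) * dilation + 1 : dilation]   # K causal taps
        out[t] = np.einsum("kc,kco->o", taps, w)
    return np.maximum(out, 0.0)                                # ReLU

def multi_head_attention(x, n_heads):
    """Toy self-attention: channels split across heads, identity projections."""
    T, C = x.shape
    d = C // n_heads
    heads = []
    for h in range(n_heads):
        q = k = v = x[:, h * d : (h + 1) * d]
        scores = q @ k.T / np.sqrt(d)                          # (T, T) similarities
        a = np.exp(scores - scores.max(axis=-1, keepdims=True))
        a /= a.sum(axis=-1, keepdims=True)                     # softmax over time
        heads.append(a @ v)
    return np.concatenate(heads, axis=-1)

def gru_pass(x, Wz, Wr, Wh, reverse=False):
    """Single GRU sweep over (T, C); hidden size equals C for simplicity."""
    T, C = x.shape
    h = np.zeros(C)
    out = np.zeros((T, C))
    for t in (range(T - 1, -1, -1) if reverse else range(T)):
        xin = np.concatenate([x[t], h])
        z = 1 / (1 + np.exp(-(Wz @ xin)))                      # update gate
        r = 1 / (1 + np.exp(-(Wr @ xin)))                      # reset gate
        hh = np.tanh(Wh @ np.concatenate([x[t], r * h]))       # candidate state
        h = (1 - z) * h + z * hh
        out[t] = h
    return out

T, C, K, n_heads = 32, 8, 16, 2  # kernel size 16, as reported in the abstract
x = rng.standard_normal((T, C))                 # noisy feature frames (assumed input)
w = rng.standard_normal((K, C, C)) * 0.05       # TCN filter bank
Wz, Wr, Wh = (rng.standard_normal((C, 2 * C)) * 0.1 for _ in range(3))

y = tcn_layer(x, w, dilation=2)                 # local dependencies
y = multi_head_attention(y, n_heads)            # global dependencies
fwd = gru_pass(y, Wz, Wr, Wh)                   # Bi-GRU: forward sweep ...
bwd = gru_pass(y, Wz, Wr, Wh, reverse=True)     # ... and backward sweep
mask = 1 / (1 + np.exp(-np.concatenate([fwd, bwd], axis=-1)))  # per-frame mask
print(mask.shape)  # → (32, 16)
```

The sigmoid output plays the role of a time-frequency suppression mask applied to the noisy features; in a trained model the weights would of course be learned rather than random.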