Code and data

State-of-the-art (SOTA) Automatic Speech Recognition (ASR) systems primarily rely on acoustic information while disregarding additional multi-modal context. However, visual information is essential for disambiguation and adaptation. ...

Bibliographic Details
Main Author:	Supriti Sinhamahapatra (22271917) (author)
Other Authors:	Jan Niehues (22272010) (author)
Published:	2025
Subjects:	Speech recognition; speech recognition outcomes; multi modal ASR

Similar Items

ssRSA for EMEG data.
by: Cai Wingfield (554068)
Published: (2017)

Relating brain data dRDMs to phone model dRDMs and converting to feature fits.
by: Cai Wingfield (554068)
Published: (2017)

Maps of fit for each feature.
by: Cai Wingfield (554068)
Published: (2017)

Similarities between model RDMs and phonetic features.
by: Cai Wingfield (554068)
Published: (2017)

Second-order similarity structure of phone models.
by: Cai Wingfield (554068)
Published: (2017)

The mapping between articulatory features and phonetic labels.
by: Cai Wingfield (554068)
Published: (2017)

Mapping from GMM–HMM triphone log likelihoods to phone model RDMs.
by: Cai Wingfield (554068)
Published: (2017)

Representational similarity analysis.
by: Cai Wingfield (554068)
Published: (2017)

ASR_Core.graphml
by: Martin Hagmueller (4508089)
Published: (2017)

Roles of phonation types and decoders’ gender (Chang et al., 2023)
by: Yajie Chang (10272722)
Published: (2023)

Speech recognition to identify DLD in bilinguals (Albudoor & Peña, 2022)
by: Nahar Albudoor (13035019)
Published: (2022)

Frame accuracies with multi-resolution features and FC-sigmoid-RBM-pretrain DNNs with hidden layers of 2048 units.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Frame accuracies with multi-resolution features and FC-sigmoid-RBM-pretrain DNNs with hidden layers of 1024 units.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Results with multi-resolution features and FC-sigmoid-RBM-pretrain DNNs with hidden layers of 2048 units.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Results with multi-resolution features and FC-sigmoid-RBM-pretrain DNNs with hidden layers of 1024 units.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Configurations used with multi-resolution highly processed and speaker-adapted features and FC-sigmoid-RBM-pretrain DNNs.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Results with multi-resolution spectrograms and Fully Connected feedforward DNNs trained with Keras and Theano with ReLU activation functions and ±4 feature splicing.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Results with multi-resolution spectrograms and TDNNs-ReLU networks with ±10 feature splicing.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Results with multi-resolution spectrograms and FC-pnorm networks.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Results with simplified features.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Baseline results with Kaldi.
by: Doroteo T. Toledano (2175334)
Published: (2018)

TIMIT phone duration statistics for the longest (/aw/) and shortest (/b/) phones in the set of 48 phones used for training.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Multiresolution spectrum computation.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Deep Neural Network (DNN).
by: Doroteo T. Toledano (2175334)
Published: (2018)

Mel-frequency scaled filterbank.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Spectrograms with different time-resolution trade-offs for a long phone.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Spectrograms with different time-resolution trade-offs for a short phone.
by: Doroteo T. Toledano (2175334)
Published: (2018)

Interspeech 2016 - Experiment results for Sheffield Wargame Corpora (SWC1, SWC2, SWC3)
by: Yulan Liu (1376505)
Published: (2016)

Distribution of human empathy ratings for the 200 sessions in the CTT trial.
by: Bo Xiao (347938)
Published: (2015)

Overview of processing steps for moving from audio recording of session to predicted value of empathy.
by: Bo Xiao (347938)
Published: (2015)

Summary of psychotherapy corpora and role in automatic empathy evaluation.
by: Bo Xiao (347938)
Published: (2015)

Empathy prediction performance.
by: Bo Xiao (347938)
Published: (2015)

High vs. low empathy tri-grams.
by: Bo Xiao (347938)
Published: (2015)

S1_File.pdf.
by: Bo Xiao (347938)
Published: (2015)

Detection rate in conversational vs audience-oriented files.
by: Tirza Biron (10739502)
Published: (2021)

Table 2 - Automatic detection of prosodic boundaries in spontaneous speech
by: Tirza Biron (10739502)
Published: (2021)

Evaluation of segmentation methods for spontaneous speech.
by: Tirza Biron (10739502)
Published: (2021)

Pauses comparable to (or longer than) the duration of a word mark boundaries.
by: Tirza Biron (10739502)
Published: (2021)

Frequent words are over-represented at beginnings of automatically identified phrases.
by: Tirza Biron (10739502)
Published: (2021)

Automatic and manual tagging exhibit pitch reset.
by: Tirza Biron (10739502)
Published: (2021)