Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition
Eurasip Journal on Advances in Signal Processing, ISSN: 1687-6180, Vol: 2016, Issue: 1
2016
- 5 Citations
- 11 Captures
Article Description
We explore strategies for jointly training DNNs for simultaneous dereverberation and acoustic modeling to improve the performance of distant speech recognition. There are two key contributions. First, a new DNN structure that takes both dereverberated and original reverberant features as input is shown to improve recognition accuracy over the conventional structure that uses only dereverberated features. Second, most approaches that simulate reverberant environments for training data collection and DNN-based dereverberation require high-quality clean speech as both the source data and the learning targets. With our joint training strategy, we can relax this constraint by using large-scale, diversified real close-talking data as the targets, which is easy to collect from mobile internet users via many speech-enabled applications, and we find this scenario even more effective. Our experiments on a Mandarin speech recognition task with 2000 hours of training data show that the proposed framework achieves relative word error rate reductions of 9.7% and 8.6% over the multi-condition training systems for the single-channel case and the multi-channel case with beamforming, respectively. Furthermore, significant gains are consistently observed over the pre-processing approach that simply applies DNN-based dereverberation.
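The structure described in the abstract can be pictured with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the tanh activations, the single hidden layer per sub-network, the `mse_weight` value, and all function and parameter names are assumptions made for illustration. The sketch shows only the two ideas the abstract states: the acoustic model receives the dereverberated estimate together with the original reverberant features, and a single objective combines the dereverberation error (against close-talking targets) with the recognition error.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    # Small random weights; the initialisation scheme is an assumption.
    return {"W": rng.normal(0.0, 0.1, size=(n_in, n_out)), "b": np.zeros(n_out)}

def dense(layer, x):
    return x @ layer["W"] + layer["b"]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def joint_forward(params, reverb):
    # Dereverberation sub-network: reverberant features -> estimated close-talking features.
    h = np.tanh(dense(params["derev_hidden"], reverb))
    derev = dense(params["derev_out"], h)
    # The acoustic model sees BOTH the dereverberated estimate and the original input.
    am_in = np.concatenate([derev, reverb], axis=-1)
    h2 = np.tanh(dense(params["am_hidden"], am_in))
    posteriors = softmax(dense(params["am_out"], h2))
    return derev, posteriors

def joint_loss(params, reverb, target_feats, state_labels, mse_weight=0.5):
    # Combined objective: cross-entropy on state labels plus weighted MSE on features.
    # The weighting scheme and its value are hypothetical.
    derev, post = joint_forward(params, reverb)
    mse = np.mean((derev - target_feats) ** 2)
    picked = post[np.arange(len(state_labels)), state_labels]
    ce = -np.mean(np.log(picked + 1e-12))
    return ce + mse_weight * mse

# Toy dimensions and random data, purely to exercise the shapes.
rng = np.random.default_rng(0)
feat_dim, hidden, n_states, batch = 40, 64, 10, 8
params = {
    "derev_hidden": init_layer(rng, feat_dim, hidden),
    "derev_out": init_layer(rng, hidden, feat_dim),
    "am_hidden": init_layer(rng, 2 * feat_dim, hidden),
    "am_out": init_layer(rng, hidden, n_states),
}
reverb = rng.normal(size=(batch, feat_dim))       # stand-in for reverberant features
close_talk = rng.normal(size=(batch, feat_dim))   # stand-in for close-talking targets
labels = rng.integers(0, n_states, size=batch)    # stand-in for acoustic state labels

derev, post = joint_forward(params, reverb)
loss = joint_loss(params, reverb, close_talk, labels)
```

In a real system the gradients of this combined loss would be propagated through both sub-networks, which is what distinguishes joint training from using dereverberation as a separate pre-processing step.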
Bibliographic Details
http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=84982694927&origin=inward; http://dx.doi.org/10.1186/s13634-016-0384-5; https://asp-eurasipjournals.springeropen.com/articles/10.1186/s13634-016-0384-5; http://link.springer.com/content/pdf/10.1186/s13634-016-0384-5; http://link.springer.com/content/pdf/10.1186/s13634-016-0384-5.pdf; http://link.springer.com/article/10.1186/s13634-016-0384-5/fulltext.html; https://dx.doi.org/10.1186/s13634-016-0384-5
Springer Science and Business Media LLC