Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract

Citation data:

Speech Communication, ISSN: 0167-6393, Vol: 93, Page: 63-75

Diandra Fabre; Thomas Hueber; Laurent Girin; Xavier Alameda-Pineda; Pierre Badin
Elsevier BV
Computer Science; Arts and Humanities; Mathematics; Social Sciences
Most Recent Blog Mention

Machine learning and ultrasound will help with learning a foreign language and recovering after surgery

Oct. 14, 2017 | N+1: science articles, news, discoveries

A group of scientists from France and Italy has developed a system for visualizing tongue and larynx movements in real time, which uses ...

Article description:
Visual biofeedback is the process of gaining awareness of physiological functions through the display of visual information. In speech, visual biofeedback usually consists of showing speakers their own articulatory movements, which has proven useful in applications such as speech therapy and second language learning. This article presents a novel method for automatically animating an articulatory tongue model from ultrasound images. Integrating this model into a virtual talking head overcomes the limitations of displaying raw ultrasound images and provides more complete and user-friendly feedback by showing not only the tongue but also the palate, teeth, pharynx, etc. Together, these cues are expected to make tongue movements easier to understand. Our approach is based on a probabilistic model that converts raw ultrasound images of the vocal tract into control parameters of the articulatory tongue model. We investigated several mapping techniques, including Gaussian Mixture Regression (GMR) and, in particular, the Cascaded Gaussian Mixture Regression (C-GMR) techniques recently proposed in the context of acoustic-articulatory inversion. Both techniques are evaluated on a multispeaker database. C-GMR consists of adapting a GMR reference model, trained on a large dataset of multimodal articulatory data from a reference speaker, to a new source speaker using a small set of adaptation data recorded during a preliminary enrollment session (system calibration). By using prior information from the reference model, the C-GMR approach is able (i) to maintain good mapping performance while minimizing the amount of adaptation data (and thus limiting the duration of the enrollment session), and (ii) to generalize to articulatory configurations not seen during enrollment better than the GMR approach. As a result, C-GMR appears to be a good mapping technique for a practical visual biofeedback system.
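
To make the GMR mapping mentioned in the description more concrete, here is a minimal sketch of generic Gaussian Mixture Regression in Python. It is not the authors' implementation and does not cover the C-GMR speaker adaptation step. It assumes a toy setting in which both the input feature x (standing in for an ultrasound-derived feature) and the output y (standing in for one tongue-model control parameter) are scalars, whereas a real system would work with high-dimensional vectors. The function names, the dictionary layout, and the `toy_components` values are all hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x, components):
    """Return E[y | x] under a joint Gaussian mixture over (x, y).

    Each component is a dict with keys:
    weight, mean_x, mean_y, var_x, cov_xy (hypothetical layout).
    """
    # Weighted likelihood of the observed x under each component's x-marginal.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)

    # Blend the per-component linear regressions, using the normalized
    # likelihoods (posterior responsibilities) as mixing weights.
    y_hat = 0.0
    for c, lik in zip(components, likelihoods):
        slope = c["cov_xy"] / c["var_x"]
        y_hat += (lik / total) * (c["mean_y"] + slope * (x - c["mean_x"]))
    return y_hat


# Made-up mixture parameters, for illustration only (not taken from the article).
toy_components = [
    {"weight": 0.5, "mean_x": 0.0, "mean_y": 1.0, "var_x": 1.0, "cov_xy": 0.5},
    {"weight": 0.5, "mean_x": 4.0, "mean_y": -1.0, "var_x": 1.0, "cov_xy": -0.5},
]

prediction = gmr_predict(2.0, toy_components)
```

In this sketch, each mixture component contributes its own linear regression from x to y, and the prediction is their average weighted by how well each component explains the observed x. With a single component the mapping reduces to ordinary linear regression.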