DualLip: A System for Joint Lip Reading and Generation
MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia, Pages 1985-1993
2020
- 25 Citations
- 46 Captures
Conference Paper Description
Lip reading aims to recognize text from talking lips, while lip generation aims to synthesize a talking lip video from text; lip generation is a key component in talking face generation and is a dual task of lip reading. Both tasks require a large amount of paired lip video and text training data, and perform poorly in low-resource scenarios with limited paired training data. In this paper, we develop DualLip, a system that jointly improves lip reading and generation by leveraging the task duality and using unlabeled text and lip video data. The key ideas of DualLip are: 1) Generate lip video from unlabeled text using a lip generation model, and use the pseudo data pairs to improve lip reading; 2) Generate text from unlabeled lip video using a lip reading model, and use the pseudo data pairs to improve lip generation. To leverage the benefit of DualLip on lip generation, we further extend DualLip to talking face generation with two additionally introduced components: lip to face generation and text to speech generation, which share the same duration for synchronization. Experiments on the GRID and TCD-TIMIT datasets demonstrate the effectiveness of DualLip in improving lip reading, lip generation and talking face generation by utilizing unlabeled data, especially in low-resource scenarios. Specifically, on the GRID dataset, the lip generation model in our DualLip system trained with only 10% paired data and 90% unpaired data surpasses the performance of the model trained with the whole paired dataset, and our lip reading model achieves a 1.16% character error rate and a 2.71% word error rate, outperforming the state-of-the-art models using the same amount of paired data.
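The two key ideas above amount to generating pseudo (video, text) pairs from unlabeled data in both directions. The following is a minimal sketch of that pseudo-pairing step; the function names and toy string-based "models" are illustrative assumptions, not the authors' actual implementation, which uses neural lip reading and lip generation networks.

```python
# Hedged sketch of DualLip's pseudo-pair construction.
# The two "models" below are toy stand-ins (simple string transforms);
# in the real system they would be trained neural networks.

def lip_reading_model(lip_video):
    # Placeholder for a model mapping lip video -> text.
    return lip_video.lower()

def lip_generation_model(text):
    # Placeholder for a model mapping text -> lip video.
    return text.upper()

def make_pseudo_pairs(unlabeled_texts, unlabeled_videos):
    """Build (video, text) pseudo pairs from unlabeled data,
    following the paper's two key ideas."""
    pairs = []
    # Idea 1: unlabeled text -> generated video; the pseudo pair
    # is used to improve the lip READING model.
    for t in unlabeled_texts:
        pairs.append((lip_generation_model(t), t))
    # Idea 2: unlabeled video -> recognized text; the pseudo pair
    # is used to improve the lip GENERATION model.
    for v in unlabeled_videos:
        pairs.append((v, lip_reading_model(v)))
    return pairs

pairs = make_pseudo_pairs(["bin blue at f two now"], ["PLACE GREEN IN D FOUR"])
```

In the full system, each model is then trained on the pseudo pairs produced by its dual, so improvements in one task feed the other.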
Bibliographic Details