PlumX Metrics

DualLip: A System for Joint Lip Reading and Generation

MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia, Pages 1985-1993
2020
  • Citations: 25
  • Usage: 0
  • Captures: 46
  • Mentions: 0
  • Social Media: 6

Metrics Details

  • Citations: 25
    • Citation Indexes: 25
  • Captures: 46
  • Social Media: 6
    • Shares, Likes & Comments: 6
      • Facebook: 6

Conference Paper Description

Lip reading aims to recognize text from a talking lip video, while lip generation aims to synthesize a talking lip video from text; lip generation is a key component of talking face generation and is the dual task of lip reading. Both tasks require a large amount of paired lip video and text training data, and both perform poorly in low-resource scenarios with limited paired data. In this paper, we develop DualLip, a system that jointly improves lip reading and generation by leveraging the task duality and using unlabeled text and lip video data. The key ideas of DualLip are: 1) generate lip video from unlabeled text using the lip generation model, and use the resulting pseudo data pairs to improve lip reading; 2) generate text from unlabeled lip video using the lip reading model, and use the resulting pseudo data pairs to improve lip generation. To leverage the benefit DualLip brings to lip generation, we further extend DualLip to talking face generation with two additional components, lip-to-face generation and text-to-speech generation, which share the same duration for synchronization. Experiments on the GRID and TCD-TIMIT datasets demonstrate the effectiveness of DualLip in improving lip reading, lip generation, and talking face generation by utilizing unlabeled data, especially in low-resource scenarios. Specifically, on the GRID dataset, the lip generation model in our DualLip system trained with only 10% paired data and 90% unpaired data surpasses the performance of the same model trained with all of the paired data, and our lip reading model achieves a 1.16% character error rate and a 2.71% word error rate, outperforming state-of-the-art models that use the same amount of paired data.
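The description above amounts to a round-trip pseudo-labeling loop between the two dual models. As a rough illustration only, the following minimal PyTorch-style sketch shows how the two pseudo-pair directions could feed each other; lip_reader, lip_generator, the optimizers, and the loss functions are hypothetical stand-ins for the paper's actual models, not the authors' code.

import torch

def dual_training_round(lip_reader, lip_generator,
                        paired_data, unpaired_texts, unpaired_videos,
                        reader_opt, generator_opt,
                        reader_loss, generator_loss):
    """One round of the dual scheme: each model labels unpaired data
    for its dual model. `paired_data` is a list of (video, text) tensors."""
    # 1) Text -> pseudo lip video: extra (video, text) pairs for the reader.
    lip_generator.eval()
    with torch.no_grad():
        reader_pseudo = [(lip_generator(t), t) for t in unpaired_texts]

    # 2) Lip video -> pseudo text: extra (text, video) pairs for the generator.
    lip_reader.eval()
    with torch.no_grad():
        generator_pseudo = [(lip_reader(v), v) for v in unpaired_videos]

    # 3) Update the lip reader on real pairs plus pseudo pairs from step 1.
    lip_reader.train()
    for video, text in list(paired_data) + reader_pseudo:
        reader_opt.zero_grad()
        reader_loss(lip_reader(video), text).backward()
        reader_opt.step()

    # 4) Update the lip generator on real pairs plus pseudo pairs from step 2.
    lip_generator.train()
    for text, video in [(t, v) for v, t in paired_data] + generator_pseudo:
        generator_opt.zero_grad()
        generator_loss(lip_generator(text), video).backward()
        generator_opt.step()

The design point is that each model acts as a data labeler for its dual: as the lip reader improves it produces better pseudo text for the generator, and vice versa, which is what allows 10% paired data plus 90% unpaired data to match fully paired training in the reported results.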

Bibliographic Details

Weicong Chen; Yu Wang; Xu Tan; Yingce Xia; Tao Qin; Tie-Yan Liu

Association for Computing Machinery (ACM)

Computer Science
