Arabic Lipreading Using YOLO and CNN Models
Lecture Notes in Networks and Systems, ISSN: 2367-3389, Vol: 1145 LNNS, Page: 13-23
2024
Conference Paper Description
Lipreading is a vital aspect of human communication and calls for effective computational methods. Lip movements, which are central to lipreading, are difficult to interpret because of challenges such as variability and context dependence. Recent developments in deep learning show potential for improving Arabic visual speech recognition (VSR) systems. This paper uses deep learning to help Arabic-speaking individuals with hearing impairments, reduce their communication barriers, and improve their quality of life. To address the complexities of lipreading, we used our own Arabic dataset, with YOLOv7 as a front end for mouth detection and two CNN models, (i) InceptionV3 and (ii) a custom CNN, for speech classification. The approach achieved 90% speech recognition accuracy. These results indicate that deep learning can improve visual speech recognition and support the development of more effective and precise detection and recognition methods.
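The abstract describes a two-stage pipeline: a YOLOv7 detector localizes the mouth in each frame, and a CNN classifies the cropped mouth region. The sketch below is not the authors' code. It illustrates only the glue between the two stages and is written in plain Python so that it runs without a deep-learning framework. It assumes detections arrive as hypothetical `(x1, y1, x2, y2, confidence, label)` tuples in pixel coordinates and that frames are grayscale. The function names, the `"mouth"` label, the 0.5 confidence threshold, and the 4x4 output size are all illustrative assumptions. In a real system, the fixed-size crops would be fed to InceptionV3 (whose Keras version expects 299x299 RGB input by default) or to a custom CNN.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical detection format (assumption): (x1, y1, x2, y2, confidence, label), pixel coords.
Detection = Tuple[int, int, int, int, float, str]
Image = List[List[int]]  # grayscale frame stored as rows of pixel values


def best_mouth_box(detections: Sequence[Detection], min_conf: float = 0.5) -> Optional[Detection]:
    """Return the highest-confidence 'mouth' detection at or above min_conf, or None."""
    mouths = [d for d in detections if d[5] == "mouth" and d[4] >= min_conf]
    return max(mouths, key=lambda d: d[4]) if mouths else None


def crop(frame: Image, box: Detection) -> Image:
    """Cut the box out of the frame, clamping its corners to the frame bounds."""
    h, w = len(frame), len(frame[0])
    x1, y1 = max(0, box[0]), max(0, box[1])
    x2, y2 = min(w, box[2]), min(h, box[3])
    return [row[x1:x2] for row in frame[y1:y2]]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize so that every crop has the fixed size a CNN expects."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def mouth_rois(frames: Sequence[Image],
               detections_per_frame: Sequence[Sequence[Detection]],
               size: Tuple[int, int] = (4, 4)) -> List[Image]:
    """Front-end stage: one fixed-size mouth crop per frame.

    Frames with no confident mouth detection, or with an empty crop, are skipped.
    """
    rois = []
    for frame, dets in zip(frames, detections_per_frame):
        box = best_mouth_box(dets)
        if box is None:
            continue
        roi = crop(frame, box)
        if roi and roi[0]:  # skip degenerate boxes that yield no pixels
            rois.append(resize_nearest(roi, *size))
    return rois


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of clips whose predicted word matches the ground-truth label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Synthetic 8x10 frame whose pixel value encodes its position (row * 10 + col).
    frame = [[r * 10 + c for c in range(10)] for r in range(8)]
    dets = [(2, 4, 6, 8, 0.9, "mouth"), (0, 0, 10, 4, 0.95, "face"), (1, 1, 3, 3, 0.3, "mouth")]
    rois = mouth_rois([frame], [dets])
    print(len(rois), rois[0][0])
```

Selecting only the best detection per frame keeps the classifier input to one region per frame. Resizing each crop to a fixed shape is required because CNN classifiers such as InceptionV3 take fixed-size inputs, whereas detector boxes vary in size from frame to frame.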
Bibliographic Details
http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85211772428&origin=inward; http://dx.doi.org/10.1007/978-3-031-71848-9_2; https://link.springer.com/10.1007/978-3-031-71848-9_2; https://dx.doi.org/10.1007/978-3-031-71848-9_2; https://link.springer.com/chapter/10.1007/978-3-031-71848-9_2
Springer Science and Business Media LLC