PlumX Metrics

The Storyteller: Computer Vision Driven Context and Content Generation System

SSRN, ISSN: 1556-5068
2023
  • Citations: 0
  • Usage: 158
  • Captures: 0
  • Mentions: 0
  • Social Media: 0

Metrics Details

  • Usage: 158
    • Abstract Views: 123
    • Downloads: 35

Article Description

Equipping machines with the human capability of detecting, understanding, and contextualizing objects in the real world has long been a goal of computer science. Alongside other important open challenges in computer vision, image captioning with context and content is a significant research problem. In our research, we attempted to develop a human-like storytelling system that captions images with attention to content, context, syntax, and knowledge. Our methodology combines Capsule Networks for image encoding, Knowledge Graphs for content and context awareness, and Transformer Neural Networks for decoding. Spatial, geometrical, and orientational details are extracted by the Capsule Networks during feature extraction. The corpus is passed through the Knowledge Graph to equip our model with content, context, and semantics. The decoding phase combines the Knowledge Graph and the Transformer Neural Network for knowledge-driven captioning. Dynamic multi-headed attention in the decoder is used for memory optimization. Our model is trained on MSCOCO and tested on MSCOCO, Flickr16K, and Google Images. The results show good content and context understanding, with B4: 71.93, M: 39.14, C: 136.53, and R: 94.32. The placement of adverbs and adjectives within the generated sentences, in accordance with the objects' geometrical and semantic relationships, is a notable strength. The primary outcome of our research is the generation of autonomous story-type captions for real-world images.
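The decoder described in the abstract relies on multi-headed attention. As a rough illustration only (this is not the authors' implementation, and their memory-optimized "dynamic" variant is not reproduced here), a minimal scaled dot-product multi-head self-attention can be sketched in NumPy; all function names, shapes, and random weights below are assumptions for demonstration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Illustrative multi-head self-attention over x of shape (seq_len, d_model).

    Random projection matrices stand in for learned parameters; a trained
    model would use fitted weights instead.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    # Query, key, value, and output projections (randomly initialized).
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(4)
    )
    # Project and split into heads: (num_heads, seq_len, d_head).
    q = (x @ w_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    # Merge heads back to (seq_len, d_model) and apply output projection.
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

rng = np.random.default_rng(0)
tokens = rng.standard_normal((7, 64))          # 7 token embeddings, d_model = 64
attended = multi_head_attention(tokens, num_heads=8, rng=rng)
```

The output has the same shape as the input, which is what lets such blocks be stacked in a Transformer decoder.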

Bibliographic Details

Anwar ul Haque; Sayeed Ghani; Muhammad Saeed; Hardy Schloer

Elsevier BV

Multidisciplinary; Capsule Networks; Image Captioning; Knowledge Graphs; Transformer Neural Networks; Context-aware Captioning
