The Storyteller: Computer Vision Driven Context and Content Generation System
SSRN, ISSN: 1556-5068
2023
Usage: 158
Article Description
Enabling machines to detect, understand, and contextualize objects in the real world the way humans do has long been a goal of computer science. Among the open challenges in computer vision, image captioning that accounts for both content and context remains an important research problem. In our research, we developed a human-like storytelling system that captions images with awareness of content, context, syntax, and knowledge. Our methodology combines Capsule Networks for image encoding, Knowledge Graphs for content and context awareness, and a Transformer Neural Network for decoding. Spatial, geometric, and orientational details are extracted by the Capsule Networks during feature extraction. The corpus is passed through the Knowledge Graph to equip the model with content, context, and semantics. The decoding phase combines the Knowledge Graph and the Transformer Neural Network for knowledge-driven captioning, and dynamic multi-headed attention in the decoder is used for memory optimization. The model is trained on MSCOCO and tested on MSCOCO, Flickr16K, and Google Images. The results show good content and context understanding, with B4 (BLEU-4): 71.93, M (METEOR): 39.14, C (CIDEr): 136.53, and R (ROUGE): 94.32. The generated sentences use adverbs and adjectives that reflect the objects' geometric and semantic relationships. The primary outcome of our research is the autonomous generation of story-style captions for real-world images.
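The pipeline described in the abstract can be outlined, in highly simplified form, as an image encoder producing visual features, knowledge-graph context embeddings fused into the decoder memory, and a Transformer decoder with multi-head attention generating caption tokens. The sketch below is an assumption-based illustration in PyTorch, not the authors' implementation: the Capsule Network encoder is replaced with a generic convolutional stand-in, the knowledge-graph context is assumed to arrive as precomputed entity embeddings, and all names (StorytellerCaptioner, kg_proj, and so on) are hypothetical.

```python
# Illustrative sketch of an encoder + knowledge-graph context + Transformer
# decoder captioner. Not the paper's released code; names and shapes are
# assumptions chosen to make the example self-contained and runnable.
import torch
import torch.nn as nn


class StorytellerCaptioner(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        # Stand-in for the Capsule Network image encoder: any module that
        # maps an image to a sequence of d_model-dimensional feature vectors.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=16, stride=16),  # patch features
            nn.Flatten(2),                                      # (B, d_model, N)
        )
        # Knowledge-graph context: assumed precomputed entity embeddings,
        # projected into the decoder's feature space.
        self.kg_proj = nn.Linear(d_model, d_model)
        self.token_emb = nn.Embedding(vocab_size, d_model)
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images, kg_embeddings, caption_tokens):
        # Encode the image into a sequence of visual features.
        visual = self.image_encoder(images).transpose(1, 2)     # (B, N, d_model)
        # Concatenate visual and knowledge-graph memory for cross-attention.
        memory = torch.cat([visual, self.kg_proj(kg_embeddings)], dim=1)
        tgt = self.token_emb(caption_tokens)                     # (B, T, d_model)
        # Causal mask so each position attends only to earlier tokens.
        T = caption_tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(hidden)                              # (B, T, vocab)


# Example usage with dummy tensors (teacher forcing during training).
model = StorytellerCaptioner(vocab_size=10000)
images = torch.randn(2, 3, 224, 224)
kg_embeddings = torch.randn(2, 12, 512)       # 12 graph entities per image
captions = torch.randint(0, 10000, (2, 20))   # ground-truth caption token ids
logits = model(images, kg_embeddings, captions)
print(logits.shape)                            # torch.Size([2, 20, 10000])
```

At inference time such a model would decode autoregressively, feeding generated tokens back into the decoder; metrics such as BLEU-4, METEOR, CIDEr, and ROUGE would then be computed on the decoded captions against reference captions.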
Bibliographic Details
Elsevier BV