Ekanayake Mudiyanselage Chulabhaya Lankanatha Ekanayake1,2, Abubakar Sulaiman Gezawa3,*, Yunqi Lei1
CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 2941-2965, 2024, DOI:10.32604/cmc.2024.046155
- 26 March 2024
Abstract Video description generates natural language sentences that describe the subject, verb, and objects of the targeted Video. The video description has been used to help visually impaired people to understand the content. It is also playing an essential role in devolving human-robot interaction. The dense video description is more difficult when compared with simple Video captioning because of the object’s interactions and event overlapping. Deep learning is changing the shape of computer vision (CV) technologies and natural language processing (NLP). There are hundreds of deep learning models, datasets, and evaluations that can improve the gaps… More >