Image and audio caps: automated captioning of background sounds and images using deep learning

Image recognition based on computers is something human beings have been working on for many years. It is one of the most difficult tasks in the field of computer science, and improvements are being made as we speak. In this paper, we propose a methodology to automatically propos...

Bibliographic Details
Main Author:	M. Poongodi (14158869) (author)
Other Authors:	Mounir Hamdi (14150652) (author), Huihui Wang (442901) (author)
Published:	2022
Subjects:	Information and computing sciences Artificial intelligence Computer vision and multimedia computation Distributed computing and systems software Machine learning Computer vision Image to caption Scene recognition Image analysis Social networks

_version_	1864513567991005184
author	M. Poongodi (14158869)
author2	Mounir Hamdi (14150652) Huihui Wang (442901)
author2_role	author author
author_facet	M. Poongodi (14158869) Mounir Hamdi (14150652) Huihui Wang (442901)
author_role	author
dc.creator.none.fl_str_mv	M. Poongodi (14158869) Mounir Hamdi (14150652) Huihui Wang (442901)
dc.date.none.fl_str_mv	2022-02-26T06:00:00Z
dc.identifier.none.fl_str_mv	10.1007/s00530-022-00902-0
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/Image_and_audio_caps_automated_captioning_of_background_sounds_and_images_using_deep_learning/21597084
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Information and computing sciences Artificial intelligence Computer vision and multimedia computation Distributed computing and systems software Machine learning Computer vision Image to caption Scene recognition Image analysis Social networks
dc.title.none.fl_str_mv	Image and audio caps: automated captioning of background sounds and images using deep learning
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	<p>Image recognition based on computers is something human beings have been working on for many years. It is one of the most difficult tasks in the field of computer science, and improvements are being made as we speak. In this paper, we propose a methodology to automatically propose an appropriate title for an image and add a specific sound to it. Two models have been extensively trained and combined to achieve this effect. Sounds are recommended based on the image scene, and the headings are generated using a combination of natural language processing and state-of-the-art computer vision models. A Top 5 accuracy of 67% and a Top 1 accuracy of 53% have been achieved. It is also worth mentioning that this is the first model of its kind to make this forecast.</p><h2>Other Information</h2> <p> Published in: Multimedia Systems<br> License: <a href="https://creativecommons.org/licenses/by/4.0" target="_blank">https://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="http://dx.doi.org/10.1007/s00530-022-00902-0" target="_blank">http://dx.doi.org/10.1007/s00530-022-00902-0</a></p>
eu_rights_str_mv	openAccess
id	Manara2_5ac01ca8196e99ed6e9e9f34f759b367
identifier_str_mv	10.1007/s00530-022-00902-0
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/21597084
publishDate	2022
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv	CC BY 4.0
spellingShingle	Image and audio caps: automated captioning of background sounds and images using deep learning M. Poongodi (14158869) Information and computing sciences Artificial intelligence Computer vision and multimedia computation Distributed computing and systems software Machine learning Computer vision Image to caption Scene recognition Image analysis Social networks
status_str	publishedVersion
title	Image and audio caps: automated captioning of background sounds and images using deep learning
title_full	Image and audio caps: automated captioning of background sounds and images using deep learning
title_fullStr	Image and audio caps: automated captioning of background sounds and images using deep learning
title_full_unstemmed	Image and audio caps: automated captioning of background sounds and images using deep learning
title_short	Image and audio caps: automated captioning of background sounds and images using deep learning
title_sort	Image and audio caps: automated captioning of background sounds and images using deep learning
topic	Information and computing sciences Artificial intelligence Computer vision and multimedia computation Distributed computing and systems software Machine learning Computer vision Image to caption Scene recognition Image analysis Social networks

Similar Items