Image and audio caps: automated captioning of background sounds and images using deep learning
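| Main Author: | M. Poongodi |
|---|---|
| Other Authors: | Mounir Hamdi, Huihui Wang |
| Published: | 2022 |
| Subjects: | Information and computing sciences; Artificial intelligence; Computer vision and multimedia computation; Distributed computing and systems software; Machine learning; Computer vision; Image to caption; Scene recognition; Image analysis; Social networks |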

| Field | Value |
|---|---|
| author | M. Poongodi (14158869) |
| author2 | Mounir Hamdi (14150652) Huihui Wang (442901) |
| author2_role | author author |
| author_role | author |
| dc.creator.none.fl_str_mv | M. Poongodi (14158869) Mounir Hamdi (14150652) Huihui Wang (442901) |
| dc.date.none.fl_str_mv | 2022-02-26T06:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1007/s00530-022-00902-0 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/Image_and_audio_caps_automated_captioning_of_background_sounds_and_images_using_deep_learning/21597084 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Information and computing sciences; Artificial intelligence; Computer vision and multimedia computation; Distributed computing and systems software; Machine learning; Computer vision; Image to caption; Scene recognition; Image analysis; Social networks |
| dc.title.none.fl_str_mv | Image and audio caps: automated captioning of background sounds and images using deep learning |
| dc.type.none.fl_str_mv | Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal |
| description | Computer-based image recognition has been studied for many years and remains one of the most difficult tasks in computer science, with improvements still being made today. In this paper, we propose a methodology that automatically suggests an appropriate title (caption) for an image and adds a matching sound to it. Two models were trained extensively and then combined to achieve this: sounds are recommended based on the scene depicted in the image, and titles are generated by combining natural language processing with state-of-the-art computer vision models. The approach achieves a top-5 accuracy of 67% and a top-1 accuracy of 53%. This is also the first model of its kind to make such a prediction. Other information: Published in Multimedia Systems. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0). Publisher's version: http://dx.doi.org/10.1007/s00530-022-00902-0 |
| eu_rights_str_mv | openAccess |
| id | Manara2_5ac01ca8196e99ed6e9e9f34f759b367 |
| network_acronym_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/21597084 |
| publishDate | 2022 |
| status_str | publishedVersion |
| title | Image and audio caps: automated captioning of background sounds and images using deep learning |
| topic | Information and computing sciences; Artificial intelligence; Computer vision and multimedia computation; Distributed computing and systems software; Machine learning; Computer vision; Image to caption; Scene recognition; Image analysis; Social networks |
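
The description reports a top-5 accuracy of 67% and a top-1 accuracy of 53% but does not define the metric. Under the usual definition (assumed here, not confirmed by this record), a sample counts as correct at rank k if its true class is among the k highest-scoring classes. The Python sketch below illustrates only that definition. The function name `top_k_accuracy`, the six scene classes, and all scores are hypothetical; they are not the authors' code or data.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Return the fraction of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score; keep the first k.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical toy data: 4 images scored over 6 scene classes (not from the paper).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1 ranked 1st
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],  # true class 2 ranked 2nd
    [0.05, 0.05, 0.10, 0.60, 0.15, 0.05],  # true class 4 ranked 2nd
    [0.40, 0.30, 0.10, 0.10, 0.06, 0.04],  # true class 5 ranked last
]
labels = [1, 2, 4, 5]

print(top_k_accuracy(scores, labels, k=1))  # 0.25
print(top_k_accuracy(scores, labels, k=5))  # 0.75
```

Ties between scores are broken by class index here, because Python's `sorted` stays stable even with `reverse=True`. This is an arbitrary choice that a real evaluation would need to state explicitly.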