Improving text-to-image generation with object layout guidance

Bibliographic details
Main author: Jezia Zakraoui (14151399)
Other authors: Moutaz Saleh (14151402), Somaya Al-Maadeed (5178131), Jihad Mohammed Jaam (14151411)
Published: 2022
Date: 2022-11-22T21:13:56Z
DOI: 10.1007/s11042-021-11038-0
Related record: https://figshare.com/articles/journal_contribution/Improving_text-to-image_generation_with_object_layout_guidance/21597408
Rights: CC BY 4.0 (open access)
Subjects: Information and computing sciences; Computer vision and multimedia computation; Distributed computing and systems software; Computer Networks and Communications; Hardware and Architecture; Media Technology; Software
Type: Journal contribution (text; published version)
Description:
The automatic generation of realistic images directly from story text is a challenging problem that cannot be addressed with a single image generation approach, mainly because of the semantic complexity of the story text constituents. In this work, we propose a new approach that decomposes the task of story visualization into three phases: semantic text understanding, object layout prediction, and image generation and refinement. We start by simplifying the text using a scene graph triple notation that encodes semantic relationships between the story objects. We then introduce an object layout module to capture the features of these objects from the corresponding scene graph. Specifically, the object layout module aggregates individual object features from the scene graph as well as averaged or likelihood object features generated by a graph convolutional neural network. All these features are concatenated to form semantic triples that are then provided to the image generation framework. For the image generation phase, we adopt a scene graph image generation framework as stage-I, whose output is refined by a StackGAN as stage-II, conditioned on the object layout module and the stage-I output image. Our approach renders object details in high-resolution images while keeping the image structure consistent with the input text. To evaluate its performance, we use the COCO dataset and compare it with three baseline approaches (sg2im, StackGAN, and AttnGAN) in terms of image quality and user evaluation. According to the assessment results, our object layout guidance-based approach significantly outperforms these baselines in the accuracy of semantic matching and the realism of the generated images representing the story text sentences.

Other Information
Published in: Multimedia Tools and Applications
License: https://creativecommons.org/licenses/by/4.0
See article on publisher's website: http://dx.doi.org/10.1007/s11042-021-11038-0
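The abstract describes the object layout module only at a high level. The Python sketch below illustrates the general idea under several assumptions that are not taken from the paper: the scene graph triples are written by hand rather than extracted from text, the embeddings and layer weights are seeded random stand-ins for learned parameters, and the graph convolution follows an sg2im-style scheme (one shared linear+ReLU layer applied to each concatenated subject-predicate-object vector, followed by per-object average pooling). Only the "averaged" GCN features are modeled; the "likelihood" variant mentioned in the abstract is not. All function names (`graph_conv`, `object_layout_features`, and so on) are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Vec = List[float]


def make_embeddings(tokens: Iterable[str], dim: int, seed: int) -> Dict[str, Vec]:
    """Hypothetical toy embedding table: seeded random vectors stand in for learned embeddings."""
    rng = random.Random(seed)
    return {t: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for t in sorted(tokens)}


def linear_relu(x: Vec, weights: List[Vec]) -> Vec:
    """Compute ReLU(W x), with W given as a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def graph_conv(triples: List[Triple], obj_vecs: Dict[str, Vec], pred_vecs: Dict[str, Vec],
               weights: List[Vec], dim: int) -> Tuple[Dict[str, Vec], List[Vec]]:
    """One sg2im-style graph convolution with average pooling (an assumed design).

    Each triple's concatenated [subject | predicate | object] vector goes through a shared
    linear+ReLU layer whose output is split into new subject, predicate, and object candidates.
    Each object's new feature is the mean of all candidates it received across triples.
    """
    sums = {name: [0.0] * dim for name in obj_vecs}
    counts = {name: 0 for name in obj_vecs}
    new_preds: List[Vec] = []
    for s, p, o in triples:
        out = linear_relu(obj_vecs[s] + pred_vecs[p] + obj_vecs[o], weights)
        s_new, p_new, o_new = out[:dim], out[dim:2 * dim], out[2 * dim:]
        for name, vec in ((s, s_new), (o, o_new)):
            sums[name] = [acc + x for acc, x in zip(sums[name], vec)]
            counts[name] += 1
        new_preds.append(p_new)
    # Average the candidates; an object absent from every triple keeps its input vector.
    pooled = {
        name: [x / counts[name] for x in sums[name]] if counts[name] else list(obj_vecs[name])
        for name in obj_vecs
    }
    return pooled, new_preds


def object_layout_features(triples: List[Triple], dim: int = 4, seed: int = 0) -> Dict[str, Vec]:
    """Concatenate each object's own embedding with its GCN-averaged feature (hypothetical helper)."""
    objects = {s for s, _, _ in triples} | {o for _, _, o in triples}
    predicates = {p for _, p, _ in triples}
    obj_vecs = make_embeddings(objects, dim, seed)
    pred_vecs = make_embeddings(predicates, dim, seed + 1)
    rng = random.Random(seed + 2)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(3 * dim)] for _ in range(3 * dim)]
    pooled, _ = graph_conv(triples, obj_vecs, pred_vecs, weights, dim)
    return {name: obj_vecs[name] + pooled[name] for name in sorted(objects)}


# Hypothetical triples for a sentence such as "A man rides a horse on the grass under the sky."
story_triples: List[Triple] = [("man", "riding", "horse"), ("horse", "on", "grass"), ("sky", "above", "grass")]
features = object_layout_features(story_triples)
for name, vec in features.items():
    print(name, len(vec))  # dim own-embedding values + dim GCN-averaged values = 8
```

In this sketch the concatenated per-object vectors are where the image generation stages would take over; the stage-I scene graph generator and the stage-II StackGAN refinement are outside its scope. Average pooling is used because it lets an object that appears in several triples (here "grass" and "horse") merge information from all of its relationships into one fixed-size feature.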