Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/18793

Title: Sketch-Guided Two-Stage Text-to-Image Generation with Spatial Control
Authors: Zhang, Tianyu
Xie, Haoran
Keywords: Image generation
Sketch-guided
Two-stage model
Diffusion model
Issue Date: 2023-09-09
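Publisher: Information Processing Society of Japan (IPSJ) [情報処理学会]
Magazine name: IPSJ SIG Technical Reports: Computer Graphics and Visual Informatics (CG) [研究報告コンピュータグラフィックスとビジュアル情報学(CG)]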
Volume: 2023-CG-191
Number: 9
Start page: 1
End page: 6
Abstract: Recent text-to-image diffusion models can produce high-quality images based only on textual prompts. However, it is difficult to correctly interpret instructions specifying the layout of a compositional space using only text. We propose a sketch-based method to control the spatial relationship of corresponding objects in image generation and solve the issue of object loss in diffusion models. Our proposed method uses a pre-trained text-to-image diffusion model as the image generator and employs sketches as spatial guidance. Specifically, we divide the proposed model into two stages. In the feature extraction stage, sketches are segmented into individual objects using an image segmentation approach, and the obtained bounding boxes and labels are then used as spatial guidance inputs to the attention layers of the diffusion model. In the image generation stage, the proposed model utilizes a pre-trained text-to-image diffusion model as the generator to generate corresponding images. We evaluate the proposed method quantitatively and qualitatively with several experiments, validating the spatial control of the proposed method. We further demonstrate its versatility by changing the positional relationships and relative scales in sketches.
Rights: Information Processing Society of Japan (IPSJ), Tianyu Zhang, Haoran Xie, IPSJ SIG Technical Reports, CG (Computer Graphics and Visual Informatics), 2023-CG-191 (9), 2023, pp.1-6. Notice for the use of this material: The copyright of this material is retained by the Information Processing Society of Japan (IPSJ). This material is published on this web site with the agreement of the author(s) and the IPSJ. Please comply with the Copyright Law of Japan and the Code of Ethics of the IPSJ if you wish to reproduce, make derivative works of, distribute, or make available to the public any part or the whole thereof. All Rights Reserved, Copyright (C) Information Processing Society of Japan.
URI: http://hdl.handle.net/10119/18793
Material Type: publisher
Appears in Collections: a10-1. Journal Articles (雑誌掲載論文)

Files in This Item:

File: H-XIE-K-1201-2.pdf
Size: 1070Kb
Format: Adobe PDF
