GANG SHEN

Professor

Personal Information
  • English name: SHEN GANG
  • Gender: Male
  • Employment status: Currently employed
  • Affiliation: School of Software
  • Education: Doctoral graduate (Ph.D.)

Other Contact Information

None listed

Publications

Diff-STAR: Exploring student-teacher adaptive reconstruction through diffusion-based generation for image harmonization

Posted: 2025-01-17
Paper type: Journal article
Article number: 105254
Journal: Image and Vision Computing
Indexed in: SCI
Volume: 151
Keywords: Image harmonization; Denoising diffusion model; Transformer; Pretraining; Adaptive patch masking
DOI: 10.1016/j.imavis.2024.105254
Publication date: 2024-11-01
Impact factor: 4.2
Abstract:
Image harmonization aims to seamlessly integrate foreground and background elements from distinct photos into a visually realistic composite. However, achieving high-quality image composition remains challenging in adjusting color balance, retaining fine details, and ensuring perceptual consistency. This article introduces a novel approach named Diffusion-based Student-Teacher Adaptive Reconstruction (Diff-STAR) to address foreground adjustment by framing it as an image reconstruction task. Leveraging natural photographs for model pretraining eliminates the need for data augmentation within Diff-STAR's framework. Employing the pre-trained Denoising Diffusion Implicit Model (DDIM) enhances photorealism and fidelity in generating high-quality outputs from reconstructed latent representations. By effectively identifying similarities in low-frequency style and semantic relationships across various regions within latent images, we develop a student-teacher architecture combining Transformer encoders and decoders to predict adaptively masked patches derived through diffusion processes. Experimental results on public datasets, including iHarmony4 and RealHM, confirm Diff-STAR's superiority over other state-of-the-art approaches on metrics including Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR).
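
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not taken from the paper: the function names `mse` and `psnr`, the flat-list pixel representation, and the default peak value of 255 (8-bit images) are all assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB.

    max_value is the assumed peak pixel intensity (255 for 8-bit images).
    Returns infinity when the two inputs are identical.
    """
    err = mse(pred, target)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical usage on tiny flattened "images".
harmonized = [120.0, 64.0, 200.0, 33.0]
reference = [118.0, 66.0, 197.0, 35.0]
print(mse(harmonized, reference), psnr(harmonized, reference))
```

Lower MSE and higher PSNR both indicate that a harmonized output is closer to its reference image.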