
SHEN GANG

Professor      

  • Professional Title:Professor
  • Gender:Male
  • Status:Employed
  • Department:School of Software Engineering
  • Education Level:Postgraduate (Doctoral)

Paper Publications


Diff-STAR: Exploring student-teacher adaptive reconstruction through diffusion-based generation for image harmonization

Release time:2025-01-17
Indexed by:
Journal paper
Document Code:
105254
Journal:
Image and Vision Computing
Included Journals:
SCI
Volume:
151
Key Words:
Image harmonization; Denoising diffusion model; Transformer; Pretraining; Adaptive patch masking
DOI number:
10.1016/j.imavis.2024.105254
Date of Publication:
2024-11-01
Impact Factor:
4.2
Abstract:
Image harmonization aims to seamlessly integrate foreground and background elements from distinct photos into a visually realistic composite. However, high-quality image composition remains challenging because it requires adjusting color balance, retaining fine details, and ensuring perceptual consistency. This article introduces Diffusion-based Student-Teacher Adaptive Reconstruction (Diff-STAR), which addresses foreground adjustment by framing it as an image reconstruction task. Pretraining the model on natural photographs eliminates the need for data augmentation within the Diff-STAR framework. A pre-trained Denoising Diffusion Implicit Model (DDIM) enhances the photorealism and fidelity of the outputs generated from reconstructed latent representations. To exploit similarities in low-frequency style and semantic relationships across regions of the latent images, we develop a student-teacher architecture that combines Transformer encoders and decoders to predict adaptively masked patches derived through diffusion processes. On public datasets including iHarmony4 and RealHM, experimental results confirm that Diff-STAR outperforms other state-of-the-art approaches on metrics including Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR).
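
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how MSE and PSNR are commonly computed. It is not taken from the paper: the function names `mse` and `psnr`, the nested-list image representation, and the assumed 8-bit peak value of 255 are illustrative assumptions, and the paper's actual evaluation protocol (image resolution, color channels, any foreground masking) may differ.

```python
import math


def mse(pred, target):
    """Mean squared error between two images given as nested lists of pixel values.

    Hypothetical helper for illustration; both images must contain the same
    number of pixels.
    """
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("images must have the same number of pixels")
    return sum((p - t) ** 2 for p, t in zip(flat_pred, flat_target)) / len(flat_pred)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in decibels.

    max_value is the largest possible pixel value (255 is assumed here for
    8-bit images). Identical images give an infinite PSNR.
    """
    err = mse(pred, target)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy 2x2 example: a single pixel differs by 10 intensity levels.
harmonized = [[0, 0], [0, 0]]
reference = [[0, 0], [0, 10]]
print(mse(harmonized, reference))
print(psnr(harmonized, reference))
```

Lower MSE and higher PSNR both indicate that a harmonized image is closer to its reference.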