Diff-STAR: Exploring student-teacher adaptive reconstruction through diffusion-based generation for image harmonization
Release time: 2025-01-17
- Indexed by: Journal paper
- Document Code: 105254
- Journal: Image and Vision Computing
- Included Journals: SCI
- Volume: 151
- Key Words: Image harmonization; Denoising diffusion model; Transformer; Pretraining; Adaptive patch masking
- DOI number: 10.1016/j.imavis.2024.105254
- Date of Publication: 2024-11-01
- Impact Factor: 4.2
- Abstract: Image harmonization aims to seamlessly integrate foreground and background elements from distinct photos into a visually realistic composite. However, achieving high-quality composition remains challenging: the color balance must be adjusted, fine details retained, and perceptual consistency preserved. This article introduces a novel approach named Diffusion-based Student-Teacher Adaptive Reconstruction (Diff-STAR), which addresses foreground adjustment by framing it as an image reconstruction task. Leveraging natural photographs for model pretraining eliminates the need for data augmentation within Diff-STAR's framework. Employing a pre-trained Denoising Diffusion Implicit Model (DDIM) enhances photorealism and fidelity when generating high-quality outputs from the reconstructed latent representations. By identifying similarities in low-frequency style and semantic relationships across regions of the latent images, we develop a student-teacher architecture that combines Transformer encoders and decoders to predict adaptively masked patches derived through the diffusion process. Evaluated on the public iHarmony4 and RealHM datasets, the experimental results confirm Diff-STAR's superiority over other state-of-the-art approaches in terms of Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR).
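As a rough illustration of the student-teacher adaptive masked-patch reconstruction idea mentioned in the abstract, the sketch below pairs a Transformer encoder-decoder student with an EMA-updated teacher and reconstructs adaptively selected latent patches. The module sizes, the feature-norm masking heuristic, the EMA update, and all names are illustrative assumptions; the abstract does not specify these details of Diff-STAR, so this is a sketch of the general technique rather than the authors' implementation.

```python
# Minimal sketch of student-teacher masked-patch reconstruction (assumptions,
# not the Diff-STAR reference code).
import torch
import torch.nn as nn


class PatchReconstructor(nn.Module):
    """Transformer encoder-decoder that predicts masked latent patch tokens."""

    def __init__(self, patch_dim=256, depth=4, heads=8):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=patch_dim, nhead=heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model=patch_dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=depth)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=depth)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, patch_dim))

    def forward(self, patches, mask):
        # patches: (B, N, D) latent patch tokens; mask: (B, N) bool, True = masked out.
        visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)
        memory = self.encoder(visible)
        queries = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(patches), patches)
        return self.decoder(queries, memory)


@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    # The teacher tracks the student via an exponential moving average, a common
    # choice in student-teacher setups (the paper may use a different scheme).
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)


def training_step(student, teacher, patches, mask_ratio=0.5):
    # "Adaptive" masking stand-in: mask the patches with the largest feature norm.
    # The paper's style/semantic similarity criterion is not given in the abstract.
    B, N, _ = patches.shape
    scores = patches.norm(dim=-1)                 # (B, N) saliency proxy
    k = int(mask_ratio * N)
    idx = scores.topk(k, dim=1).indices
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask[torch.arange(B).unsqueeze(1), idx] = True

    pred = student(patches, mask)
    with torch.no_grad():
        target = teacher(patches, torch.zeros_like(mask))   # teacher sees all patches
    loss = (pred - target)[mask].pow(2).mean()               # loss only on masked patches
    return loss, mask


if __name__ == "__main__":
    student, teacher = PatchReconstructor(), PatchReconstructor()
    teacher.load_state_dict(student.state_dict())
    latent_patches = torch.randn(2, 64, 256)      # e.g. an 8x8 grid of latent patches
    loss, _ = training_step(student, teacher, latent_patches)
    loss.backward()
    ema_update(teacher, student)
    print(f"masked-patch reconstruction loss: {loss.item():.4f}")
```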
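The abstract reports results with MSE and PSNR. Below is a generic reference implementation of these two standard metrics; the [0, 255] value range and array shapes are assumptions for illustration, not evaluation details taken from the paper.

```python
# Standard MSE and PSNR definitions for comparing a harmonized image with its
# ground truth (generic reference code, not from the Diff-STAR authors).
import numpy as np


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean Squared Error; lower is better."""
    return float(np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2))


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; higher means closer to the ground truth."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / err)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
    noisy = np.clip(gt + rng.normal(0, 5, gt.shape), 0, 255).astype(np.uint8)
    print(f"MSE:  {mse(noisy, gt):.2f}")
    print(f"PSNR: {psnr(noisy, gt):.2f} dB")
```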