Official implementation of "Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss"
SPL is a training-free method for pixel-level structure-preserving image editing using latent diffusion models. Unlike existing methods that only preserve coarse layouts, SPL maintains fine-grained edge structures while allowing creative semantic edits through text prompts.
Key Features:
- Pixel-Level Structure Preservation: Local Linear Model-based loss for maintaining edge details
- Training-Free: Plug-and-play with pre-trained diffusion models
- Versatile Editing: Supports relighting, tone adjustment, style transfer, season changes, background replacement, and more
- Precise Local Control: Cross-attention mask upsampling for targeted editing
- Interactive Interface: Easy-to-use Gradio web demo
- Clone the repository:

```bash
git clone https://github.com/gongms00/SPL.git
cd SPL
```

- Create a conda environment:

```bash
conda create -n spl python=3.10 -y
conda activate spl
```

- Install PyTorch:

```bash
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
```

- Install the remaining dependencies:

```bash
pip install -r requirements.txt
```

- (Optional) Set the environment variable required by the harmonization feature:

```bash
export OPENAI_API_KEY=your_openai_key
```

- Launch the demo:

```bash
python app.py
```

Open your browser at http://localhost:7860.
- Upload a source image
- Enter source prompt (describing the current image)
- Enter edit prompt (describing the desired edit)
- Adjust parameters if needed
- Click "Run" to generate the edited image
SPL quantifies structural differences between source and edited images using Local Linear Models. For each local window, the loss enforces that the edited image is a linear transformation of the source. The bidirectional constraint ensures robust structure matching, even in flat regions.
Key Properties:
- Edge-preserving: image gradients are preserved through the local linear model
- Computed on grayscale (intensity) channel for uniform RGB updates
- Color Preservation Loss (CPL) handles chrominance preservation
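The bidirectional local-linear fit described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the window radius `r`, regularizer `eps`, and function names are assumptions, and the closed-form per-window coefficients follow the standard local linear model (as in the guided filter).

```python
import torch
import torch.nn.functional as F

def box_mean(x, r):
    # Local mean over a (2r+1) x (2r+1) window via average pooling.
    k = 2 * r + 1
    return F.avg_pool2d(x, k, stride=1, padding=r, count_include_pad=False)

def llm_residual(guide, target, r=2, eps=1e-4):
    # Closed-form fit target ~ a * guide + b in each local window,
    # then return the residual of that fit (zero iff target is a
    # local linear transform of guide).
    mu_g, mu_t = box_mean(guide, r), box_mean(target, r)
    cov = box_mean(guide * target, r) - mu_g * mu_t
    var = box_mean(guide * guide, r) - mu_g * mu_g
    a = cov / (var + eps)          # eps keeps flat regions stable
    b = mu_t - a * mu_g
    return target - (box_mean(a, r) * guide + box_mean(b, r))

def spl_loss(src_gray, edit_gray, r=2, eps=1e-4):
    # Bidirectional constraint on grayscale intensities: the edit must be
    # locally linear in the source, and the source locally linear in the edit.
    fwd = llm_residual(src_gray, edit_gray, r, eps)
    bwd = llm_residual(edit_gray, src_gray, r, eps)
    return (fwd ** 2).mean() + (bwd ** 2).mean()
```

A globally relit image (e.g. `0.5 * src + 0.2`) incurs near-zero loss, while an edit that destroys edge structure incurs a large one, which is exactly the behavior the loss is designed to reward.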
At each denoising timestep, the predicted latent is decoded to image space, optimized with SPL using the source image as reference, then re-encoded back to latent space. A final post-processing step refines the decoded output to recover fine details lost during VAE decoding.
For localized editing:
- Extract the 16×16 attention map from the cross-attention layers
- Progressively upsample by 2× with a guided filter at each step
- Source image guides edge-aligned boundary refinement
- Result: high-resolution mask with sharp, accurate boundaries
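The steps above can be sketched as a progressive guided-filter upsampling loop. This is an illustrative sketch, not the repository's code: the guided-filter radius, `eps`, and the assumption that the source intensity image is the guide at every scale are all hypothetical defaults.

```python
import torch
import torch.nn.functional as F

def box_mean(x, r):
    k = 2 * r + 1
    return F.avg_pool2d(x, k, stride=1, padding=r, count_include_pad=False)

def guided_filter(guide, mask, r=2, eps=1e-4):
    # Express the mask as a local linear function of the guide so that
    # mask boundaries snap to edges in the guide image.
    mu_g, mu_m = box_mean(guide, r), box_mean(mask, r)
    a = (box_mean(guide * mask, r) - mu_g * mu_m) / (
        box_mean(guide * guide, r) - mu_g ** 2 + eps)
    b = mu_m - a * mu_g
    return box_mean(a, r) * guide + box_mean(b, r)

def upsample_mask(attn, src_gray, threshold=0.5):
    # attn: (1,1,16,16) cross-attention map for the mask words.
    # src_gray: (1,1,H,W) grayscale source image used as the guide.
    mask = attn
    while mask.shape[-1] < src_gray.shape[-1]:
        # 2x bilinear upsample, then edge-aligned refinement with the
        # source image resized to the current scale.
        mask = F.interpolate(mask, scale_factor=2, mode="bilinear",
                             align_corners=False)
        guide = F.interpolate(src_gray, size=mask.shape[-2:],
                              mode="bilinear", align_corners=False)
        mask = guided_filter(guide, mask)
    return (mask > threshold).float()  # binarize (cf. Threshold parameter)
```

The `threshold` argument plays the role of the mask-binarization threshold listed in the parameter table below.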
These are the core parameters that control how much the edited image preserves the source image's structure and color.
| Parameter | Default | Description |
|---|---|---|
| Attention control schedule | 0.8 | Controls coarse structure preservation via cross/self-attention replacement. Higher values enforce stronger consistency with the source image, at the cost of weaker edits. |
| Optimization schedule | 0.8 | Fraction of denoising steps after which SPL/CPL optimization begins. Recommended to match the attention control schedule. |
| Preserve structure (SPL) | On | Enables the Structure Preservation Loss to maintain edge structures during editing. |
| Preserve color (CPL) | Off | Enables the Color Preservation Loss to prevent unintended color shifts. |
| Structure loss weight | 10000 | Weight for SPL. Higher values enforce stronger structure preservation. |
| Color loss weight | 1000 | Weight for CPL. Try values in the range 100~10000. |
| Optimization iterations | 100 | Number of gradient descent steps per denoising timestep for SPL/CPL optimization. |
| Post-processing with loss | On | Applies an additional refinement step on the final decoded image using SPL/CPL to recover fine details lost during VAE decoding. |
When SPL or CPL is set to "Masked area" mode, these parameters control which regions of the image are preserved. Masks are extracted from cross-attention maps and upsampled using guided filtering.
| Parameter | Default | Description |
|---|---|---|
| Preservation area | Whole image | Whether to apply the loss to the whole image or only the masked area. |
| Mask words (source/target) | - | Comma-separated words from the prompt whose attention maps define the mask region. |
| Threshold | 0.5 | Binarization threshold for the attention-based mask. Higher values produce smaller, more focused masks. |
| Invert mask | Off | Inverts the mask so that the loss is applied to the area outside the selected words. |
| Parameter | Default | Description |
|---|---|---|
| Source/Target prompt | - | Source prompt describes the input image; target prompt describes the desired edit. |
| Inference steps | 15 | Number of denoising steps. |
| Source guidance scale | 1 | Classifier-free guidance scale for the source prompt. |
| Target guidance scale | 2 | Classifier-free guidance scale for the target prompt. Increase for stronger editing. |
| Seed | 0 | Random seed for reproducibility. |
This project builds upon:
- InfEdit (Xu et al., 2024): Coarse-structure-preserving editing via attention conditioning
- Prompt-to-Prompt (Hertz et al., 2023): Cross-attention control for text-driven editing
Our contribution is the Structure Preservation Loss (SPL) that adds pixel-level edge structure preservation to these coarse-structure methods.
```bibtex
@inproceedings{gong2026spl,
  title={Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss},
  author={Gong, Minsu and Ryu, Nuri and Ok, Jungseul and Cho, Sunghyun},
  booktitle={Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}
```

- InfEdit for the base editing framework
- Prompt-to-Prompt for attention control
- LCM for efficient sampling
- Guided Filter for edge-aware processing
This project is licensed under the Apache License 2.0. See LICENSE.txt for details.