CODA: Contrastive Object-centric Diffusion Alignment
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), a simple extension that (i) introduces register slots to absorb residual attention and reduce interference between object slots, and (ii) applies a contrastive alignment loss to explicitly encourage slot-image correspondence.
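To give a feel for what a contrastive alignment objective does, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is an illustration only and not necessarily the loss used in CODA: the function names (`cosine`, `info_nce`), the use of cosine similarity, the temperature value, and the assumption of one pooled slot embedding per image are all hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(slot_embs, image_embs, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the CODA implementation).

    slot_embs[i] and image_embs[i] are assumed to come from the same image
    (the positive pair); every other image in the batch acts as a negative.
    """
    losses = []
    for i, slot in enumerate(slot_embs):
        logits = [cosine(slot, img) / temperature for img in image_embs]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching image.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    slots = [[1.0, 0.0], [0.0, 1.0]]
    images_matched = [[1.0, 0.0], [0.0, 1.0]]
    images_swapped = [[0.0, 1.0], [1.0, 0.0]]
    print("matched:", info_nce(slots, images_matched))
    print("swapped:", info_nce(slots, images_swapped))
```

The loss is small when each slot embedding is most similar to the embedding of its own image and large when it is closer to another image, which is the sense in which such an objective encourages slot-image correspondence.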
Installation
The training and evaluation code requires PyTorch. Clone the repository, then install the dependencies listed in requirements.txt:
pip install -r requirements.txt
Data preparation
All datasets are downloaded to the directory given by $USER_DATA. Set that variable, then run the download script:
# define where to store data
export USER_DATA=...
# download the dataset
bash preprocess/download.sh voc coco movi-c movi-e
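If you drive the download from Python, a small guard can catch an unset $USER_DATA or a mistyped dataset name before the script starts. The sketch below is a hypothetical helper, not part of the repository: `build_download_command` and `SUPPORTED_DATASETS` are names introduced here, and it assumes the script accepts any subset of the four dataset names shown above.

```python
import os

# Dataset names taken from the commands in this README.
SUPPORTED_DATASETS = ("voc", "coco", "movi-c", "movi-e")


def build_download_command(datasets, env=None):
    """Return the argv list for preprocess/download.sh after basic checks.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    if not env.get("USER_DATA"):
        raise RuntimeError("USER_DATA is not set; export it before downloading.")
    unknown = [d for d in datasets if d not in SUPPORTED_DATASETS]
    if unknown:
        raise ValueError(f"Unsupported dataset(s): {unknown}")
    return ["bash", "preprocess/download.sh", *datasets]


if __name__ == "__main__":
    # Prints the command instead of running it.
    print(build_download_command(["voc", "coco"], env={"USER_DATA": "/path/to/data"}))
```

To actually start the download, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root.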
Training
We use the following script for training.
bash scripts/train.sh <dataset>
where <dataset> is one of voc, coco, movi-c, or movi-e.
To enable logging with wandb, place your API key in a .key file.
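How the training script consumes the .key file is not described here. If you want to reuse the same file in your own scripts, one option is to export its contents as WANDB_API_KEY, the environment variable the wandb client reads for authentication. The helper below is a hypothetical sketch; `load_wandb_key` is a name introduced for this example, and it assumes the file contains only the key.

```python
import os
from pathlib import Path


def load_wandb_key(key_file=".key", env=None):
    """Copy the API key from `key_file` into env["WANDB_API_KEY"].

    Returns True if a non-empty key was found, False otherwise.
    `env` defaults to os.environ; pass a dict to avoid touching the process.
    """
    env = os.environ if env is None else env
    path = Path(key_file)
    if not path.is_file():
        return False
    key = path.read_text(encoding="utf-8").strip()
    if not key:
        return False
    env["WANDB_API_KEY"] = key
    return True


if __name__ == "__main__":
    if load_wandb_key():
        print("wandb key loaded from .key")
    else:
        print("no .key file found; wandb logging stays disabled")
```

Keep the .key file out of version control so the key is not committed by accident.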
Evaluation
The diffusion pipeline can be loaded as follows.
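import torch

from src.model.pipeline import DiffusionPipeline

image = <image_tensor>  # input image tensor, on the same device as the model
model_path = <path_to_pretrained_model>  # path to a pretrained checkpoint
model = DiffusionPipeline.from_pretrained(model_path).to("cuda")

# Encode the image into slots, then reconstruct it from those slots.
with torch.no_grad():
    slots = model.encoder(image)
    image_rec = model.sample(slots, resolution=512)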
We use the following script for evaluation.
bash scripts/eval.sh <dataset>
where <dataset> is one of voc, coco, movi-c, or movi-e.
Pretrained models are available.
Citation
Please cite our paper if you find it useful in your research:
@article{nguyen2026coda,
title={Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment},
author={Bac Nguyen and Yuhta Takida and Naoki Murata and Chieh-Hsin Lai and Toshimitsu Uesaka and Stefano Ermon and Yuki Mitsufuji},
year={2026},
journal={arXiv 2601.01224},
}
Acknowledgement
We thank the authors of SlotDiffusion, Latent Slot Diffusion and Latent Diffusion Models for making their implementations publicly available.
License
CODA is released under the Apache License 2.0. See the LICENSE file for more details.