
diffusion-artifact-estimator

(Anti) proof of concept for estimating artifacts at early steps of the reverse diffusion process in image generation.

Spoiler: the results are pretty bad. There are several reasons for this, listed at the end of this file, but the main one is that the training sample was too small.

To reproduce the results, start with the dataset-collector.ipynb notebook and then run the demo.ipynb notebook.

Approach

This project was heavily inspired by aesthetic-scorer by kenjiqq.

The idea is to train an MLP regression network on embeddings from a CLIP model fine-tuned on latent representations of images partway through generation. Due to the nature of the Stable Diffusion latent space, these latents can be (roughly) rendered as small noisy RGB images with a simple matrix multiplication.

latent representation
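
For reference, the "simple matrix multiplication" above is a per-pixel linear map from the 4 latent channels to RGB. Here is a minimal sketch; the coefficients are a commonly circulated community approximation for SD v1.x latents and are not necessarily the exact values used in this repo.

```python
import torch

# Approximate per-pixel linear map from the 4-channel SD latent space to RGB.
# These coefficients are a widely used community approximation for SD v1.x
# latents (assumption: not necessarily the values used in this repo); they
# only give a rough preview, not a faithful VAE decode.
LATENT_TO_RGB = torch.tensor([
    [ 0.298,  0.207,  0.208],   # latent channel 0
    [ 0.187,  0.286,  0.173],   # latent channel 1
    [-0.158,  0.189,  0.264],   # latent channel 2
    [-0.184, -0.271, -0.473],   # latent channel 3
])

def latent_to_preview(latents: torch.Tensor) -> torch.Tensor:
    """Map a latent of shape (4, H, W) to an RGB preview of shape (H, W, 3) in [0, 1]."""
    rgb = torch.einsum("chw,cr->hwr", latents, LATENT_TO_RGB)
    return ((rgb + 1.0) / 2.0).clamp(0.0, 1.0)
```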

The dataset I used is diffusiondb, which contains over 14M generated images together with their generation parameters, plus a separate AI-Horde dataset with ratings for images from the former. Since the images are fully generated and live in image space, one of the goals was to encode all of the images in the training subset into the VAE latent space, apply forward noising with a scheduler to get noisy latents, and then approximate them with the matrix multiplication already mentioned. It looks something like this:

forward-noise
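
As an illustration of this noising step, here is a minimal sketch using the diffusers library. The model id, the 0.18215 latent scaling factor, and the chosen timestep are assumptions based on standard Stable Diffusion v1.x conventions, not values taken from the notebooks.

```python
import torch
from diffusers import AutoencoderKL, DDPMScheduler

# Encode an image into the SD VAE latent space and add forward noise at a
# chosen timestep. Model id, scaling factor, and timestep are illustrative.
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")
scheduler = DDPMScheduler(num_train_timesteps=1000)

@torch.no_grad()
def noisy_latent(image: torch.Tensor, timestep: int = 500) -> torch.Tensor:
    """image: (1, 3, 512, 512) in [-1, 1]; returns a noisy latent of shape (1, 4, 64, 64)."""
    latents = vae.encode(image).latent_dist.sample() * 0.18215
    noise = torch.randn_like(latents)
    return scheduler.add_noise(latents, noise, torch.tensor([timestep]))
```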

For the sake of testing the hypothesis, I trained and tested only on half-noisy samples (e.g. the 10th step out of 20). With the dataset constructed, we need embeddings to train the MLP. But there's a catch: using a pretrained CLIP to embed low-resolution noisy images would produce low-quality, noisy embeddings. For that reason, I fine-tuned a CLIP model on the dataset of noisy samples using the open_clip library. Here's the W&B training log. Here's the link to the model weights.

On these embeddings, we can then train a simple MLP that outputs a continuous value in the range [0, 5] based on how many artifacts there are in the image. Here's its W&B log. It turned out that the network returns the mean value of the range (2.5) the majority of the time, so it does not perform well, but it must be mentioned that my training set consisted of only 10k samples.
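
For concreteness, here is a minimal sketch of the embedding-plus-regressor setup. The CLIP architecture, pretrained tag, and MLP layer sizes are illustrative assumptions, not the exact configuration used in this repo.

```python
import torch
import torch.nn as nn
import open_clip

# Embed a (noisy) preview image with CLIP and regress an artifact score in [0, 5].
# Swap the pretrained tag for the fine-tuned checkpoint linked above.
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)

class ArtifactMLP(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # Scaled sigmoid keeps the prediction inside [0, 5].
        return 5.0 * torch.sigmoid(self.net(emb)).squeeze(-1)

mlp = ArtifactMLP()

@torch.no_grad()
def score(pil_image) -> float:
    emb = clip_model.encode_image(preprocess(pil_image).unsqueeze(0)).float()
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return mlp(emb).item()
```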

Possible Improvements

  • First and foremost, train the models on at least 1M samples to get reasonably good results.
  • Reconsider the whole approach: instead of training and running inference on an approximated latent space, use, for example, a tiny VAE to operate in image space.
  • Instead of training a regression MLP to predict the score, we could further fine-tune our CLIP for a downstream classification task with something like wise-ft.
  • When constructing the dataset, use different schedulers for the forward diffusion process and add noise at different timesteps.
  • Try adding random noise at each step of the forward diffusion process, the opposite of what reverse diffusion does.
  • When training CLIP, use a larger batch size: the contrastive loss is computed within the batch, so a larger batch should make the loss converge faster.
  • Another idea is a mixture of experts, where each expert operates on a different timestep and their predictions are combined, though this is significantly more computationally expensive.
  • Compute some sort of cumulative distribution at each step, similarly to the PALBERT paper, where the authors exit transformer encoders early once the summed confidence over the transformer blocks exceeds a threshold. We could adapt this idea and combine it with the MoE above to estimate how likely artifacts are to appear at the final step given the current step, accumulating the per-step estimates (a rough sketch follows this list).
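
To make the last two ideas a bit more concrete, here is a very rough, hypothetical sketch of per-timestep experts whose confidence-weighted estimates are accumulated until a PALBERT-style confidence threshold is reached. All names, layer sizes, and the weighting scheme are illustrative assumptions, not an implemented design.

```python
import torch
import torch.nn as nn

class StepExpert(nn.Module):
    """One expert per timestep bucket: predicts an artifact score and a confidence."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.confidence = nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())

def combined_estimate(experts, embeddings, threshold: float = 0.9) -> torch.Tensor:
    """Accumulate confidence-weighted scores over generation steps and exit early
    once the summed confidence exceeds the threshold (PALBERT-style early exit)."""
    total_conf = torch.zeros(1)
    weighted = torch.zeros(1)
    for expert, emb in zip(experts, embeddings):
        conf = expert.confidence(emb)
        weighted = weighted + conf * 5.0 * torch.sigmoid(expert.score(emb))
        total_conf = total_conf + conf
        if total_conf.item() >= threshold:
            break
    return weighted / total_conf.clamp(min=1e-6)
```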
