Help with Latent Diffusion Model Training #8183

MiguelHGDC · 2024-10-26T18:00:49Z

MiguelHGDC
Oct 26, 2024

Hello everyone,

I’m working on training a latent diffusion model and have encountered an issue with the model's learning time. As I understand it, the process involves two main stages: first, I train an autoencoder on a set of images, which learns to compress them into a reduced latent space. Then, I use this autoencoder to transform the original images into their latent representations, and I train a diffusion model on these "new images" (the compressed representations). The advantage of this approach is that the diffusion model works with smaller, compressed representations, which should make learning to generate new latent spaces faster. These generated latents can later be decoded by the autoencoder to retrieve the final images.

My problem is that, following a similar methodology to the one used in the MAISI model, my diffusion model seems to be struggling to learn how to generate these latent spaces. Even after 400 epochs, the results remain almost identical to those at the very beginning, which is concerning. I have verified that the autoencoder is configured correctly and works well for decoding, so I suspect the issue might lie with the diffusion model itself or perhaps with the number of training epochs.

Some details about my setup:

Dataset: 5598 images of varying sizes.
Latent space: After compression, the latent representations range from (4,32,32,32) to (4,64,128,128).
Autoencoder compression factor: 4 (same as MAISI).
Loss values: Calculated using the L1 loss function, which oscillates between 0.18 and 0.2 consistently without significant reduction.

II’m not seeing any improvement or even an attempt to generate recognizable shapes by epoch 400. In the MAISI paper, the authors report visible results by around 200 epochs. Could it be that the model simply needs more epochs to start learning, or is there another configuration aspect that might be affecting the performance?

Is it normal for a diffusion model to take this long to start learning? Could the large size of the latent space representations be contributing to this slow progress?

I would appreciate any advice or if anyone has encountered similar experiences. Thanks in advance!

Epoch 400:

Epoch 0:

Hyperparameters:
{
"spatial_dims": 3,
"image_channels": 1,
"latent_channels": 4,
"mask_generation_latent_shape": [
4,
64,
64,
64
],
"patchDiscriminator_def": {
"spatial_dims": 3,
"num_layers_d": 3,
"channels": 32,
"in_channels": 1,
"out_channels": 1,
"kernel_size": 4,
"norm": "INSTANCE",
"bias": false,
"padding": 1
},
"autoencoder_def": {
"target": "monai.apps.generation.maisi.networks.autoencoderkl_maisi.AutoencoderKlMaisi",
"spatial_dims": "@spatial_dims",
"in_channels": "@image_channels",
"out_channels": "@image_channels",
"latent_channels": "@latent_channels",
"num_channels": [
64,
128,
256
],
"num_res_blocks": [2,2,2],
"norm_num_groups": 32,
"norm_eps": 1e-06,
"attention_levels": [
false,
false,
false
],
"with_encoder_nonlocal_attn": false,
"with_decoder_nonlocal_attn": false,
"use_checkpointing": false,
"use_convtranspose": false,
"norm_float16": true,
"num_splits": 1,
"dim_split": 1
},
"diffusion_unet_def": {
"target": "monai.apps.generation.maisi.networks.diffusion_model_unet_maisi.DiffusionModelUNetMaisi",
"spatial_dims": "@spatial_dims",
"in_channels": "@latent_channels",
"out_channels": "@latent_channels",
"num_channels": [
64,
128,
256,
512
],
"attention_levels": [
false,
false,
true,
true
],
"num_head_channels": [
0,
0,
32,
32
],
"num_res_blocks": 2,
"use_flash_attention": true,
"include_top_region_index_input": false,
"include_bottom_region_index_input": false,
"include_spacing_input": false
},

KumoLiu · 2024-10-28T03:57:44Z

KumoLiu
Oct 28, 2024
Maintainer

Hi @MiguelHGDC, perhaps you can refer to the tutorial here: https://github.com/Project-MONAI/tutorials/tree/main/generation/maisi. Thanks.

1 reply

MiguelHGDC Oct 28, 2024
Author

Hi @KumoLiu, Thanks for your attention! I’ve managed to resolve the issue. The model was indeed learning, as evidenced by the decreasing loss values during training. The problem, however, was in the denoising process during inference. I was performing the denoising incorrectly, which caused the autoencoder to struggle when decoding, as it couldn’t interpret the random latent space generated.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Help with Latent Diffusion Model Training #8183

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{title}}

Select a reply

Help with Latent Diffusion Model Training #8183

MiguelHGDC Oct 26, 2024

Replies: 1 comment · 1 reply

KumoLiu Oct 28, 2024 Maintainer

MiguelHGDC Oct 28, 2024 Author

MiguelHGDC
Oct 26, 2024

Replies: 1 comment 1 reply

KumoLiu
Oct 28, 2024
Maintainer

MiguelHGDC Oct 28, 2024
Author