Help with Latent Diffusion Model Training #8183
-
Hi @MiguelHGDC, perhaps you can refer to the tutorial here: https://github.com/Project-MONAI/tutorials/tree/main/generation/maisi. Thanks.
-
Hello everyone,
I’m working on training a latent diffusion model and have run into an issue with how long the model takes to learn. As I understand it, the process involves two main stages: first, I train an autoencoder on a set of images, which learns to compress them into a reduced latent space. Then I use this autoencoder to transform the original images into their latent representations, and I train a diffusion model on these "new images" (the compressed representations). The advantage of this approach is that the diffusion model works with smaller, compressed representations, which should make it faster to learn to generate new latent samples. These generated latents can later be decoded by the autoencoder to recover the final images.
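For context, my stage-2 training loop follows the standard noise-prediction (epsilon) objective, roughly like the simplified sketch below. Here `unet`, `latent_loader`, and `n_epochs` are placeholders for my actual 3D diffusion UNet, a DataLoader of pre-computed latents, and the epoch count; the `DDPMScheduler` import path is the one in MONAI >= 1.3 (in the older MONAI Generative package it lives under `generative.networks.schedulers`).

```python
# Simplified sketch of stage 2: epsilon-prediction training on cached latents.
# Assumes `unet` (a 3D diffusion UNet, already on `device`) and `latent_loader`
# (a DataLoader yielding batches of pre-computed latents, e.g. [B, 4, 64, 64, 64]).
import torch
import torch.nn.functional as F
from monai.networks.schedulers import DDPMScheduler

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.Adam(unet.parameters(), lr=1e-4)

for epoch in range(n_epochs):
    for latents in latent_loader:
        latents = latents.to(device)
        noise = torch.randn_like(latents)
        timesteps = torch.randint(
            0, scheduler.num_train_timesteps, (latents.shape[0],), device=device
        ).long()
        # Diffuse the clean latents to the sampled timesteps...
        noisy_latents = scheduler.add_noise(
            original_samples=latents, noise=noise, timesteps=timesteps
        )
        # ...and train the UNet to predict the noise that was added.
        noise_pred = unet(noisy_latents, timesteps=timesteps)
        loss = F.mse_loss(noise_pred.float(), noise.float())
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
```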
My problem is that, following a methodology similar to the one used in the MAISI model, my diffusion model seems to be struggling to learn to generate these latent representations. Even after 400 epochs, the results remain almost identical to those at the very beginning, which is concerning. I have verified that the autoencoder is configured correctly and decodes well, so I suspect the issue lies with the diffusion model itself, or perhaps with the number of training epochs.
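The autoencoder check I ran is essentially the sketch below (`autoencoder` and `image` are placeholders for my trained AutoencoderKlMaisi and one preprocessed volume); I also print the latent statistics, since those are what the diffusion model actually sees.

```python
# Sanity check: reconstruct one volume through the trained autoencoder and
# inspect the statistics of the latents used for diffusion training.
import torch
import torch.nn.functional as F

autoencoder.eval()
with torch.no_grad():
    z_mu, z_sigma = autoencoder.encode(image)   # image: [1, 1, D, H, W]
    z = autoencoder.sampling(z_mu, z_sigma)     # latent: [1, 4, d, h, w]
    recon = autoencoder.decode(z)

print("L1 reconstruction error:", F.l1_loss(recon, image).item())
print("latent mean / std:", z.mean().item(), z.std().item())
# Note: many latent-diffusion setups rescale the latents by a global scale
# factor (roughly 1/std) before diffusion training so they are ~unit variance.
```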
Some details about my setup:
I’m not seeing any improvement, or even an attempt to generate recognizable shapes, by epoch 400. In the MAISI paper, the authors report visible results by around 200 epochs. Could it be that the model simply needs more epochs to start learning, or is there another configuration aspect that might be affecting performance?
Is it normal for a diffusion model to take this long to start learning? Could the large size of the latent space representations be contributing to this slow progress?
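For reference (assuming the latents used for diffusion training match the mask_generation_latent_shape below), each latent is 4 × 64 × 64 × 64 ≈ 1.05 M values. With three channel levels in the autoencoder (two downsampling steps, i.e. a 4× reduction per spatial axis), that corresponds to roughly a 256³ single-channel input (~16.8 M voxels) compressed about 16× overall, so the diffusion UNet is still modeling a fairly large 3D volume.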
I would appreciate any advice or if anyone has encountered similar experiences. Thanks in advance!
Epoch 0: (sample output image attached)
Epoch 400: (sample output image attached)
Hyperparameters:
```json
{
  "spatial_dims": 3,
  "image_channels": 1,
  "latent_channels": 4,
  "mask_generation_latent_shape": [4, 64, 64, 64],
  "patchDiscriminator_def": {
    "spatial_dims": 3,
    "num_layers_d": 3,
    "channels": 32,
    "in_channels": 1,
    "out_channels": 1,
    "kernel_size": 4,
    "norm": "INSTANCE",
    "bias": false,
    "padding": 1
  },
  "autoencoder_def": {
    "target": "monai.apps.generation.maisi.networks.autoencoderkl_maisi.AutoencoderKlMaisi",
    "spatial_dims": "@spatial_dims",
    "in_channels": "@image_channels",
    "out_channels": "@image_channels",
    "latent_channels": "@latent_channels",
    "num_channels": [64, 128, 256],
    "num_res_blocks": [2, 2, 2],
    "norm_num_groups": 32,
    "norm_eps": 1e-06,
    "attention_levels": [false, false, false],
    "with_encoder_nonlocal_attn": false,
    "with_decoder_nonlocal_attn": false,
    "use_checkpointing": false,
    "use_convtranspose": false,
    "norm_float16": true,
    "num_splits": 1,
    "dim_split": 1
  },
  "diffusion_unet_def": {
    "target": "monai.apps.generation.maisi.networks.diffusion_model_unet_maisi.DiffusionModelUNetMaisi",
    "spatial_dims": "@spatial_dims",
    "in_channels": "@latent_channels",
    "out_channels": "@latent_channels",
    "num_channels": [64, 128, 256, 512],
    "attention_levels": [false, false, true, true],
    "num_head_channels": [0, 0, 32, 32],
    "num_res_blocks": 2,
    "use_flash_attention": true,
    "include_top_region_index_input": false,
    "include_bottom_region_index_input": false,
    "include_spacing_input": false
  },
```
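In case it helps, a quick way to sanity-check the diffusion_unet_def above is to instantiate it directly (bypassing the config resolver) and run a dummy latent through it. This is only a sketch: the dummy tensor shape is taken from mask_generation_latent_shape, and flash attention is disabled here so the check also runs on CPU.

```python
# Instantiate the MAISI diffusion UNet with the posted hyperparameters and
# verify that a dummy latent passes through with the expected shape.
import torch
from monai.apps.generation.maisi.networks.diffusion_model_unet_maisi import DiffusionModelUNetMaisi

unet = DiffusionModelUNetMaisi(
    spatial_dims=3,
    in_channels=4,
    out_channels=4,
    num_channels=[64, 128, 256, 512],
    attention_levels=[False, False, True, True],
    num_head_channels=[0, 0, 32, 32],
    num_res_blocks=2,
    use_flash_attention=False,  # True in my config; disabled so this check runs on CPU
    include_top_region_index_input=False,
    include_bottom_region_index_input=False,
    include_spacing_input=False,
)

x = torch.randn(1, 4, 64, 64, 64)            # matches mask_generation_latent_shape
t = torch.randint(0, 1000, (1,)).long()
with torch.no_grad():
    y = unet(x, t)

print(y.shape)                               # expected: torch.Size([1, 4, 64, 64, 64])
print(sum(p.numel() for p in unet.parameters()) / 1e6, "M parameters")
```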