About VAE channels #56

pokameng · 2024-09-28T07:53:34Z

@LTH14
Hello bro,
I found that the VAE in MAR is KL-16, so the latent shape is [B, 16, 16, 16]; with KL-8, the latent shape is [B, 4, 32, 32].
I have a question: if I use the SD model or another large diffusion model instead of the MLP, I need to change the VAE, right? SD takes an input of shape [B, 4, H, W].
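
The two latent shapes above follow from the VAE's channel count and downsampling stride. A minimal sketch of that arithmetic, assuming a 256x256 input image (the thread does not state the resolution, but both shapes are consistent with it); `latent_shape` is a hypothetical helper, not part of MAR:

```python
def latent_shape(img_size, vae_stride, vae_embed_dim):
    """Return the (C, H, W) latent shape for a square image.

    img_size:      side length of the input image in pixels (assumed 256 here)
    vae_stride:    spatial downsampling factor of the VAE (16 for KL-16, 8 for KL-8)
    vae_embed_dim: number of latent channels (16 for KL-16, 4 for KL-8)
    """
    side = img_size // vae_stride
    return (vae_embed_dim, side, side)


print(latent_shape(256, vae_stride=16, vae_embed_dim=16))  # KL-16
print(latent_shape(256, vae_stride=8, vae_embed_dim=4))    # KL-8
```

SD's UNet expects 4 latent channels, which is why only the KL-8 latent matches its [B, 4, H, W] input.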

pokameng · 2024-09-28T07:57:16Z

@LTH14
How can I download the KL-8 VAE?

LTH14 · 2024-09-28T13:32:44Z

For KL-8, you can directly use this one: https://huggingface.co/stabilityai/sd-vae-ft-ema

pokameng · 2024-10-05T04:40:33Z

@LTH14
Hi, bro!
I have a question: if I use the SD model or another large diffusion model instead of the MLP, I need to change the VAE, right? SD takes an input of shape [B, 4, H, W].

LTH14 · 2024-10-05T05:10:10Z

Yes -- if you want to use a pre-trained SD model, then you should use SD's KL-8 tokenizer.

pokameng · 2024-10-05T05:14:03Z

> Yes -- if you want to use a pre-trained SD model, then you should use SD's KL-8 tokenizer.

Well, the output of MAR is [B, L, D], so I need to convert it to [B, 4, 32, 32] if I want to use a pre-trained SD model (e.g. ControlNet), right?
And for the VAE, I need to use the KL-8 tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema?

LTH14 · 2024-10-05T05:19:01Z

If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
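
For readers who want to see the reshape, here is a minimal NumPy sketch of patchify/unpatchify for these settings. It assumes each 16-dim token stores its 2x2 patch in (channel, patch row, patch column) order and that tokens are laid out row by row; the standalone functions and their signatures are hypothetical, so check the ordering against MAR's own unpatchify before relying on it. The same reshape and axis permutation carry over to torch tensors.

```python
import numpy as np


def patchify(x, patch_size):
    """[B, C, H, W] latent -> [B, L, C * p * p] tokens (assumed layout)."""
    b, c, h, w = x.shape
    p = patch_size
    gh, gw = h // p, w // p                  # token grid height and width
    x = x.reshape(b, c, gh, p, gw, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)        # -> [B, gh, gw, C, p, p]
    return x.reshape(b, gh * gw, c * p * p)


def unpatchify(tokens, vae_embed_dim, patch_size):
    """[B, L, C * p * p] tokens -> [B, C, H, W] latent; inverse of patchify."""
    b, seq_len, _ = tokens.shape
    c, p = vae_embed_dim, patch_size
    g = int(round(seq_len ** 0.5))           # square token grid assumed
    x = tokens.reshape(b, g, g, c, p, p)
    x = x.transpose(0, 3, 1, 4, 2, 5)        # -> [B, C, g, p, g, p]
    return x.reshape(b, c, g * p, g * p)


# KL-8 settings from the reply: --vae_embed_dim 4 --vae_stride 8 --patch_size 2
latent = np.random.rand(2, 4, 32, 32).astype(np.float32)
tokens = patchify(latent, patch_size=2)
restored = unpatchify(tokens, vae_embed_dim=4, patch_size=2)
print(tokens.shape, restored.shape)
```

The token count is (32 / 2)^2 = 256 and the token width is 4 * 2 * 2 = 16, which matches the [B, 256, 16] output mentioned above.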

The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.

pokameng · 2024-10-05T05:23:01Z

> If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
>
> The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.

Yes, I know. I want to use SD 1.5, so which tokenizer should I use? I am currently using the KL-8 tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema

pokameng · 2024-10-05T05:26:44Z

> If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
>
> The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.

In other words, if I want to use SD 1.5, which tokenizer should I choose? I want to use a non-quantized tokenizer so that it is consistent with yours.

LTH14 · 2024-10-05T05:30:01Z

For sd-1.5 you can use https://huggingface.co/stabilityai/sd-vae-ft-ema or https://huggingface.co/stabilityai/sd-vae-ft-mse. They both have the same tokenizer encoder (so the latent is the same) but different tokenizer decoders.

pokameng · 2024-10-05T05:34:35Z

> For sd-1.5 you can use https://huggingface.co/stabilityai/sd-vae-ft-ema or https://huggingface.co/stabilityai/sd-vae-ft-mse. They both have the same tokenizer encoder (so the latent is the same) but different tokenizer decoders.

Yes! I am using the tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema
