
About VAE channels #56

Open
pokameng opened this issue Sep 28, 2024 · 10 comments

@pokameng

@LTH14
Hello,
I found that the VAE in MAR is KL-16, so the latent dimension is [B, 16, 16, 16]; with KL-8, the latent dimension is [B, 4, 32, 32].
I have a question: if I use the SD model (or another large diffusion model) instead of the MLP, I need to change the VAE, right? SD takes an input of [B, 4, H, W].

@pokameng
Author

@LTH14
How can I download the KL-8 VAE?

@LTH14
Owner

LTH14 commented Sep 28, 2024

You can directly use this one for KL-8: https://huggingface.co/stabilityai/sd-vae-ft-ema
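For reference, a minimal loading sketch (this is just an illustration using the diffusers AutoencoderKL loader, not MAR's own VAE loading code) showing that this KL-8 VAE maps a 256x256 image to a [B, 4, 32, 32] latent:

```python
# Sketch, assuming the `diffusers` package is installed (MAR itself loads its
# VAE differently): check the KL-8 latent shape for a 256x256 input.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()

x = torch.randn(1, 3, 256, 256)              # dummy image batch, [B, 3, H, W]
with torch.no_grad():
    z = vae.encode(x).latent_dist.sample()   # stride-8 encoder, 4 latent channels
print(z.shape)                               # torch.Size([1, 4, 32, 32])
```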

@pokameng
Author

pokameng commented Oct 5, 2024

@LTH14
Hi!
I have a question: if I use the SD model (or another large diffusion model) instead of the MLP, I need to change the VAE, right? SD takes an input of [B, 4, H, W].

@LTH14
Owner

LTH14 commented Oct 5, 2024

Yes -- if you want to use a pre-trained SD model, then you should use SD's KL-8 tokenizer.

@pokameng
Author

pokameng commented Oct 5, 2024

> Yes -- if you want to use a pre-trained SD model, then you should use SD's KL-8 tokenizer.

Well, the output of MAR is [B, L, D], so I need to convert it to [B, 4, 32, 32] if I want to use a pre-trained SD model (e.g. ControlNet), right?
And should I use the KL-8 tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema?

@LTH14
Owner

LTH14 commented Oct 5, 2024

If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].

The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.
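In case a shape reference helps, here is a standalone sketch of that reshape (it follows the common ViT-style patch ordering; in practice use MAR's own unpatchify so the ordering matches training):

```python
import torch

def unpatchify(x, patch_size=2, embed_dim=4):
    """Hypothetical standalone version of the reshape MAR's unpatchify performs:
    [B, L, patch_size**2 * embed_dim] -> [B, embed_dim, H, W]."""
    B, L, D = x.shape                       # e.g. [B, 256, 16]
    h = w = int(L ** 0.5)                   # 16 x 16 grid of patches
    p, c = patch_size, embed_dim
    x = x.reshape(B, h, w, p, p, c)
    x = torch.einsum("bhwpqc->bchpwq", x)   # gather channels, interleave patches
    return x.reshape(B, c, h * p, w * p)    # [B, 4, 32, 32]

tokens = torch.randn(2, 256, 16)
print(unpatchify(tokens).shape)             # torch.Size([2, 4, 32, 32])
```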

@pokameng
Author

pokameng commented Oct 5, 2024

> If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
>
> The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.

Yes, I know. I want to use SD 1.5, so which tokenizer should I use? I am currently using the KL-8 tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema.

@pokameng
Author

pokameng commented Oct 5, 2024

> If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
>
> The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.

In other words, if I want to use SD 1.5, which tokenizer should I choose? I want to use a non-quantized tokenizer so that it is consistent with yours.

@LTH14
Owner

LTH14 commented Oct 5, 2024

For SD 1.5 you can use https://huggingface.co/stabilityai/sd-vae-ft-ema or https://huggingface.co/stabilityai/sd-vae-ft-mse. They both have the same tokenizer encoder (so the latents are the same) but different tokenizer decoders.
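As a quick illustration of "same encoder, different decoders" (again a sketch assuming diffusers; the reconstructions from the two checkpoints will differ slightly):

```python
# Sketch, assuming diffusers: the two checkpoints share an encoder, so the same
# KL-8 latent can be decoded by either one; only the decoders differ.
import torch
from diffusers import AutoencoderKL

vae_ema = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()
vae_mse = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

z = torch.randn(1, 4, 32, 32)            # a KL-8 latent, e.g. unpatchified MAR output
with torch.no_grad():
    img_ema = vae_ema.decode(z).sample   # [1, 3, 256, 256]
    img_mse = vae_mse.decode(z).sample   # [1, 3, 256, 256]
# Note: SD 1.x scales latents by 0.18215 around the U-Net, so account for that
# if you feed these latents into an SD 1.5 pipeline.
```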

@pokameng
Author

pokameng commented Oct 5, 2024

> For SD 1.5 you can use https://huggingface.co/stabilityai/sd-vae-ft-ema or https://huggingface.co/stabilityai/sd-vae-ft-mse. They both have the same tokenizer encoder (so the latents are the same) but different tokenizer decoders.

Yes! I am using the tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema
