About VAE channels #56
Comments
@LTH14
For KL-8 you can directly use this one: https://huggingface.co/stabilityai/sd-vae-ft-ema
@LTH14
Yes -- if you want to use a pre-trained SD model, then you should use SD's KL-8 tokenizer.
If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is B, 256, 16 (each token holds a 2x2 patch of 4 channels, so 4 * 2 * 2 = 16 values). You should unpatchify it (we have this function in MAR) to B, 4, 32, 32. The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.
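For illustration, the reshape from B, 256, 16 to B, 4, 32, 32 can be sketched as follows. This is a NumPy sketch rather than MAR's own code: the function name and defaults are chosen for this example, and it assumes a square token grid with each token stored in (channel, patch_row, patch_col) order. Check that ordering against the patchify function of the model you actually use.

```python
import math

import numpy as np


def unpatchify(tokens, channels=4, patch_size=2):
    """Fold (B, L, C*p*p) tokens back into a (B, C, H, W) latent.

    Assumption: the L tokens form a square grid in row-major order and each
    token stores its values in (channel, patch_row, patch_col) order.
    """
    bsz, seq_len, dim = tokens.shape
    p, c = patch_size, channels
    h = w = math.isqrt(seq_len)
    assert h * w == seq_len, "token grid must be square"
    assert dim == c * p * p, "token dim must equal channels * patch_size**2"
    x = tokens.reshape(bsz, h, w, c, p, p)
    x = x.transpose(0, 3, 1, 4, 2, 5)  # -> (B, C, h, p, w, p)
    return x.reshape(bsz, c, h * p, w * p)


tokens = np.zeros((2, 256, 16), dtype=np.float32)
latent = unpatchify(tokens)
print(latent.shape)  # (2, 4, 32, 32)
```

The same reshape and axis permutation carries over to PyTorch tensors with `reshape` and `permute`.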
Yes, I know. I want to use sd-1.5, so which tokenizer should I use? I am using the KL-8 tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema
In other words, if I want to use sd-1.5, which tokenizer should I choose? I want to use a non-quantized tokenizer so that it is consistent with yours.
For sd-1.5 you can use https://huggingface.co/stabilityai/sd-vae-ft-ema or https://huggingface.co/stabilityai/sd-vae-ft-mse. They both have the same tokenizer encoder (so the latent is the same) but different tokenizer decoders.
Yes! I am using the tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema
@LTH14
Hello Bro
I found that the VAE in MAR is KL-16, so the latent dimension is [B 16 16 16], and when using KL-8 the latent dimension is [B 4 32 32].
I have a question: if I use the SD model or another large diffusion model instead of the MLP, I need to change the VAE, right? SD takes an input of [B 4 H W].
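As a quick check of the shapes quoted in this thread, the arithmetic can be written out as below. This is only a sketch: `mar_shapes` is a helper named for this example, and it assumes 256x256 input images and a patch size of 1 for the KL-16 setup.

```python
def mar_shapes(image_size, vae_stride, vae_embed_dim, patch_size):
    """Return (latent shape without batch, (num_tokens, token_dim))."""
    latent_hw = image_size // vae_stride   # spatial size of the VAE latent
    grid = latent_hw // patch_size         # tokens per side after patchifying
    latent = (vae_embed_dim, latent_hw, latent_hw)
    tokens = (grid * grid, vae_embed_dim * patch_size ** 2)
    return latent, tokens


# KL-16 (assumed patch size 1) and KL-8 with patch size 2, both at 256x256
print(mar_shapes(256, 16, 16, 1))  # ((16, 16, 16), (256, 16))
print(mar_shapes(256, 8, 4, 2))    # ((4, 32, 32), (256, 16))
```

Both settings give 256 tokens of dimension 16, so the sequence seen by the model is the same even though the latent layout differs.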