
Questions about the rescale problem #18

Open
Frankie123421 opened this issue Nov 3, 2022 · 2 comments

Comments

@Frankie123421

Hi, Xu. Thanks for sharing the code. I've noticed the discussion here (https://github.com/MinkaiXu/GeoDiff/issues/11) and carefully read the code line by line. As you stated in issue #11, the "diffusion" process in the code is rescaled relative to the paper, i.e., $\mathcal{C}^t = \frac{1}{\sqrt{\alpha_t}}(\sqrt{\alpha_t}\mathcal{C}^0 + \sqrt{1-\alpha_t}\epsilon)$. According to the ScoreSDE paper (https://arxiv.org/abs/2011.13456), DDPM is a variance-preserving process and DSM is a variance-exploding one. I think there might be some typos in your answer to issue #11 that cause a contradiction: "2) use the alpha to rescale the data to achieve variation preserving" versus "the problem of variation preserving is: it will change the scale of coordinates". In my view, after rescaling, $\mathcal{C}^t = \mathcal{C}^0 + \frac{\sqrt{1-\alpha_t}}{\sqrt{\alpha_t}}\epsilon$ is a DSM-style process whose variance increases with $t$. So I am confused about why this rescaling method would hold the scale of the coordinates, since it seems to corrupt the scale (increase the variance) instead.
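As a small sanity check on this point, here is a numerical sketch (my own, using an illustrative linear beta schedule, not necessarily the one GeoDiff uses) comparing the standard deviation of the plain DDPM forward process with that of the rescaled process:

```python
import numpy as np

# Illustrative linear beta schedule; GeoDiff's actual schedule may differ.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product, i.e. alpha_t in the thread's notation

for t in [10, 500, 999]:
    a = alpha_bar[t]
    # DDPM (variance preserving): C^t = sqrt(a)*C^0 + sqrt(1-a)*eps
    ddpm_std = np.sqrt(1.0 - a)                    # bounded by 1
    # Rescaled process: C^t = C^0 + sqrt((1-a)/a)*eps
    rescaled_std = np.sqrt(1.0 - a) / np.sqrt(a)   # grows without bound
    print(f"t={t:4d}  ddpm_std={ddpm_std:.3f}  rescaled_std={rescaled_std:.3f}")
```

The DDPM standard deviation stays below 1 for all $t$, while the rescaled one diverges as $\alpha_t \to 0$, which is exactly the variance-exploding behavior described above.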

@MinkaiXu
Owner

MinkaiXu commented Apr 5, 2023

Hi,
Thanks for your interest! I think I can fully understand and actually agree with your statement.
By "change the scale of coordinates", I mean that in my experiments the data usually lives on scales larger than 1. In that case, variance preserving will shrink the data, while variance exploding won't. The difference is that, by contrast, in the DSM and DDPM papers the images are usually rescaled to [0, 1].
Overall, I would say the rescaling is just a trick I found during the implementation. With my discussion at #11, I just wanted to clarify that there is indeed an underlying variance-preserving process.

@Frankie123421
Author

Thanks for your kind reply. Yes, I got it later: the rescaled process keeps the mean at $\mathcal{C}^0$, while the original DDPM shrinks the data by the factor $\sqrt{\alpha_t}$ (which makes the construction of the radius graph fail). I still think the rescaled process is variance exploding, though, since $\frac{\sqrt{1-\alpha_t}}{\sqrt{\alpha_t}} \rightarrow \infty$ as $t \rightarrow \infty$.
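A quick Monte Carlo check of the mean-preservation point (a sketch with made-up numbers: $\alpha_t = 0.04$ stands in for some late timestep, and $\mathcal{C}^0 = 5$ for a coordinate on a scale larger than 1):

```python
import numpy as np

rng = np.random.default_rng(0)
C0 = 5.0       # a coordinate on a scale > 1
a = 0.04       # alpha_t at some late timestep (illustrative value)
eps = rng.standard_normal(100_000)

# Plain DDPM forward process: mean shrinks to sqrt(a)*C0.
ddpm = np.sqrt(a) * C0 + np.sqrt(1.0 - a) * eps
# Rescaled process: mean stays at C0, but the variance explodes.
rescaled = C0 + np.sqrt((1.0 - a) / a) * eps

print(ddpm.mean())      # ~ sqrt(0.04) * 5 = 1.0  (coordinates shrink)
print(rescaled.mean())  # ~ 5.0                    (mean preserved)
print(rescaled.std())   # ~ sqrt(0.96 / 0.04) ≈ 4.9 (variance exploding)
```

So the rescaling keeps interatomic geometry centered where the radius graph expects it, at the cost of an unbounded noise scale.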
Another question concerns the chain-rule method: the score $\nabla_{d^t} \log q(d^t \mid d^0)$ is approximated by $-\frac{\sqrt{\alpha_t}(d^t-d^0)}{1 - \alpha_t}$. In my view, this approximation assumes that the pairwise distances follow the same perturbation process as the coordinates, i.e., $d^t = \frac{1}{\sqrt{\alpha_t}}(\sqrt{\alpha_t}d^0 + \sqrt{1-\alpha_t}\epsilon)$, and the "transformed" score over positions $\nabla_{\mathcal{C}^t} \log q(\mathcal{C}^t \mid \mathcal{C}^0)$ is then obtained by the chain rule. I wonder how this assumption is made and why it is reasonable. Although this transformation makes the transformed noise (score) equivariant with respect to $\mathcal{C}^t$, it obscures the relation between the truly added noise and the transformed one. In some cases, e.g., when I need to recenter the (truly) added noise at its CoM, the transformed noise cannot naturally satisfy this requirement. And if I further force the transformed noise and the model outputs to also be recentered at their CoMs during training, the test results are bad. I guess this is because the recentered transformed noise falls away from the true noise.
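For concreteness, the chain-rule step I mean can be sketched as follows (my own toy implementation over all atom pairs, with illustrative coordinates and an illustrative $\alpha_t$; the actual GeoDiff code operates on edges of the radius graph):

```python
import numpy as np

def distance_score(d_t, d_0, alpha_t):
    """Approximate score w.r.t. a pairwise distance: -sqrt(a)*(d^t - d^0)/(1 - a)."""
    return -np.sqrt(alpha_t) * (d_t - d_0) / (1.0 - alpha_t)

def position_score(C_t, C_0, alpha_t):
    """Map distance scores to a score on coordinates via d(d_ij)/d(C_i) = (C_i - C_j)/d_ij."""
    n = C_t.shape[0]
    score = np.zeros_like(C_t)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = C_t[i] - C_t[j]
            d_t = np.linalg.norm(diff)
            d_0 = np.linalg.norm(C_0[i] - C_0[j])
            # Chain rule: s_d * d(d_ij)/d(C_i)
            score[i] += distance_score(d_t, d_0, alpha_t) * diff / d_t
    return score

# Toy 3-atom conformer (illustrative values).
C_0 = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 2.0, 0.0]])
C_t = C_0 + 0.3 * np.random.default_rng(1).standard_normal(C_0.shape)
print(position_score(C_t, C_0, alpha_t=0.5))
```

One observation relevant to the CoM discussion: in this toy version the pairwise contributions are antisymmetric in $(i, j)$, so the chain-rule score sums to exactly zero over atoms, i.e., it is already zero-CoM by construction, unlike the raw coordinate noise.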
