Loss almost does not change #7

Open
Oguzhanercan opened this issue Sep 13, 2024 · 8 comments

Comments

@Oguzhanercan

Oguzhanercan commented Sep 13, 2024

Hi, I am trying out different losses. I have implemented a face similarity loss and disabled all of the other losses, but the loss almost does not change (it decreased by about 1%). In my method, I look at the cosine similarity between the reference image and the generated image, like RectifID (https://github.com/feifeiobama/RectifID). Do you have any experiments with that, or do you have any suggestions about it?

Face recognition model: ArcFace ResNet200
Loss: (1 - cosine similarity)
Lr: 1 to 30 (I tried different settings for this)
SD model: PixArt
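A minimal sketch of this kind of loss, assuming a differentiable PyTorch `face_encoder` (an ArcFace-style embedding network; all names here are placeholders, not code from this repo):

```python
import torch.nn.functional as F

def face_similarity_loss(generated, reference, face_encoder):
    # generated, reference: image tensors of shape (B, 3, H, W)
    # face_encoder must be a differentiable torch module so that gradients
    # can flow from the loss back into the generated image.
    emb_gen = face_encoder(generated)
    emb_ref = face_encoder(reference).detach()  # the reference embedding is a fixed target
    cos = F.cosine_similarity(emb_gen, emb_ref, dim=-1)
    return (1.0 - cos).mean()
```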

@sgk98
Collaborator

sgk98 commented Sep 13, 2024

Hi, while we had discussed similar experiments, I don't think we have ever tried this out, so I am quite curious to see what you get with this.

Coming to the specific details, I'd suggest you use either SD-Turbo or SDXL-Turbo, since it's reasonably fast at 512 resolution (vs HyperSDXL, which does generation at 1024) and generates images of much better quality than PixArt.
I am not sure if you removed the norm regularization loss, but you could first try to remove that to allow for more aggressive optimization of the objective (even if there is some amount of artifacts that could be introduced). You could also remove the gradient clipping to increase the effect of the optimization even further.
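For illustration, a hedged sketch of what that looks like in a generic noise-optimization loop; `one_step_generator`, `reference`, `face_encoder`, and the loss above are assumed placeholders, not the repo's actual code:

```python
import torch

noise = torch.randn(1, 4, 64, 64, device="cuda", requires_grad=True)
optimizer = torch.optim.Adam([noise], lr=5.0)

for step in range(50):
    image = one_step_generator(noise)  # single denoising step + VAE decode
    loss = face_similarity_loss(image, reference, face_encoder)
    # loss = loss + reg_weight * noise.pow(2).mean()         # norm regularization, removed here
    optimizer.zero_grad()
    loss.backward()
    # torch.nn.utils.clip_grad_norm_([noise], max_norm=0.1)  # gradient clipping, removed here
    optimizer.step()
```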

But apart from this, I would be very surprised if the loss doesn't go down at all (especially if you increase the number of iterations even further). From our experience with optimizing different objectives, the loss inevitably goes down quite easily, although in some cases this may not correspond to the changes/improvements that you desire.

@Oguzhanercan
Author

Then I will try your suggestions and see if they work, and I will share new experimental results on this. The reason I used PixArt is that I could not run the code with the other models, but I will upgrade diffusers from 0.30 to a newer version and see if that works, following your suggestion in the other issue.

@sgk98
Collaborator

sgk98 commented Sep 14, 2024

Sure, I think most recent versions of diffusers (even 0.24 worked a few months ago) should work smoothly. If you are interested, I can provide the pip freeze output so you can check whether any other library is causing the issue with the U-Nets.

@Oguzhanercan
Author

I have tried many things to decrease the loss, but it is not changing, not even by a fraction. Is it possible that this is caused by the memsave argument?

@sgk98
Collaborator

sgk98 commented Sep 17, 2024

Sure, you can remove the memsave argument (and its application in the model directly, if you prefer). There could be issues it's causing (especially if you're also applying it to the feature extractor for the reference image).

I would also suggest switching to SD/SDXL-Turbo for much better optimization (even the results in the paper show that PixArt-alpha-DMD is a much worse one-step model). With SD-Turbo, we were even able to optimize a segmentation loss; although the resulting images were not perfectly aligned, the segmentation loss did go down a lot.
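For reference, loading SD-Turbo for one-step generation with diffusers looks roughly like this (standard diffusers usage, not this repo's pipeline):

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# One-step 512x512 generation: a single inference step and no classifier-free guidance.
image = pipe("a portrait photo of a person",
             num_inference_steps=1, guidance_scale=0.0).images[0]
```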

Also, if you can share your code (maybe in a fork of the repo), I can also have a look at it and see if there's any other issue/if I'm able to make changes to fix this issue.

@Oguzhanercan
Author

I could not remove memsave while using PixArt and SDXL-Turbo because I only have 24 GB of GPU memory. When I tried SD-Turbo, both with and without memsave, the loss just wiggles, so memsave is not the cause of the problem. I checked the code again but could not find any problem; the code is on a remote machine, and when I get copy permission I will share it with you.

@Oguzhanercan
Author

I solved the problem: at some point I was converting the output image of the diffusion model to a NumPy array to get the face embedding (an ONNX model), and at that stage the gradient chain was broken.
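In other words, the fix is to keep the whole path in torch tensors. A hedged before/after sketch (names are placeholders):

```python
# Broken: converting the generated image to NumPy for an ONNX model detaches it
# from the autograd graph, so loss.backward() never reaches the latents/noise.
emb = onnx_session.run(None, {"input": image.detach().cpu().numpy()})[0]

# Working: keep everything as torch tensors and use a differentiable PyTorch encoder,
# so gradients flow from the cosine-similarity loss back to the generator input.
emb = face_encoder(image)
```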

@Oguzhanercan
Author

Another problem I faced: I was using a JIT model to get the face embeddings used for the loss calculation (cosine similarity), and this was causing the gradients to disappear.
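A quick sanity check for this kind of silent gradient break, assuming `face_encoder` is whichever embedding model is plugged in:

```python
img = generated_image.detach().clone().requires_grad_(True)
emb = face_encoder(img)
# If the embedding model internally detaches (e.g. it was traced/scripted under
# no_grad or returns NumPy), emb.grad_fn will be None and the loss cannot
# optimize the generated image at all.
print(emb.requires_grad, emb.grad_fn)
```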

@Oguzhanercan reopened this Sep 27, 2024