What are good ways to evaluate ControlNet? #320
themrzmaster started this conversation in General
Replies: 1 comment 3 replies
-
usually we just use "guess mode" to evaluate, because the non-prompt mode is more challenging and the performance differences between models become very obvious.
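For reference, a minimal sketch of that kind of non-prompt evaluation with diffusers and its `guess_mode` option; the model IDs, conditioning-image path, and sampling settings below are placeholders, not the exact setup used here:

```python
# Minimal sketch: sampling from a ControlNet pipeline in "guess mode"
# (no informative prompt, so the conditioning image has to do the work).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-mlsd", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

cond_image = load_image("path/to/mlsd_condition.png")  # placeholder path

result = pipe(
    prompt="",                 # empty prompt for non-prompt evaluation
    image=cond_image,
    guess_mode=True,           # drop the text prompt's influence
    guidance_scale=3.0,
    num_inference_steps=30,
).images[0]
result.save("guess_mode_sample.png")
```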
-
Suppose I fine-tune ControlNet-MLSD on a specific dataset.
What are some quantitative metrics I can use to compare it against the original model and see whether the fine-tuning was worth it?
Looking at this tutorial https://huggingface.co/docs/diffusers/main/en/conceptual/evaluation, there is a specific section on evaluating text- and image-conditioned models, based on CLIP directional similarity. For that, we need a source image (in our case the conditioning image) and the generated image (the ControlNet output), plus two captions, which in our case are not always available (we usually have only the one prompt that is used to generate the new image). Would it make sense to use this metric? We could generate a caption for the original image with BLIP, for example, but is that the best way?
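A rough sketch of what that could look like, assuming the source caption comes from BLIP and using standard CLIP/BLIP checkpoints; the image paths and the generation prompt below are placeholders:

```python
# Sketch of CLIP directional similarity for a (conditioning image, generated image) pair,
# with the missing source caption produced by BLIP.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import (
    CLIPModel, CLIPProcessor,
    BlipProcessor, BlipForConditionalGeneration,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

def caption(image: Image.Image) -> str:
    """Generate a caption for the source image with BLIP."""
    inputs = blip_proc(images=image, return_tensors="pt").to(device)
    out = blip.generate(**inputs, max_new_tokens=30)
    return blip_proc.decode(out[0], skip_special_tokens=True)

@torch.no_grad()
def directional_similarity(src_img, gen_img, src_caption, gen_prompt) -> float:
    """Cosine similarity between the image-edit direction and the caption-edit direction."""
    img_in = clip_proc(images=[src_img, gen_img], return_tensors="pt").to(device)
    txt_in = clip_proc(text=[src_caption, gen_prompt],
                       return_tensors="pt", padding=True).to(device)
    img_emb = clip.get_image_features(**img_in)   # [2, d]
    txt_emb = clip.get_text_features(**txt_in)    # [2, d]
    img_dir = F.normalize(img_emb[1] - img_emb[0], dim=-1)
    txt_dir = F.normalize(txt_emb[1] - txt_emb[0], dim=-1)
    return F.cosine_similarity(img_dir, txt_dir, dim=-1).item()

src = Image.open("source.png").convert("RGB")      # conditioning image (placeholder)
gen = Image.open("generated.png").convert("RGB")   # ControlNet output (placeholder)
score = directional_similarity(src, gen, caption(src), "the prompt used for generation")
print(f"CLIP directional similarity: {score:.3f}")
```

Whether treating the conditioning image (e.g. an MLSD line map) as the "source" image is meaningful here is exactly the open question above; averaging the score over the whole evaluation set would at least give a number to compare before and after fine-tuning.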
Another approach would be the FID score between the training data and the generated images, ignoring the text. Does that make sense? Since we usually change only a small part of the image, would it work?
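If so, something like torchmetrics could compute it; a minimal sketch, assuming two folders of PNG images (the folder names below are placeholders) and enough samples for the statistics to be stable:

```python
# Sketch of an FID comparison between real training images and ControlNet outputs.
# Requires torchmetrics with the image extras (torch-fidelity) installed.
from pathlib import Path
import torch
from PIL import Image
from torchvision import transforms
from torchmetrics.image.fid import FrechetInceptionDistance

to_uint8 = transforms.Compose([
    transforms.Resize((299, 299)),   # Inception input size
    transforms.PILToTensor(),        # uint8 tensor, shape [3, H, W]
])

def load_folder(folder: str) -> torch.Tensor:
    """Load all PNGs in a folder as a uint8 batch tensor."""
    imgs = [to_uint8(Image.open(p).convert("RGB"))
            for p in sorted(Path(folder).glob("*.png"))]
    return torch.stack(imgs)

fid = FrechetInceptionDistance(feature=2048)
fid.update(load_folder("train_images/"), real=True)       # placeholder folder
fid.update(load_folder("generated_images/"), real=False)  # placeholder folder
print(f"FID: {fid.compute().item():.2f}")
```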
Any new ideas?