[Illustrative; not for merge] How to prefer float16 as the main float type #1802
Currently, if your intention is to build a CoreML model which targets float16 (very likely if you're targeting the ANE), the process for tracing and converting the model is, as far as I understand it: "trace in float32; casts to float16 are added during conversion, and the optimization passes then try to elide those casts".
This does have the (admittedly minor) downside of having to load a float32 model (and start with 32-bit weights), only to ultimately throw half of those bits away.
It also relies on the optimization passes successfully eliding all of those casts, and you have to wait (slightly) for those passes to run.
The main downside for me came when trying to debug conversion failures: the ops were cluttered with casts, and looked more different from the initial torchscript than they needed to be.
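For reference, the status-quo workflow looks roughly like this (a minimal sketch; the tiny `Sequential` model, input shape, and names are stand-ins for illustration, not taken from this PR):

```python
import torch
import coremltools as ct

# Status quo: trace in float32, then ask the converter to produce a float16 model.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval()
example = torch.rand(1, 3, 64, 64)            # float32 example input
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    compute_precision=ct.precision.FLOAT16,   # fp32->fp16 casts are inserted, then elided by passes
    convert_to="mlprogram",
)
```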
To simplify all this, I changed the convention everywhere I could find it, from

"Python floats will be interpreted as np.float32"

to

"Python floats will be interpreted as np.float16".

I'm not proposing to merge this or anything, but the changes didn't need to be made in many places in order to successfully compile stable-diffusion from a model that was traced by torchscript in float16.
Note: PyTorch's CPU device doesn't implement the float16 operations needed here, so this trick requires tracing the model in float16 on the MPS device instead.
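For comparison, the tracing side of that workflow with this change would look roughly like the sketch below. It assumes a machine with an MPS device, and the tiny `Sequential` model is again just a stand-in:

```python
import torch
import coremltools as ct

# With the patched convention: trace the model in float16 directly.
# The trace runs on the MPS device because the CPU backend lacks float16 kernels.
device = torch.device("mps")
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval().half().to(device)
example = torch.rand(1, 3, 64, 64, dtype=torch.float16, device=device)
traced = torch.jit.trace(model, example)

# The converter now sees float16 weights and activations from the start,
# so there are no fp32->fp16 casts to insert and later optimize away.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    convert_to="mlprogram",
)
```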
If it turns out it's not just me who finds this useful:
I wonder whether this could be exposed somehow as a configurable option, e.g. a "default float width"?
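Purely as an illustration of the idea, such an option might be spelled something like the following. The `default_float_dtype` argument is hypothetical and does not exist in coremltools; `traced` is the float16 torchscript trace from the sketch above:

```python
import numpy as np
import coremltools as ct

# Hypothetical API sketch only: a converter-level knob for the default float width.
# `default_float_dtype` is NOT an existing coremltools parameter.
mlmodel = ct.convert(
    traced,                                   # the float16 trace from the previous sketch
    inputs=[ct.TensorType(name="x", shape=(1, 3, 64, 64))],
    convert_to="mlprogram",
    default_float_dtype=np.float16,           # i.e. "Python floats will be interpreted as np.float16"
)
```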