Add Lion optimizer and a "skip step" version of it #43
Conversation
If I want to log the step_factor, do I lose all the benefits of avoiding host-device sync?
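Not part of the PR, just a sketch of the trade-off in question: reading a device-resident scalar (e.g. with .item()) forces the host to wait for all queued GPU work, so one common compromise is to only materialize step_factor on the iterations where it is actually logged. The names and logging cadence below are illustrative, not the PR's API.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the optimizer's on-device step factor.
step_factor = torch.tensor(1.0, device=device)

log_every = 100
for step in range(1_000):
    # ... the skip-step optimizer would update step_factor on-device here ...
    if step % log_every == 0:
        # .item() copies to host and blocks until the GPU catches up,
        # so the sync cost is only paid on logging steps.
        print(f"step {step}: step_factor={step_factor.item():.3f}")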
if self._grad_norms:
    grad_norm_std = torch.std(torch.stack(self._grad_norms[:-1]))
return (
    self.latest_loss <= self.sigma_factor * loss_std
The Karpathy thing actually filters out steps where the loss/grad norm is sigma_factor standard deviations above the mean (this version filters when it is above 0, i.e. simply above the mean). A number that is s standard deviations above the mean is said to have a "z-score" of s.
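For reference, a minimal sketch of the filter described above (illustrative names, not the PR's code): skip the update whenever the latest loss or grad norm sits more than sigma_factor standard deviations above the mean of the recent history, i.e. its z-score exceeds sigma_factor.

import torch

def should_skip(history: torch.Tensor, latest: torch.Tensor, sigma_factor: float = 2.0) -> torch.Tensor:
    # z-score of the latest value relative to the recent history;
    # everything stays on-device, so no host-device sync is forced.
    z_score = (latest - history.mean()) / history.std()
    return z_score > sigma_factor

losses = torch.tensor([2.31, 2.28, 2.30, 2.27, 2.29])
print(should_skip(losses, torch.tensor(9.00)))  # tensor(True): clear outlier, skip the step
print(should_skip(losses, torch.tensor(2.30)))  # tensor(False): normal step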
Oh, duh, that's embarrassing after having spent 4 years in a stats PhD program.
Forgot the .abs(), but just pushed a fix for that.
I don't really intend to use the Lion optimizer; this is more meant as a proof of concept for two things: