Almost never a desirable behavior to run LayerNormGeneral in FP16 #17

WangTaoAs · 2024-11-13T09:46:11Z

Hi, thanks for your great work.
I used your network to train on a custom dataset and it works well. However, when I run inference in FP16, performance drops significantly. I found that the self-implemented LayerNormGeneral function contributes large errors between FP16 and FP32. When I use the LayerNorm implemented by apex instead, the outputs match. Is there a way to solve this problem?

```python
import torch
import torch.nn as nn


class LayerNormGeneral(nn.Module):
    def __init__(self, affine_shape=None, normalized_dim=(-1, ), scale=True,
                 bias=True, eps=1e-5):
        super().__init__()
        self.normalized_dim = normalized_dim
        self.use_scale = scale
        self.use_bias = bias
        self.weight = nn.Parameter(torch.ones(affine_shape)) if scale else None
        self.bias = nn.Parameter(torch.zeros(affine_shape)) if bias else None
        self.eps = eps

    def forward(self, x):
        # Center the input, then divide by the (biased) standard deviation.
        c = x - x.mean(self.normalized_dim, keepdim=True)
        s = c.pow(2).mean(self.normalized_dim, keepdim=True)
        x = c / torch.sqrt(s + self.eps)
        # if self.use_scale:
        x = x * self.weight
        # if self.use_bias:
            # x = x + self.bias
        return x
```
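
One possible workaround, offered as a sketch rather than a confirmed fix: compute the statistics in FP32 and cast the result back to the input dtype. A plausible (unverified) cause of the discrepancy is that `c.pow(2)` overflows in FP16, whose largest finite value is 65504, once a centered activation exceeds roughly 255 in magnitude. The class name `LayerNormGeneralFP32` is hypothetical and not part of the repository; the scale and bias branches are restored here on the assumption that both are wanted.

```python
import torch
import torch.nn as nn


class LayerNormGeneralFP32(nn.Module):
    """Hypothetical variant of LayerNormGeneral that normalizes in FP32."""

    def __init__(self, affine_shape=None, normalized_dim=(-1,), scale=True,
                 bias=True, eps=1e-5):
        super().__init__()
        self.normalized_dim = normalized_dim
        self.use_scale = scale
        self.use_bias = bias
        self.weight = nn.Parameter(torch.ones(affine_shape)) if scale else None
        self.bias = nn.Parameter(torch.zeros(affine_shape)) if bias else None
        self.eps = eps

    def forward(self, x):
        in_dtype = x.dtype
        # Upcast so the mean, squared deviations, and sqrt all run in FP32.
        x = x.float()
        c = x - x.mean(self.normalized_dim, keepdim=True)
        s = c.pow(2).mean(self.normalized_dim, keepdim=True)
        x = c / torch.sqrt(s + self.eps)
        if self.use_scale:
            x = x * self.weight.float()
        if self.use_bias:
            x = x + self.bias.float()
        # Cast back so downstream FP16 layers receive the dtype they expect.
        return x.to(in_dtype)
```

With this variant, a model converted with `.half()` still receives and returns FP16 tensors, for example `LayerNormGeneralFP32(affine_shape=8).half()(x)` for an FP16 tensor `x` whose last dimension is 8. Only the normalization arithmetic runs in FP32, at the cost of two extra dtype casts per call.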
