torch._C._LinAlgError: linalg.svd: (Batch element 0): The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 15).
#1599 · Open · CharisWg opened this issue on Jul 24, 2024 · 1 comment
C:\Users\LocalAdmin\anaconda3\envs\lightlyyolo\python.exe D:\Charis\SSL-yolo8\lightly-master\examples\pytorch\mmcr_yolo.py
WARNING ⚠️ no model scale passed. Assuming scale='n'.
class_name is: MMCR
save_path is: D:\Charis\SSL-yolo8\lightly-master\runs\MMCR
Starting Training
epoch: 00, loss: -2415920191337764664519950336.00000
after training
tensor([ -0.7926, -2.2815, -0.7858, -14.8213, -16.7507], device='cuda:0')
tensor([ -0.7926, -2.2815, -0.7858, -14.8213, -16.7507], device='cuda:0')
tensor([ -0.7926, -2.2815, -0.7858, -14.8213, -16.7507], device='cuda:0')
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
after saving training + has backbone.load_state_dict
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
save full_path is: D:\Charis\SSL-yolo8\lightly-master\runs\MMCR\MMCR_coca_alldcm_MMCRTransform.pth
Saving model for MMCR_coca_alldcm_MMCRTransform.pth at Epoch 1
Finding optimal model params. Loss is dropping from -2415920191337764664519950336.0000 to -2415920191337764664519950336.0000
D:\Charis\SSL-yolo8\lightly-master\lightly\loss\mmcr_loss.py:60: UserWarning: torch.linalg.svd: During SVD computation with the selected cusolver driver, batches 0, 1, 2, 3, 4, and other 123 batches failed to converge. A more accurate method will be used to compute the SVD as a fallback. Check doc at https://pytorch.org/docs/stable/generated/torch.linalg.svd.html (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\linalg\BatchLinearAlgebraLib.cpp:703.)
  _, S_z, _ = svd(z)
Traceback (most recent call last):
  File "D:\Charis\SSL-yolo8\lightly-master\examples\pytorch\mmcr_yolo.py", line 158, in
    loss = criterion(z_o, z_m)
  File "C:\Users\LocalAdmin\anaconda3\envs\lightlyyolo\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\LocalAdmin\anaconda3\envs\lightlyyolo\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Charis\SSL-yolo8\lightly-master\lightly\loss\mmcr_loss.py", line 60, in forward
    _, S_z, _ = svd(z)
torch._C._LinAlgError: linalg.svd: (Batch element 0): The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 15).
Process finished with exit code 1
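The traceback shows `svd(z)` failing inside `lightly/loss/mmcr_loss.py` when called as `criterion(z_o, z_m)`, right after an epoch whose loss was about -2.4e27. One way to fail earlier with a readable message is to inspect the embeddings before they reach the loss. This is a minimal debugging sketch, assuming `z_o` and `z_m` are ordinary PyTorch tensors; `check_embeddings` is a hypothetical helper (not part of lightly), and its 1e4 threshold is an arbitrary illustrative value.

```python
import torch


def check_embeddings(z: torch.Tensor, name: str, max_abs: float = 1e4) -> None:
    """Hypothetical debugging helper (not part of lightly).

    Raises ValueError if `z` contains NaN/Inf or any value whose magnitude
    exceeds `max_abs`, so training stops before the loss runs SVD on it.
    The default threshold of 1e4 is an arbitrary illustrative choice.
    """
    z = z.detach()
    if not torch.isfinite(z).all():
        raise ValueError(f"{name} contains NaN or Inf values")
    largest = z.abs().max().item()
    if largest > max_abs:
        raise ValueError(f"{name} has exploded: max |value| = {largest:.3e}")


# In the training loop, just before the call that fails in the traceback:
#     check_embeddings(z_o, "z_o")
#     check_embeddings(z_m, "z_m")
#     loss = criterion(z_o, z_m)

# Well-scaled random embeddings pass silently.
check_embeddings(torch.randn(8, 128), "random embeddings")
```

If the check fires on the first batches, the problem is upstream of the SVD (model output or training dynamics) rather than in the SVD itself.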
Hi, sorry for the late reply. It looks like your loss has blown up: its magnitude is far too large (-2415920191337764664519950336.00000). Try decreasing the learning rate, or check your gradient values and clip them if necessary.
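A minimal sketch of the reply's advice for a generic PyTorch training loop. `train_step`, `compute_loss`, `mse_loss`, the toy `nn.Linear` model, and the values `lr=1e-3` and `max_grad_norm=1.0` are hypothetical stand-ins, not code from mmcr_yolo.py or lightly; in the real script `compute_loss` would produce `z_o` and `z_m` and call the MMCR `criterion`. The step rejects NaN/Inf losses and uses `torch.nn.utils.clip_grad_norm_` to cap the combined gradient norm.

```python
from typing import Callable, Tuple

import torch
from torch import nn


def train_step(
    model: nn.Module,
    compute_loss: Callable[[nn.Module, tuple], torch.Tensor],
    optimizer: torch.optim.Optimizer,
    batch: tuple,
    max_grad_norm: float = 1.0,
) -> Tuple[float, float]:
    """One optimization step with a finiteness check and gradient-norm clipping.

    Returns the loss and the total gradient norm measured *before* clipping,
    which is worth logging to see whether gradients are exploding.
    """
    optimizer.zero_grad()
    loss = compute_loss(model, batch)
    if not torch.isfinite(loss):
        raise FloatingPointError(f"non-finite loss: {loss.item()}")
    loss.backward()
    # Rescales all gradients in place so their combined L2 norm is at most max_grad_norm.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_grad_norm)
    optimizer.step()
    return loss.item(), grad_norm.item()


# Toy usage: a linear model and MSE loss stand in for the real model and MMCR loss.
def mse_loss(model: nn.Module, batch: tuple) -> torch.Tensor:
    x, y = batch
    return nn.functional.mse_loss(model(x), y)


torch.manual_seed(0)
model = nn.Linear(16, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # illustrative, not tuned
batch = (torch.randn(32, 16), torch.randn(32, 8))
loss, grad_norm = train_step(model, mse_loss, optimizer, batch)
print(f"loss={loss:.4f}  grad norm before clipping={grad_norm:.4f}")
```

Logging the pre-clipping norm each step shows whether gradients grow steadily before the loss diverges; if they do, lowering the learning rate is usually the first thing to try, with clipping as a safety net.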