torch._C._LinAlgError: linalg.svd: (Batch element 0): The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 15). #1599

CharisWg · 2024-07-24T11:36:57Z

C:\Users\LocalAdmin\anaconda3\envs\lightlyyolo\python.exe D:\Charis\SSL-yolo8\lightly-master\examples\pytorch\mmcr_yolo.py
WARNING ⚠️ no model scale passed. Assuming scale='n'.
class_name is: MMCR
save_path is: D:\Charis\SSL-yolo8\lightly-master\runs\MMCR
Starting Training
epoch: 00, loss: -2415920191337764664519950336.00000
after training
tensor([ -0.7926, -2.2815, -0.7858, -14.8213, -16.7507], device='cuda:0')
tensor([ -0.7926, -2.2815, -0.7858, -14.8213, -16.7507], device='cuda:0')
tensor([ -0.7926, -2.2815, -0.7858, -14.8213, -16.7507], device='cuda:0')
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
after saving training + has backbone.load_state_dict
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
tensor([-0.4687, -0.7416, -0.3247, -4.7035, -5.2732], device='cuda:0')
save full_path is: D:\Charis\SSL-yolo8\lightly-master\runs\MMCR\MMCR_coca_alldcm_MMCRTransform.pth
Saving model for MMCR_coca_alldcm_MMCRTransform.pth at Epoch 1
Finding optimal model params. Loss is dropping from -2415920191337764664519950336.0000 to -2415920191337764664519950336.0000
D:\Charis\SSL-yolo8\lightly-master\lightly\loss\mmcr_loss.py:60: UserWarning: torch.linalg.svd: During SVD computation with the selected cusolver driver, batches 0, 1, 2, 3, 4, and other 123 batches failed to converge. A more accurate method will be used to compute the SVD as a fallback. Check doc at https://pytorch.org/docs/stable/generated/torch.linalg.svd.html (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\linalg\BatchLinearAlgebraLib.cpp:703.)
  _, S_z, _ = svd(z)
Traceback (most recent call last):
  File "D:\Charis\SSL-yolo8\lightly-master\examples\pytorch\mmcr_yolo.py", line 158, in <module>
    loss = criterion(z_o, z_m)
  File "C:\Users\LocalAdmin\anaconda3\envs\lightlyyolo\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\LocalAdmin\anaconda3\envs\lightlyyolo\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Charis\SSL-yolo8\lightly-master\lightly\loss\mmcr_loss.py", line 60, in forward
    _, S_z, _ = svd(z)
torch._C._LinAlgError: linalg.svd: (Batch element 0): The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 15).

Process finished with exit code 1

guarin · 2024-08-16T06:54:18Z

Hi, sorry for the late reply. It looks like the magnitude of your loss is far too large (-2415920191337764664519950336.00000 after the first epoch), which suggests that training is diverging before the SVD fails. Try decreasing the learning rate, or check your gradient values and clip them if necessary.
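
As a rough illustration of the suggestion above, here is a minimal sketch of a training step that reports the gradient norm and clips it. It is not taken from `mmcr_yolo.py`: the `training_step` helper, the `nn.Linear` model, the `nn.MSELoss` criterion, the learning rate, and the `max_grad_norm` value are all placeholder assumptions, and you would swap in your own backbone and the MMCR loss.

```python
import torch
from torch import nn


def training_step(model, criterion, optimizer, batch_a, batch_b, max_grad_norm=1.0):
    """Run one update with gradient clipping.

    Returns (loss_value, grad_norm_before_clipping), or None if the step
    was skipped because the loss or the gradients were not finite.
    """
    optimizer.zero_grad()
    loss = criterion(model(batch_a), model(batch_b))

    # Do not backpropagate a NaN/Inf loss into the weights.
    if not torch.isfinite(loss):
        return None

    loss.backward()

    # clip_grad_norm_ rescales the gradients in place and returns the
    # total norm measured *before* clipping, which is useful to log.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_grad_norm)
    if not torch.isfinite(grad_norm):
        return None

    optimizer.step()
    return loss.item(), grad_norm.item()


# Placeholder setup (assumptions, not the original script).
torch.manual_seed(0)
model = nn.Linear(8, 4)
criterion = nn.MSELoss()  # stand-in for the MMCR loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # deliberately small lr

result = training_step(model, criterion, optimizer, torch.randn(16, 8), torch.randn(16, 8))
if result is None:
    print("step skipped: non-finite loss or gradients")
else:
    print(f"loss: {result[0]:.5f}, grad norm before clipping: {result[1]:.5f}")
```

If the logged gradient norm is already very large in the first few iterations, that would point to the learning rate or the input scaling rather than to the SVD call itself.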

SauravMaheshkar added the question label Jul 25, 2024
