Describe the bug
When executing self.optimizer.step(main_loss), the following error is raised. The optimizer is SGD, and the loss passed in is jt.Var([4.33252289], dtype=float64).
Traceback (most recent call last):
  File "train_edge.py", line 506, in <module>
    trainer.train_edge()
  File "train_edge.py", line 290, in train_edge
    self.optimizer.step(main_loss)
  File "/home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor/optim.py", line 305, in step
    self.pre_step(loss, retain_graph=False)
  File "/home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor/optim.py", line 220, in pre_step
    self.backward(loss, retain_graph)
  File "/home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor/optim.py", line 173, in backward
    grads = jt.grad(loss, params_has_grad, retain_graph)
  File "/home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor/__init__.py", line 445, in grad
    return core.grad(loss, targets, retain_graph)
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.grad)).
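The failure reason at the end of the full log below points to a dtype mismatch inside a batched matmul during backward (a float32 operand paired with a float64 one), and the loss here is float64. One unverified workaround sketch is to cast the loss to float32 before stepping; step_with_float32_loss is a hypothetical helper, not code from train_edge.py:

```python
def step_with_float32_loss(optimizer, loss):
    # Hypothetical workaround sketch: keep the loss in float32 so it matches
    # the float32 model parameters before Jittor computes gradients in step().
    if str(loss.dtype) == "float64":
        loss = loss.float32()
    optimizer.step(loss)
```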

Full Log
(fdlnet_j) llf@XY-TITAN-RTX:/home/ubuntu/hdd2/llf/fdlnet_jittor/scripts$ python train_edge.py --model fdlnet --backbone resnet50 --dataset night --aux
[i 0705 15:10:15.537224 52 compiler.py:956] Jittor(1.3.8.5) src: /home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor
[i 0705 15:10:15.545380 52 compiler.py:957] g++ at /usr/bin/g++(5.5.0)
[i 0705 15:10:15.545582 52 compiler.py:958] cache_path: /home/llf/.cache/jittor/jt1.3.8/g++5.5.0/py3.8.19/Linux-4.15.0-1x37/IntelRXeonRGolx4e/default
[i 0705 15:10:15.579173 52 install_cuda.py:93] cuda_driver_version: [12, 1]
[i 0705 15:10:15.579814 52 install_cuda.py:81] restart /home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/bin/python ['train_edge.py', '--model', 'fdlnet', '--backbone', 'resnet50', '--dataset', 'night', '--aux']
[i 0705 15:10:15.903714 16 compiler.py:956] Jittor(1.3.8.5) src: /home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor
[i 0705 15:10:15.910872 16 compiler.py:957] g++ at /usr/bin/g++(5.5.0)
[i 0705 15:10:15.911057 16 compiler.py:958] cache_path: /home/llf/.cache/jittor/jt1.3.8/g++5.5.0/py3.8.19/Linux-4.15.0-1x37/IntelRXeonRGolx4e/default
[i 0705 15:10:15.944564 16 install_cuda.py:93] cuda_driver_version: [12, 1]
[i 0705 15:10:15.954342 16 __init__.py:411] Found /home/llf/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc(11.2.152) at /home/llf/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc.
[i 0705 15:10:16.037728 16 __init__.py:411] Found gdb(8.1.1) at /usr/bin/gdb.
[i 0705 15:10:16.046927 16 __init__.py:411] Found addr2line(2.30) at /usr/bin/addr2line.
[i 0705 15:10:16.301866 16 compiler.py:1011] cuda key:cu11.2.152_sm_75
[i 0705 15:10:16.767486 16 __init__.py:227] Total mem: 125.56GB, using 16 procs for compiling.
[i 0705 15:10:16.866903 16 jit_compiler.cc:28] Load cc_path: /usr/bin/g++
[i 0705 15:10:17.003635 16 init.cc:62] Found cuda archs: [75,]
[i 0705 15:10:17.038976 16 __init__.py:411] Found mpicc(2.1.1) at /usr/bin/mpicc.
[i 0705 15:10:18.680663 16 cuda_flags.cc:49] CUDA enabled.
2024-07-05 15:10:18,788 test INFO: Using 1 GPUs
2024-07-05 15:10:18,788 test INFO: Namespace(att_weight=0.01, aux=True, aux_weight=0.4, backbone='resnet50', base_size=512, batch_size=2, best_recode={'epoch': -1, 'mean_iu': 0}, crop_size=384, dataset='night', date_str='2024_07_05_15_10_18', device='cuda', distributed=False, edge_weight=0.01, epochs=260, flip=False, joint_edgeseg_loss=False, jpu=False, l2_weight=0, last_recode={}, local_rank=0, log_dir='../runs/logs/', log_iter=20, lr=0.005, manual_seed=40171, model='fdlnet', momentum=0.9, no_cuda=False, num_gpus=1, resume=None, save_dir='../runs/ckpt', save_epoch=20, seg_weight=1.0, skip_val=False, start_epoch=0, use_ohem=False, val_epoch=1, warmup_factor=0.3333333333333333, warmup_iters=0, warmup_method='linear', weight_decay=0.0005, workers=12)
Found 2998 images in the folder ../../datasets/night/images/train
Found 1299 images in the folder ../../datasets/night/images/val
[w 0705 15:10:19.370889 16 nn.py:2280] The `Parameter` interface isn't needed in Jittor, this interface does nothings and it is just used for compatible. A Jittor Var is a Parameter when it is a member of Module, if you don't want a Jittor Var menber is treated as a Parameter, just name it startswith underscore `_`.
2024-07-05 15:10:19,373 test INFO: Start training, Total Epochs: 260 = Total Iterations 389740
type of threshold_index: <class 'jittor.jittor_core.Var'>, shape of threshold_index: [1,]
type of threshold_index: <class 'jittor.jittor_core.Var'>, shape of threshold_index: [1,]
type of threshold_index: <class 'jittor.jittor_core.Var'>, shape of threshold_index: [1,]
Compiling Operators(1/1) used: 2.96s eta: 0s
Compiling Operators(1/1) used: 2.95s eta: 0s
Traceback (most recent call last):
  File "train_edge.py", line 506, in <module>
    trainer.train_edge()
  File "train_edge.py", line 290, in train_edge
    self.optimizer.step(main_loss)
  File "/home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor/optim.py", line 305, in step
    self.pre_step(loss, retain_graph=False)
  File "/home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor/optim.py", line 220, in pre_step
    self.backward(loss, retain_graph)
  File "/home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor/optim.py", line 173, in backward
    grads = jt.grad(loss, params_has_grad, retain_graph)
  File "/home/ubuntu/hdd2/llf/miniconda3/envs/fdlnet_j/lib/python3.8/site-packages/jittor/__init__.py", line 445, in grad
    return core.grad(loss, targets, retain_graph)
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.grad)).
Types of your inputs are:
self = module,
args = (Var, list, bool, ),
The function declarations are:
vector<VarHolder*> _grad(VarHolder* loss, const vector<VarHolder*>& targets, bool retain_graph=true)
Failed reason:[f 0705 15:10:28.107652 16 cublas_batched_matmul_op.cc:34] Check failed: a->dtype().dsize() == b->dtype().dsize() Something wrong... Could you please report this issue?
type of two inputs should be the same
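The "Wrong inputs arguments" message appears to be Jittor's generic wrapper around the real failure, which is the dsize check in cublas_batched_matmul_op.cc: both matmul operands must have the same element size, so a float32 Var (4 bytes) meeting a float64 Var (8 bytes) trips it. For reference, a plain jt.grad call with consistent float32 Vars looks like this (toy shapes, unrelated to train_edge.py):

```python
import jittor as jt

x = jt.random((3, 4))            # float32 by default
w = jt.random((4, 2))
loss = jt.matmul(x, w).sum()     # scalar float32 loss
grads = jt.grad(loss, [w])       # one gradient Var per target, same shape/dtype as w
print(grads[0].shape, grads[0].dtype)
```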
Minimal Reproduce
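Nothing in the log pins down where the float64 value enters the graph, but a toy sketch along the following lines may hit the same dsize check on CUDA; the module, shapes, and the explicit float64 cast are assumptions, not code from train_edge.py:

```python
import jittor as jt
from jittor import nn

jt.flags.use_cuda = 1  # the original run uses CUDA, where cublas_batched_matmul_op is used

class ToyBmm(nn.Module):
    # Hypothetical stand-in for the part of the model that performs a batched matmul.
    def __init__(self):
        super().__init__()
        self.w = jt.random((2, 4, 8))    # float32 member Var, treated as a parameter

    def execute(self, x):
        return nn.bmm(x, self.w)         # batched matmul, backed by cuBLAS on CUDA

model = ToyBmm()
opt = nn.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)

x = jt.random((2, 3, 4))                 # float32 input
loss = model(x).sum().float64()          # float64 loss, like jt.Var([4.33252289], dtype=float64)
opt.step(loss)                           # may raise the same "type of two inputs should be the same" error
```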