Can't train tutorial on shoes. #204

DanielVlasic · 2019-09-17T21:23:23Z

I downloaded the shoe data (https://github.com/RobotLocomotion/pytorch-dense-correspondence/blob/master/config/dense_correspondence/dataset/composite/shoes_all.yaml) and tried going through the tutorial training with it.

I received a "warning, empty mask b" message, followed by a "float division by zero" error.

Also, I'm not sure which training config is appropriate for the shoe data.

manuelli · 2019-09-17T23:26:24Z

It looks like one of the masks might be empty, which could mean that one of the logs is corrupted. Could you post the full error message? In general, the shoes can be trained in the same way as the caterpillar in the tutorial if you want a class-consistent shoe network.

For some of the code used in the shoe experiments you can take a look at https://github.com/RobotLocomotion/pytorch-dense-correspondence/blob/master/dense_correspondence/experiments/shoes_consistent/training_shoes.ipynb.

DanielVlasic · 2019-09-18T14:39:14Z

Here are some details.

I executed the tutorial:
dense_correspondence/training/training_tutorial.ipynb.

I set the config to:
config_filename = os.path.join(utils.getDenseCorrespondenceSourceDir(), 'config', 'dense_correspondence', 'dataset', 'composite', 'shoes_all.yaml')

Here is the full output of the training cell:

training descriptor of dimension 3
using SINGLE_OBJECT_WITHIN_SCENE
logging_dir: /home/dvlasic/data/pdc/trained_models/tutorials/shoes_3
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /home/dvlasic/.cache/torch/checkpoints/resnet34-333f7ec4.pth
100.0%
/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py:2622: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.")
/home/dvlasic/code/modules/dense_correspondence_manipulation/utils/utils.py:258: RuntimeWarning: invalid value encountered in arccos
theta = 2*np.arccos(2 * np.dot(q,r)**2 - 1)
/home/dvlasic/code/modules/dense_correspondence_manipulation/utils/utils.py:258: RuntimeWarning: invalid value encountered in arccos
theta = 2*np.arccos(2 * np.dot(q,r)**2 - 1)

empty data, continuing

/home/dvlasic/code/modules/dense_correspondence_manipulation/utils/utils.py:258: RuntimeWarning: invalid value encountered in arccos
theta = 2*np.arccos(2 * np.dot(q,r)**2 - 1)

empty data, continuing

warning, empty mask b

ZeroDivisionError Traceback (most recent call last)
in ()
5 print "training descriptor of dimension %d" %(d)
6 train = DenseCorrespondenceTraining(dataset=dataset, config=train_config)
----> 7 train.run()
8 print "finished training descriptor of dimension %d" %(d)

/home/dvlasic/code/dense_correspondence/training/training.pyc in run(self, loss_current_iteration, use_pretrained)
340 masked_non_matches_a, masked_non_matches_b,
341 background_non_matches_a, background_non_matches_b,
--> 342 blind_non_matches_a, blind_non_matches_b)
343
344

/home/dvlasic/code/dense_correspondence/loss_functions/loss_composer.pyc in get_loss(pixelwise_contrastive_loss, match_type, image_a_pred, image_b_pred, matches_a, matches_b, masked_non_matches_a, masked_non_matches_b, background_non_matches_a, background_non_matches_b, blind_non_matches_a, blind_non_matches_b)
31 masked_non_matches_a, masked_non_matches_b,
32 background_non_matches_a, background_non_matches_b,
---> 33 blind_non_matches_a, blind_non_matches_b)
34
35 if (match_type == SpartanDatasetDataType.SINGLE_OBJECT_ACROSS_SCENE).all():

/home/dvlasic/code/dense_correspondence/loss_functions/loss_composer.pyc in get_within_scene_loss(pixelwise_contrastive_loss, image_a_pred, image_b_pred, matches_a, matches_b, masked_non_matches_a, masked_non_matches_b, background_non_matches_a, background_non_matches_b, blind_non_matches_a, blind_non_matches_b)
82 matches_a, matches_b,
83 masked_non_matches_a, masked_non_matches_b,
---> 84 M_descriptor=pcl._config["M_masked"])
85
86 if pcl._config["use_l2_pixel_loss_on_background_non_matches"]:

/home/dvlasic/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in get_loss_matched_and_non_matched_with_l2(self, image_a_pred, image_b_pred, matches_a, matches_b, non_matches_a, non_matches_b, M_descriptor, M_pixel, non_match_loss_weight, use_l2_pixel_loss)
83
84
---> 85 match_loss, _, _ = PCL.match_loss(image_a_pred, image_b_pred, matches_a, matches_b)
86
87

/home/dvlasic/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in match_loss(image_a_pred, image_b_pred, matches_a, matches_b)
163 matches_b_descriptors = matches_b_descriptors.unsqueeze(0)
164
--> 165 match_loss = 1.0 / num_matches * (matches_a_descriptors - matches_b_descriptors).pow(2).sum()
166
167 return match_loss, matches_a_descriptors, matches_b_descriptors

ZeroDivisionError: float division by zero
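
Separately from the fatal error, the repeated `invalid value encountered in arccos` warnings in the output above usually mean that the argument passed to `arccos` fell outside [-1, 1], for example through floating-point rounding or quaternions that are not exactly unit length. That cause is an assumption; nobody in this thread confirms it. A minimal sketch of clamping the argument before taking the arc cosine follows. The function name `clamped_theta` is hypothetical, and it uses the standard `math` module rather than NumPy so that it runs without dependencies (with NumPy, `np.clip` would play the same role).

```python
import math


def clamped_theta(q, r):
    """Evaluate 2 * arccos(2 * dot(q, r)**2 - 1) with the argument clamped.

    q and r are 4-element sequences (quaternions). This mirrors the
    expression shown in the warning; the name and the clamping step are
    illustrative assumptions, not code from the repository.
    """
    dot = sum(qi * ri for qi, ri in zip(q, r))
    cos_arg = 2.0 * dot ** 2 - 1.0
    # Rounding can push cos_arg marginally outside [-1, 1], where
    # arccos is undefined, so clamp it back into the valid domain.
    cos_arg = max(-1.0, min(1.0, cos_arg))
    return 2.0 * math.acos(cos_arg)


# Slightly non-unit quaternion: the unclamped argument would exceed 1.
q = (1.0000001, 0.0, 0.0, 0.0)
theta = clamped_theta(q, q)
```

Clamping only hides small numerical overshoot. If the inputs are far from unit length or already contain NaN values, the underlying data would still need checking.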

peteflorence · 2019-09-19T19:16:58Z

Thanks, I can fix this

peteflorence · 2019-10-04T17:28:23Z

Hi Daniel, sorry to be so slow.
Does this commit fix your issue? ad541fc
We have fixed this issue in our private branch; I think this should be all you need.
Let me know if it doesn't work.
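
The contents of that commit are not reproduced in this thread. As a rough illustration of the kind of guard that avoids the division at line 165 of the traceback, here is a minimal sketch that returns `None` when there are no matches, so that the caller can skip the iteration. Plain Python lists stand in for the descriptor tensors, and the function signature is a simplified, hypothetical version of `match_loss`, not the code in the repository.

```python
def match_loss(descriptors_a, descriptors_b):
    """Mean squared distance between matched descriptors.

    descriptors_a and descriptors_b are equal-length lists of descriptor
    vectors (lists of floats). Returns None when there are no matches,
    instead of dividing by zero. Hypothetical simplification for
    illustration only.
    """
    num_matches = len(descriptors_a)
    if num_matches == 0:
        # Empty mask or no correspondences: nothing to average over.
        return None
    total = 0.0
    for a, b in zip(descriptors_a, descriptors_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / num_matches * total


# An image pair with no matches is skipped rather than raising an error.
loss = match_loss([], [])
if loss is None:
    skipped = True
```

Returning a sentinel keeps the decision to skip in the training loop, which can also log the offending scene so that a corrupted log is easier to track down.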

peteflorence · 2019-10-04T17:28:58Z

Also I am working on getting the new code open sourced, should be soon.

* Fixes this issue: RobotLocomotion/pytorch-dense-correspondence#204
* Add gpus flag for nvidia-docker. Some refactoring
* Update data_paths
* Add APLoss class and RingSampler class
* Add support for AP loss
* Add comments
* Fix loss.cuda()

peteflorence mentioned this issue Oct 24, 2019

ZeroDivisionError: float division by zero in loss function #203

Open

ghost pushed a commit to swiatkowski/general-dense-object-nets that referenced this issue Apr 1, 2020

Fixes this issue: RobotLocomotion/pytorch-dense-correspondence#204

8e38c46

ghost mentioned this issue Apr 2, 2020

Ap loss swiatkowski/general-dense-object-nets#1

Merged
