Can't train tutorial on shoes. #204

Open
DanielVlasic opened this issue Sep 17, 2019 · 5 comments

Comments

@DanielVlasic

I downloaded the shoe data (https://github.com/RobotLocomotion/pytorch-dense-correspondence/blob/master/config/dense_correspondence/dataset/composite/shoes_all.yaml) and tried going through the tutorial training with it.

I received a "warning, empty mask b" message, followed by a "float division by zero" error.

Also, I'm not sure which training config is appropriate for the shoe data.

@manuelli
Collaborator

It looks like one of the masks might be empty, or one of the logs may be corrupted. Could you post the full error message? In general, the shoes can be trained in the same way as the caterpillar in the tutorial if you want a class-consistent shoe network.

For some of the code used in the shoe experiments you can take a look at https://github.com/RobotLocomotion/pytorch-dense-correspondence/blob/master/dense_correspondence/experiments/shoes_consistent/training_shoes.ipynb.
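
One way to test the empty-mask hypothesis is to scan the masks before training. The following is only a minimal sketch: it assumes each mask has already been loaded as a 2D list of pixel values, and both the `find_empty_masks` helper and the name-to-mask dictionary are hypothetical, not part of the repository.

```python
def find_empty_masks(masks):
    """Return the names of masks that contain no nonzero pixel.

    `masks` is a hypothetical dict mapping a mask name to a 2D list of
    pixel values (0 = background, nonzero = object).
    """
    return [
        name
        for name, mask in sorted(masks.items())
        if not any(pixel for row in mask for pixel in row)
    ]


masks = {
    "scene_a/000000_mask": [[0, 0], [0, 1]],  # one object pixel
    "scene_a/000001_mask": [[0, 0], [0, 0]],  # empty mask
}
empty = find_empty_masks(masks)
```

Any name returned here would point at an image whose mask contributes no object pixels, which is the situation the "empty mask" warning suggests.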

@DanielVlasic
Author

Here are some details.

I executed the tutorial:
dense_correspondence/training/training_tutorial.ipynb.

I set the config to:
config_filename = os.path.join(utils.getDenseCorrespondenceSourceDir(), 'config', 'dense_correspondence', 'dataset', 'composite', 'shoes_all.yaml')

Here is the full output of the training cell:

training descriptor of dimension 3
using SINGLE_OBJECT_WITHIN_SCENE
logging_dir: /home/dvlasic/data/pdc/trained_models/tutorials/shoes_3
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /home/dvlasic/.cache/torch/checkpoints/resnet34-333f7ec4.pth
100.0%
/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py:2622: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.")
/home/dvlasic/code/modules/dense_correspondence_manipulation/utils/utils.py:258: RuntimeWarning: invalid value encountered in arccos
theta = 2*np.arccos(2 * np.dot(q,r)**2 - 1)
/home/dvlasic/code/modules/dense_correspondence_manipulation/utils/utils.py:258: RuntimeWarning: invalid value encountered in arccos
theta = 2*np.arccos(2 * np.dot(q,r)**2 - 1)

empty data, continuing

/home/dvlasic/code/modules/dense_correspondence_manipulation/utils/utils.py:258: RuntimeWarning: invalid value encountered in arccos
theta = 2*np.arccos(2 * np.dot(q,r)**2 - 1)

empty data, continuing

empty data, continuing

empty data, continuing

warning, empty mask b

ZeroDivisionError Traceback (most recent call last)
in ()
5 print "training descriptor of dimension %d" %(d)
6 train = DenseCorrespondenceTraining(dataset=dataset, config=train_config)
----> 7 train.run()
8 print "finished training descriptor of dimension %d" %(d)

/home/dvlasic/code/dense_correspondence/training/training.pyc in run(self, loss_current_iteration, use_pretrained)
340 masked_non_matches_a, masked_non_matches_b,
341 background_non_matches_a, background_non_matches_b,
--> 342 blind_non_matches_a, blind_non_matches_b)
343
344

/home/dvlasic/code/dense_correspondence/loss_functions/loss_composer.pyc in get_loss(pixelwise_contrastive_loss, match_type, image_a_pred, image_b_pred, matches_a, matches_b, masked_non_matches_a, masked_non_matches_b, background_non_matches_a, background_non_matches_b, blind_non_matches_a, blind_non_matches_b)
31 masked_non_matches_a, masked_non_matches_b,
32 background_non_matches_a, background_non_matches_b,
---> 33 blind_non_matches_a, blind_non_matches_b)
34
35 if (match_type == SpartanDatasetDataType.SINGLE_OBJECT_ACROSS_SCENE).all():

/home/dvlasic/code/dense_correspondence/loss_functions/loss_composer.pyc in get_within_scene_loss(pixelwise_contrastive_loss, image_a_pred, image_b_pred, matches_a, matches_b, masked_non_matches_a, masked_non_matches_b, background_non_matches_a, background_non_matches_b, blind_non_matches_a, blind_non_matches_b)
82 matches_a, matches_b,
83 masked_non_matches_a, masked_non_matches_b,
---> 84 M_descriptor=pcl._config["M_masked"])
85
86 if pcl._config["use_l2_pixel_loss_on_background_non_matches"]:

/home/dvlasic/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in get_loss_matched_and_non_matched_with_l2(self, image_a_pred, image_b_pred, matches_a, matches_b, non_matches_a, non_matches_b, M_descriptor, M_pixel, non_match_loss_weight, use_l2_pixel_loss)
83
84
---> 85 match_loss, _, _ = PCL.match_loss(image_a_pred, image_b_pred, matches_a, matches_b)
86
87

/home/dvlasic/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in match_loss(image_a_pred, image_b_pred, matches_a, matches_b)
163 matches_b_descriptors = matches_b_descriptors.unsqueeze(0)
164
--> 165 match_loss = 1.0 / num_matches * (matches_a_descriptors - matches_b_descriptors).pow(2).sum()
166
167 return match_loss, matches_a_descriptors, matches_b_descriptors

ZeroDivisionError: float division by zero
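
A side note on the `RuntimeWarning: invalid value encountered in arccos` lines in the log above: `np.arccos` returns NaN when its argument falls outside [-1, 1]. One plausible cause is floating-point rounding when `q` and `r` are nearly identical or not exactly unit length, so that `2 * np.dot(q,r)**2 - 1` lands slightly above 1. Whether this is related to the "empty data" messages is not established in this thread. The sketch below mirrors the formula from the log in plain Python and clamps the argument first; the function name `quaternion_angle` is hypothetical.

```python
import math


def quaternion_angle(q, r):
    """Evaluate 2*acos(2*dot(q, r)**2 - 1) with the acos argument clamped.

    `q` and `r` are 4-element sequences, expected to be (near) unit length.
    """
    dot = sum(qi * ri for qi, ri in zip(q, r))
    # Rounding can push this value slightly outside [-1, 1]; clamp it so
    # acos never sees an out-of-domain argument.
    cos_arg = max(-1.0, min(1.0, 2.0 * dot * dot - 1.0))
    return 2.0 * math.acos(cos_arg)


# A quaternion that is very slightly longer than unit length.
theta = quaternion_angle((1.0000001, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
```

With NumPy arrays the same clamp could be written with `np.clip` before calling `np.arccos`.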

@peteflorence
Collaborator

Thanks, I can fix this

@peteflorence
Collaborator

Hi Daniel, sorry to be so slow.
Does this commit fix your issue? ad541fc
We have fixed this issue in our private branch, I think this should be all you need.
Let me know if it doesn't work.
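
The contents of that commit are not reproduced in this thread. As an illustration of the kind of guard that avoids the `ZeroDivisionError` in `match_loss`, here is a minimal sketch in plain Python (lists instead of torch tensors, so it runs without dependencies). The `safe_match_loss` name and the choice to return `None` for an empty match set are assumptions, not the project's actual fix.

```python
def safe_match_loss(matches_a_descriptors, matches_b_descriptors):
    """Sum of squared descriptor differences divided by the number of matches.

    Each argument is a list of descriptors (lists of floats), one per match.
    Returns None when there are no matches, so the caller can skip the
    sample instead of dividing by zero.
    """
    num_matches = len(matches_a_descriptors)
    if num_matches == 0:
        return None
    squared_error = sum(
        (a - b) ** 2
        for desc_a, desc_b in zip(matches_a_descriptors, matches_b_descriptors)
        for a, b in zip(desc_a, desc_b)
    )
    return squared_error / float(num_matches)


empty_loss = safe_match_loss([], [])
loss = safe_match_loss(
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[1.0, 0.0, 0.0], [1.0, 1.0, 3.0]],
)
```

In a training loop, a `None` result would be a signal to log the sample and continue with the next one rather than compute a loss from it.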

@peteflorence
Collaborator

Also, I am working on getting the new code open-sourced; that should happen soon.

ghost pushed a commit to swiatkowski/general-dense-object-nets that referenced this issue Apr 1, 2020
ghost pushed a commit to swiatkowski/general-dense-object-nets that referenced this issue Apr 2, 2020
* Fixes this issue: RobotLocomotion/pytorch-dense-correspondence#204

* Add gpus flag for nvidia-docker. Some refactoring

* Update data_paths

* Add APLoss class and RingSampler class

* Add support for AP loss

* Add comments

* Fix loss.cuda()