I was playing around with your Google Colab and wanted to train on Few-NERD with `encoder_id = "microsoft/deberta-v3-large"`, but when I reach the training step it fails with `RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 2`:
Tokenizing the train dataset: 100%
131767/131767 [01:16<00:00, 1661.38 examples/s]
This SpanMarker model will ignore 0.339320% of all annotated entities in the train dataset. This is caused by the SpanMarkerModel maximum entity length of 8 words and the maximum model input length of 256 tokens.
These are the frequencies of the missed entities due to maximum entity length out of 340387 total entities:
- 486 missed entities with 9 words (0.142779%)
- 245 missed entities with 10 words (0.071977%)
- 119 missed entities with 11 words (0.034960%)
- 92 missed entities with 12 words (0.027028%)
- 57 missed entities with 13 words (0.016746%)
- 36 missed entities with 14 words (0.010576%)
- 17 missed entities with 15 words (0.004994%)
- 14 missed entities with 16 words (0.004113%)
- 10 missed entities with 17 words (0.002938%)
- 4 missed entities with 18 words (0.001175%)
- 5 missed entities with 19 words (0.001469%)
- 3 missed entities with 20 words (0.000881%)
- 4 missed entities with 21 words (0.001175%)
- 1 missed entities with 22 words (0.000294%)
- 2 missed entities with 23 words (0.000588%)
- 3 missed entities with 24 words (0.000881%)
- 2 missed entities with 25 words (0.000588%)
- 2 missed entities with 26 words (0.000588%)
- 1 missed entities with 27 words (0.000294%)
- 1 missed entities with 29 words (0.000294%)
Additionally, a total of 51 (0.014983%) entities were missed due to the maximum input length.
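(Side note, unrelated to the crash: I understand these skipped entities just come from the two limits mentioned above. If I wanted to keep more of them, I assume I could raise the limits when loading the model, roughly like the sketch below. The keyword names are the SpanMarker config options as I remember them from the docs, so treat the exact values and names as assumptions rather than a recommendation.)

```python
from datasets import load_dataset
from span_marker import SpanMarkerModel

# Few-NERD as used in the notebook (assumed dataset id / config name).
dataset = load_dataset("DFKI-SLT/few-nerd", "supervised")
labels = dataset["train"].features["ner_tags"].feature.names

# Raising the limits that cause entities to be skipped; larger values
# cost more memory and compute, and this does NOT address the size-mismatch error.
model = SpanMarkerModel.from_pretrained(
    "microsoft/deberta-v3-large",
    labels=labels,
    model_max_length=384,   # notebook default is 256 tokens
    entity_max_length=12,   # notebook default is 8 words
)
```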
Spreading data between multiple samples: 100%
131767/131767 [00:18<00:00, 7164.64 examples/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
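(This tokenizers warning seems harmless as far as I can tell; I silence it by setting the environment variable before anything from `tokenizers` gets imported, roughly like this.)

```python
import os

# Must be set before transformers/tokenizers are imported (i.e. before the fork),
# otherwise the warning is printed anyway.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```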
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 trainer.train()
File /shared/jupyter/.venv/lib/python3.10/site-packages/transformers/trainer.py:1537, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1535 hf_hub_utils.enable_progress_bars()
1536 else:
-> 1537 return inner_training_loop(
1538 args=args,
1539 resume_from_checkpoint=resume_from_checkpoint,
1540 trial=trial,
1541 ignore_keys_for_eval=ignore_keys_for_eval,
1542 )
File /shared/jupyter/.venv/lib/python3.10/site-packages/transformers/trainer.py:1854, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1851 self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
1853 with self.accelerator.accumulate(model):
-> 1854 tr_loss_step = self.training_step(model, inputs)
1856 if (
1857 args.logging_nan_inf_filter
1858 and not is_torch_tpu_available()
1859 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1860 ):
1861 # if loss is nan or inf simply add the average of previous logged losses
1862 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File /shared/jupyter/.venv/lib/python3.10/site-packages/transformers/trainer.py:2735, in Trainer.training_step(self, model, inputs)
2732 return loss_mb.reduce_mean().detach().to(self.args.device)
2734 with self.compute_loss_context_manager():
-> 2735 loss = self.compute_loss(model, inputs)
2737 if self.args.n_gpu > 1:
2738 loss = loss.mean() # mean() to average on multi-gpu parallel training
File /shared/jupyter/.venv/lib/python3.10/site-packages/transformers/trainer.py:2758, in Trainer.compute_loss(self, model, inputs, return_outputs)
2756 else:
2757 labels = None
-> 2758 outputs = model(**inputs)
2759 # Save past state if it exists
2760 # TODO: this needs to be fixed and made cleaner later.
2761 if self.args.past_index >= 0:
File /shared/jupyter/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
1509 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1510 else:
-> 1511 return self._call_impl(*args, **kwargs)
File /shared/jupyter/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
1515 # If we don't have any hooks, we want to skip the rest of the logic in
1516 # this function, and just call forward.
1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1518 or _global_backward_pre_hooks or _global_backward_hooks
1519 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520 return forward_call(*args, **kwargs)
1522 try:
1523 result = None
File /shared/jupyter/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py:687, in convert_outputs_to_fp32.<locals>.forward(*args, **kwargs)
686 def forward(*args, **kwargs):
--> 687 return model_forward(*args, **kwargs)
File /shared/jupyter/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py:675, in ConvertOutputsToFp32.__call__(self, *args, **kwargs)
674 def __call__(self, *args, **kwargs):
--> 675 return convert_to_fp32(self.model_forward(*args, **kwargs))
File /shared/jupyter/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:16, in autocast_decorator.<locals>.decorate_autocast(*args, **kwargs)
13 @functools.wraps(func)
14 def decorate_autocast(*args, **kwargs):
15 with autocast_instance:
---> 16 return func(*args, **kwargs)
File /shared/jupyter/.venv/lib/python3.10/site-packages/span_marker/modeling.py:153, in SpanMarkerModel.forward(self, input_ids, attention_mask, position_ids, start_marker_indices, num_marker_pairs, labels, num_words, document_ids, sentence_ids, **kwargs)
136 """Forward call of the SpanMarkerModel.
137
138 Args:
(...)
150 SpanMarkerOutput: The output dataclass.
151 """
152 token_type_ids = torch.zeros_like(input_ids)
--> 153 outputs = self.encoder(
154 input_ids,
155 attention_mask=attention_mask,
156 token_type_ids=token_type_ids,
157 position_ids=position_ids,
158 )
159 last_hidden_state = outputs[0]
160 last_hidden_state = self.dropout(last_hidden_state)
File /shared/jupyter/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
1509 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1510 else:
-> 1511 return self._call_impl(*args, **kwargs)
File /shared/jupyter/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
1515 # If we don't have any hooks, we want to skip the rest of the logic in
1516 # this function, and just call forward.
1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1518 or _global_backward_pre_hooks or _global_backward_hooks
1519 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520 return forward_call(*args, **kwargs)
1522 try:
1523 result = None
File /shared/jupyter/.venv/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:1062, in DebertaV2Model.forward(self, input_ids, attention_mask, token_type_ids, position_ids, inputs_embeds, output_attentions, output_hidden_states, return_dict)
1059 if token_type_ids is None:
1060 token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
-> 1062 embedding_output = self.embeddings(
1063 input_ids=input_ids,
1064 token_type_ids=token_type_ids,
1065 position_ids=position_ids,
1066 mask=attention_mask,
1067 inputs_embeds=inputs_embeds,
1068 )
1070 encoder_outputs = self.encoder(
1071 embedding_output,
1072 attention_mask,
(...)
1075 return_dict=return_dict,
1076 )
1077 encoded_layers = encoder_outputs[1]
File /shared/jupyter/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
1509 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1510 else:
-> 1511 return self._call_impl(*args, **kwargs)
File /shared/jupyter/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
1515 # If we don't have any hooks, we want to skip the rest of the logic in
1516 # this function, and just call forward.
1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1518 or _global_backward_pre_hooks or _global_backward_hooks
1519 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520 return forward_call(*args, **kwargs)
1522 try:
1523 result = None
File /shared/jupyter/.venv/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:900, in DebertaV2Embeddings.forward(self, input_ids, token_type_ids, position_ids, mask, inputs_embeds)
897 mask = mask.unsqueeze(2)
898 mask = mask.to(embeddings.dtype)
--> 900 embeddings = embeddings * mask
902 embeddings = self.dropout(embeddings)
903 return embeddings
RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 2
Any ideas whether this encoder can work, and how to make it work? Thanks! I have no issues if I run it on `roberta-large`, for example.
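For reference, the setup is essentially the notebook's with only the encoder swapped; a rough sketch of what I'm running (argument values are from memory, so take them as approximate rather than exact):

```python
from datasets import load_dataset
from transformers import TrainingArguments
from span_marker import SpanMarkerModel, Trainer

dataset = load_dataset("DFKI-SLT/few-nerd", "supervised")
labels = dataset["train"].features["ner_tags"].feature.names

encoder_id = "microsoft/deberta-v3-large"   # the only change vs. the Colab
model = SpanMarkerModel.from_pretrained(
    encoder_id,
    labels=labels,
    model_max_length=256,
    entity_max_length=8,
)

args = TrainingArguments(
    output_dir="models/span-marker-deberta-v3-large-fewnerd",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    num_train_epochs=1,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()   # <- fails here with the size-mismatch error above
```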
Hi there!