🐛 Bug
When running with the DeepSpeed strategy, I get the warning "Invalidate trace cache @ step 327: expected module 365, but got module 365", which also seems to slow down DeepSpeed evaluation. I tried both a single GPU and multiple GPUs with the following configuration, and both produce the same warning:
import lightning as L
from lightning.pytorch.strategies import DeepSpeedStrategy

trainer = L.Trainer(
    # Hardware setup
    # --------------------------------
    devices=self.num_gpus_per_node,
    num_nodes=self.num_nodes,
    accelerator="gpu",
    # Training configuration
    # --------------------------------
    strategy=DeepSpeedStrategy(config=self.args.deepspeed),  # path to the DeepSpeed JSON config file
)
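The reporter's DeepSpeed JSON config is not reproduced in this issue. To attempt a reproduction, a placeholder config can be written to disk and its path passed as config=; Lightning's DeepSpeedStrategy also accepts a plain dict there. The sketch below assumes ZeRO stage 3, on the understanding that the trace-cache message comes from ZeRO-3 parameter prefetching. Every value in it is hypothetical and is not the reporter's setting.

```python
import json
import os
import tempfile

# Hypothetical minimal DeepSpeed config for a reproduction attempt.
# The reporter's real config is not part of this issue; these values are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},  # assumption: ZeRO stage 3
}

# Write the config to a temporary JSON file so a path can be passed to the strategy.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "ds_config.json")
with open(config_path, "w") as f:
    json.dump(ds_config, f, indent=2)

# The path (or ds_config itself) would then be used as, e.g.:
#   strategy=DeepSpeedStrategy(config=config_path)
```

Passing the path mirrors the reporter's config=self.args.deepspeed. Passing the dict directly avoids the temporary file.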
Here, trainer.model is the model that holds the metrics.
Expected behavior
I expect the metric to be on cuda:0, and no warning such as "Invalidate trace cache @ step 327: expected module 365, but got module 365" to appear.
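Until the root cause is found, the message can at least be hidden with a standard logging.Filter. This is a minimal sketch that assumes DeepSpeed emits the message through a logger named "DeepSpeed". It only silences the output and does nothing about the evaluation slowdown.

```python
import logging


class DropTraceCacheWarnings(logging.Filter):
    """Drop log records whose message starts with 'Invalidate trace cache'."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logger to discard the record.
        return not record.getMessage().startswith("Invalidate trace cache")


# Assumption: DeepSpeed logs this message via a logger named "DeepSpeed".
ds_logger = logging.getLogger("DeepSpeed")
ds_logger.addFilter(DropTraceCacheWarnings())
```

A filter attached to a logger only sees records logged directly on that logger. If the message turns out to come from a child logger, the filter would need to go on that logger or on the handler instead.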
Environment
TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): 1.3.1
Lightning version: 2.2.1
Python & PyTorch version: 3.10.2 & 2.1.2.1+gita8e7c98
Any other relevant information such as OS (e.g., Linux):