Todo list

- [ ] Validation of parameters for combinations that won't work

Things that are known not to work:

- FSDP offload and `gradient_checkpointing` (pytorch/pytorch#82203)
- `adamw_bnb_8bit` doesn't play well with FSDP offload
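Here is a minimal sketch of what the validation item could look like. It assumes a flat, dict-like config. The keys `fsdp_offload`, `gradient_checkpointing` and `optimizer` and the function `validate_config` are hypothetical names used for illustration, not the project's actual config schema. The check fails early on the two combinations listed above.

```python
from typing import Any, List, Mapping


def validate_config(cfg: Mapping[str, Any]) -> None:
    """Raise ValueError if cfg combines options known not to work together.

    Hypothetical keys: "fsdp_offload", "gradient_checkpointing", "optimizer".
    """
    fsdp_offload = bool(cfg.get("fsdp_offload", False))
    errors: List[str] = []

    # Known issue: FSDP offload together with gradient checkpointing
    # (pytorch/pytorch#82203).
    if fsdp_offload and cfg.get("gradient_checkpointing", False):
        errors.append(
            "FSDP offload does not work with gradient_checkpointing "
            "(see pytorch/pytorch#82203)"
        )

    # Known issue: the bitsandbytes 8-bit AdamW optimizer with FSDP offload.
    if fsdp_offload and cfg.get("optimizer") == "adamw_bnb_8bit":
        errors.append("adamw_bnb_8bit does not work with FSDP offload")

    # Report every conflict at once instead of stopping at the first one.
    if errors:
        raise ValueError("; ".join(errors))
```

Collecting all conflicts before raising lets a user fix the config in one pass. New known-bad combinations can be added as further `if` checks as they are discovered.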