This may be anecdotal evidence, but it seems to me that there is no benefit in computing the gradients for the Jacobian. It makes training a bit unstable (again, just my observation). Passing the gradient through

Again, based on my observations, using Cholesky for the dense solver makes the system a bit unstable when the matrix is not well-conditioned. Replacing it with QR seems to work better, though it is a bit slower (while also using less memory). Does it make sense to offer a more stable solver than Cholesky as a user option?
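To illustrate the Cholesky vs. QR point, here is a minimal NumPy sketch. It is not the library's solver, and it rests on an assumption of mine: that the Cholesky path factors the normal equations `A^T A`, which squares the condition number. The matrix `A`, the value of `eps`, and the helper names `solve_normal_cholesky` and `solve_qr` are made up for the illustration.

```python
import numpy as np


def solve_normal_cholesky(A, b):
    """Least squares via the normal equations A^T A x = A^T b and Cholesky."""
    # Raises LinAlgError when A^T A is not numerically positive definite.
    L = np.linalg.cholesky(A.T @ A)
    y = np.linalg.solve(L, A.T @ b)   # forward solve:  L y = A^T b
    return np.linalg.solve(L.T, y)    # backward solve: L^T x = y


def solve_qr(A, b):
    """Least squares via a reduced QR factorization of A itself."""
    Q, R = np.linalg.qr(A)
    return np.linalg.solve(R, Q.T @ b)


# Hypothetical ill-conditioned test problem: cond(A) is roughly 1e8, so
# cond(A^T A) is roughly 1e16, which is beyond double precision.
eps = 1e-8
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])
x_true = np.array([1.0, 1.0])
b = A @ x_true

x_qr = solve_qr(A, b)
err_qr = np.linalg.norm(x_qr - x_true)

try:
    x_chol = solve_normal_cholesky(A, b)
    err_chol = np.linalg.norm(x_chol - x_true)
except np.linalg.LinAlgError:
    # A^T A rounded to a singular matrix, so the factorization broke down.
    x_chol = None
    err_chol = np.inf

print("QR error:      ", err_qr)
print("Cholesky error:", err_chol)
```

In this toy case QR works on `A` directly and only sees `cond(A)`, while forming `A^T A` loses the `eps**2` terms to rounding. Whether the same mechanism explains the instability I saw during training is a guess on my part.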
Replies: 1 comment
Hi @raulTrial,