zhuweipg99 changed the title from "Puzzled about the update of the critic model" to "[chatllama]Puzzled about the update of the critic model" on Apr 28, 2023
When I looked into how the value loss is computed in trainer.py (lines 1012-1017), it seemed to me that the values and rewards are equal to the old_values, because the same model is used to compute the score.
I would be very grateful if you could clear up my confusion.
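To make the question concrete, here is a minimal sketch of the kind of clipped value loss I have in mind. It is not copied from trainer.py: the function name `clipped_value_loss`, the parameter `eps`, and the formula itself are my assumptions about a typical PPO-style critic loss, written over plain Python lists so it runs without any dependencies.

```python
def clipped_value_loss(values, old_values, rewards, eps=0.2):
    """Hypothetical PPO-style clipped value loss (an assumption, not trainer.py)."""
    total = 0.0
    for v, v_old, r in zip(values, old_values, rewards):
        # Keep the new value estimate within eps of the old one.
        delta = max(-eps, min(eps, v - v_old))
        v_clipped = v_old + delta
        # Pessimistic choice: take the larger of the two squared errors.
        total += max((v - r) ** 2, (v_clipped - r) ** 2)
    return total / len(values)


old_values = [0.5, -1.0, 2.0]

# Case from my question: values, old_values and rewards are all identical.
same = clipped_value_loss(old_values, old_values, old_values)

# Case where only the rewards differ from the critic's estimates.
different = clipped_value_loss(old_values, old_values, [1.0, 0.0, 2.5])
```

Under these assumptions, if all three inputs really are identical the loss is zero and the critic would receive no gradient, which is what puzzles me. If only `values` and `old_values` coincide, the clipping has no effect and the loss reduces to a plain squared error against the rewards.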