[chatllama]Puzzled about the update of the critic model #338

zhuweipg99 · 2023-04-28T07:41:44Z

While reading how the value loss is computed in trainer.py (lines 1012-1017), I came across this:

value_loss_clipped = old_values + (values - old_values).clamp(-critic_eps_clip, critic_eps_clip)
value_loss1 = (value_loss_clipped - rewards) ** 2
value_loss2 = (values - rewards) ** 2
value_loss = torch.max(value_loss1, value_loss2).mean()

I think values and rewards are equal to old_values, because the same model is used to compute all of these scores.
I would be very grateful if someone could clear up my confusion.
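
To make the question concrete, here is a dependency-free, per-element restatement of the four lines above on made-up numbers. It is only a sketch. It assumes the common PPO convention, not verified against this repository, that old_values is a snapshot stored when the experience was collected, while values is recomputed by the critic currently being trained. The function name clipped_value_loss and all numbers are hypothetical.

```python
def clipped_value_loss(values, old_values, rewards, critic_eps_clip):
    """Mean over elements of max((clipped - r)^2, (v - r)^2), mirroring the snippet above."""
    losses = []
    for v, v_old, r in zip(values, old_values, rewards):
        # Keep the new prediction within critic_eps_clip of the old one.
        delta = max(-critic_eps_clip, min(critic_eps_clip, v - v_old))
        v_clipped = v_old + delta
        loss_clipped = (v_clipped - r) ** 2
        loss_unclipped = (v - r) ** 2
        # Take the larger (more pessimistic) of the two squared errors.
        losses.append(max(loss_clipped, loss_unclipped))
    return sum(losses) / len(losses)


old_values = [0.5, 1.0]  # hypothetical critic outputs saved at rollout time
rewards = [1.0, 0.0]     # hypothetical regression targets

# Case 1: the critic has not been updated yet, so values == old_values
# and the clamp is a no-op (both squared-error terms are identical).
loss_before = clipped_value_loss(old_values, old_values, rewards, 0.2)

# Case 2: the critic has taken a gradient step, so values have moved
# away from old_values and the clipped term can differ from the unclipped one.
new_values = [0.9, 0.7]
loss_after = clipped_value_loss(new_values, old_values, rewards, 0.2)

print(loss_before, loss_after)
```

With these toy numbers the two losses come out at roughly 0.625 and 0.365. In case 1 the two terms coincide, so taking the max changes nothing; under the stated assumption they only diverge once values has drifted from old_values after an update. Whether rewards can also equal old_values depends on how the repository builds its targets, which this sketch does not model.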

zhuweipg99 changed the title ~~Puzzled about the update of the critic model~~ [chatllama]Puzzled about the update of the critic model Apr 28, 2023
