Description
Currently each training loop includes an evaluation loop, but it has not been debugged or used so far.
It needs to be generalised so that it can also be launched outside of training, and it should support metrics specific to language modelling.
It would also be useful to generate a report highlighting the performance achieved, including a comparison with other models.
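As a rough illustration of the standalone direction, below is a minimal sketch of an evaluation pass that computes loss and perplexity. It assumes a HuggingFace-style causal LM that returns `.loss` when `labels` are passed, and a DataLoader yielding dicts of token tensors; the name `evaluate_perplexity` and all argument names are illustrative, not existing code in this repo.

```python
import math
import torch


@torch.no_grad()
def evaluate_perplexity(model, eval_loader, device="cpu"):
    """Standalone evaluation pass returning mean loss and perplexity.

    Sketch only: assumes a HuggingFace-style causal LM (returns `.loss`
    when `labels` are provided) and batches shaped as dicts of tensors.
    """
    model.eval()
    total_loss, total_tokens = 0.0, 0

    for batch in eval_loader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)

        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=input_ids,
        )
        # Weight the mean batch loss by the number of non-padding tokens
        # so batches of different sizes contribute proportionally.
        n_tokens = attention_mask.sum().item()
        total_loss += outputs.loss.item() * n_tokens
        total_tokens += n_tokens

    mean_loss = total_loss / max(total_tokens, 1)
    return {"loss": mean_loss, "perplexity": math.exp(mean_loss)}
```

Because the function takes the model and dataloader as arguments rather than reading trainer state, the same code could be called both from inside the training loop and as a separate evaluation entry point.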
TODO
Investigate whether libraries such as openai/evals or FastChat can be adapted for use as evaluation tools.
Debug the evaluation of the model.
Collect and compute relevant metrics.
Make the evaluation loop launchable outside of training.
Produce a meaningful report comparing the performance of one or more models (see the sketch after this list).
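For the comparison report, one possible shape is a small helper that renders per-model metrics as a plain-text table. `format_report` and the example numbers below are placeholders, not existing code; the input is assumed to be the metrics dict returned by an evaluation pass like the one sketched above.

```python
def format_report(results):
    """Render a plain-text table comparing metrics across models.

    `results` maps a model name to a metrics dict,
    e.g. {"loss": 1.98, "perplexity": 7.24}.
    """
    metrics = sorted({k for m in results.values() for k in m})
    header = ["model"] + metrics
    rows = [header] + [
        [name] + [f"{m.get(k, float('nan')):.4f}" for k in metrics]
        for name, m in results.items()
    ]
    # Pad each column to its widest cell for alignment.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows
    )


# Example usage with placeholder numbers:
# print(format_report({
#     "baseline": {"loss": 2.31, "perplexity": 10.07},
#     "finetuned": {"loss": 1.98, "perplexity": 7.24},
# }))
```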