Eval Harness enablement - to be scoped and broken up further.
The goal of this ticket is to leverage the LLM Eval Harness code to tie eval benchmarking into the performance benchmarking done by GuideLLM.
Eval benchmarking on public/private datasets can generally take a long time, so we will have the user plug in how long they want to run a benchmark for. We can't use a specific amount of time; instead, the choices will be short, medium, and long, and each benchmark can be a task.
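One way to picture the short/medium/long option is as a preset that caps how many samples a task evaluates. This is only a rough sketch: the budget values, the `BUDGET_SECONDS` table, and the `max_samples` helper are hypothetical placeholders, not anything defined in this ticket or in GuideLLM.

```python
# Hypothetical preset budgets in seconds; the ticket does not define these values.
BUDGET_SECONDS = {
    "short": 5 * 60,
    "medium": 30 * 60,
    "long": 2 * 60 * 60,
}


def max_samples(preset: str, seconds_per_sample: float, dataset_size: int) -> int:
    """Return how many samples fit in the preset's budget, capped at the dataset size.

    `seconds_per_sample` is an estimate the caller must supply (for example, from a
    short warm-up run); it is an assumption of this sketch.
    """
    if seconds_per_sample <= 0:
        raise ValueError("seconds_per_sample must be positive")
    budget = BUDGET_SECONDS[preset]
    return min(dataset_size, int(budget // seconds_per_sample))
```

For example, `max_samples("short", 2.0, 10_000)` would allow 150 samples under the placeholder budget above.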
The challenge here is developing subsets that are representative of the massive original datasets so that the benchmarks stay accurate. Because of this, the first task, to be done by the research team, is to split these larger benchmark datasets into smaller, benchmarkable subsets so we can run evals in a matter of minutes rather than hours.
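As a starting point for discussion, stratified sampling is one common way to shrink a dataset while keeping its category mix. The sketch below is not the research team's method; the `stratified_subset` helper and the `subject` field in the usage example are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subset(examples, key, fraction, seed=0):
    """Sample roughly `fraction` of the examples from each stratum.

    `key` maps an example to its stratum label (e.g. subject or difficulty).
    Every stratum keeps at least one example so small categories are not dropped.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    strata = defaultdict(list)
    for example in examples:
        strata[key(example)].append(example)

    subset = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        subset.extend(rng.sample(group, k))
    return subset


# Usage with made-up data: 100 "math" and 50 "law" items, keeping 10% of each.
examples = [{"subject": "math", "id": i} for i in range(100)]
examples += [{"subject": "law", "id": i} for i in range(50)]
subset = stratified_subset(examples, key=lambda e: e["subject"], fraction=0.1)
```

Whether such a subset reproduces the full-dataset score still has to be checked empirically, for example by comparing subset and full-set results across several models.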
Mark to lay out what we need in order to extend the backend.