
Get Accuracy Metrics reported #63

Open
parfeniukink opened this issue Oct 28, 2024 · 0 comments · May be fixed by #64
@parfeniukink
Contributor

Eval Harness enablement - to be scoped and broken up further.

The goal of this ticket is to leverage the LLM Eval Harness code to tie eval benchmarking into the performance benchmarking that GuideLLM already does.

Eval benchmarking on public/private datasets can generally take a long time, so we will have the user specify how long they want a benchmark to run. We can't ask for a specific amount of time; instead the options would be short, medium, or long, and each benchmark can be a task. A rough sketch of how these tiers could work is below.
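
A minimal sketch of the tier idea, assuming the user picks short/medium/long and each eval benchmark is a task run against a pre-built subset. The tier names, budget values, `EvalTask`, `run_with_budget`, and the `score_example` callback are all illustrative placeholders, not existing GuideLLM or eval-harness APIs:

```python
import time
from dataclasses import dataclass

# Illustrative time budgets per tier (seconds); real values to be decided during scoping.
TIER_BUDGETS_S = {"short": 5 * 60, "medium": 30 * 60, "long": 2 * 60 * 60}

@dataclass
class EvalTask:
    name: str
    examples: list[dict]  # a pre-built representative subset (see below)

def run_with_budget(task: EvalTask, tier: str, score_example) -> dict:
    """Run examples from the task until the tier's time budget is spent."""
    budget_s = TIER_BUDGETS_S[tier]
    start = time.monotonic()
    correct = total = 0
    for example in task.examples:
        if time.monotonic() - start > budget_s:
            break
        correct += int(score_example(example))  # score_example returns True/False per example
        total += 1
    return {"task": task.name, "accuracy": correct / max(total, 1), "examples_run": total}
```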

The main challenge is developing subsets that are representative of the massive original datasets so the benchmarks stay accurate. Because of this, the first task, to be done by the research team, is to split these larger benchmark datasets into smaller, benchmarkable subsets so evals can run in a matter of minutes rather than hours. A sketch of one possible subsetting approach follows.
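
One possible approach, sketched here only for illustration: stratified random sampling so the small subset keeps the original dataset's category mix. The `category` field, `stratified_subset` name, and target sizes are assumptions; the research team would decide how representativeness is actually defined:

```python
import random
from collections import defaultdict

def stratified_subset(examples: list[dict], target_size: int,
                      key: str = "category", seed: int = 0) -> list[dict]:
    """Sample proportionally from each stratum so the subset preserves the
    original category distribution while staying small enough to eval in minutes."""
    rng = random.Random(seed)
    strata: dict[str, list[dict]] = defaultdict(list)
    for ex in examples:
        strata[str(ex.get(key, "unknown"))].append(ex)
    subset: list[dict] = []
    for group in strata.values():
        # Each stratum contributes at least one example, proportional to its share.
        share = max(1, round(target_size * len(group) / len(examples)))
        subset.extend(rng.sample(group, min(share, len(group))))
    rng.shuffle(subset)
    return subset[:target_size]
```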

Mark to lay out what we need in order to extend the backend.

@parfeniukink parfeniukink self-assigned this Oct 28, 2024
@parfeniukink parfeniukink linked a pull request Oct 28, 2024 that will close this issue