
Use llama-1b for faster and more accessible examples #924

Open · wants to merge 1 commit into main

Conversation

kylesayrs
Collaborator

Purpose

  • Make the examples more accessible to users without large GPU resources. Since these are just examples, they should be as accessible as possible to encourage easy use and adoption of llm-compressor (a sketch of the kind of change intended follows below).
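
For illustration, here is a minimal sketch of the kind of swap this PR proposes, in the style of the project's quantization quickstart. The model IDs, dataset, recipe, and `oneshot` arguments are assumptions for this sketch, not the exact diff in this PR:

```python
# Hypothetical before/after for one example script; the real diff may differ.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Before: an 8B model that needs a large GPU to quantize.
# MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
# After: a ~1B model that fits on modest hardware (assumed replacement).
MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"

# Simple W4A16 GPTQ recipe, as in the project's quickstart examples.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=MODEL_ID,
    dataset="open_platypus",
    recipe=recipe,
    output_dir=MODEL_ID.split("/")[-1] + "-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```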

@kylesayrs kylesayrs self-assigned this Nov 19, 2024

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs kylesayrs requested a review from mgoin November 19, 2024 03:44
Collaborator

@dsikka dsikka left a comment

I think we actually want to keep the larger models, since most of our vLLM questions are about compressing larger models to run them there. So I don't think we should make this change: these examples are an easy reference for our most common use case.

These examples are also used directly in our testing, which helps identify cases (such as larger memory requirements) that would otherwise go unexercised.

I wouldn't be opposed to adding a smaller model in addition to the larger models.

@horheynm
Collaborator

horheynm commented Nov 19, 2024

I think that if the model structure is the same, we can use a smaller model. For example, if the only difference between the large and small model is the number of attention heads, then it should be fine, because the vLLM code execution path will be the same.

If the larger model has a different architecture, then we should keep the larger model, since the execution path will be different.
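
One quick way to check whether two checkpoints share an architecture (and therefore the same vLLM execution path) is to compare their Hugging Face configs. The model pair below is an assumption, and the Llama checkpoints are gated on the Hub, so substitute any models you have access to:

```python
from transformers import AutoConfig

# Assumed model pair; both should report LlamaForCausalLM, differing only in size.
large = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
small = AutoConfig.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# Same architecture class -> same vLLM execution path.
print(large.architectures, small.architectures)

# Size-only differences (layers, heads, hidden size) do not change the path.
for attr in ("num_hidden_layers", "num_attention_heads", "hidden_size"):
    print(attr, getattr(large, attr), "->", getattr(small, attr))
```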
