Q3 ROADMAP #30
Comments
Looking forward to
This paper can be helpful for understanding why the [BOS] token has an impact: *IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact*. From the paper: "It is found that such huge outliers usually occur at the [BOS] token and some other uninformative initial tokens (e.g., '.' or ',') at particular channels, regardless of the rest of the input sequence. We thus name these tokens pivot tokens given their dominating values in the activation." The attention scores concentrate on these pivot tokens rather than on the rest, a.k.a. attention sinks (Xiao et al., 2024).
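To see this concentration directly, here is a minimal sketch (not from the paper) that measures how much attention a small Hugging Face causal LM pays to the first token of the sequence. The model name and prompt are illustrative; for LLaMA-style models the first token would be the [BOS] token described above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative small model; any causal LM works for this demonstration.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
# Column 0 holds each query position's attention on the first token.
attn = out.attentions[-1][0]              # last layer, first batch item
sink_mass = attn[:, 1:, 0].mean().item()  # skip query 0, which must attend to itself
print(f"mean attention on token 0 (last layer): {sink_mass:.3f}")
```

A disproportionately large value here, despite token 0 carrying little semantic content, is the attention-sink behavior the comment refers to.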
May I ask when AWQ will be supported?
We are actively working on this now, ideally landing in a week or so.
Commits from the linked PR:
* initial commit
* is this a version problem
* or wrong find_packages logic
* all_right
* initial commit
* add load_compress func
* More tests (loading dense tensors)
* simplify UX
* cosmetic changes
* finishing the PR
* finalize the PR
* Update src/compressed_tensors/compressors/sparse_bitmask.py
* disable ipynb test
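Since the commits above touch the `sparse_bitmask` compressor and a `load_compress` function, here is a hedged, from-scratch sketch of the underlying bitmask idea: store only the nonzero values plus a mask of their positions. The function names are hypothetical, not the library's actual API, and a real implementation would pack the mask into bits rather than keep a byte-per-element boolean tensor.

```python
import torch

def bitmask_compress(dense: torch.Tensor):
    """Split a dense tensor into (nonzero values, boolean bitmask, shape)."""
    mask = dense != 0
    values = dense[mask]  # flattened nonzero entries in row-major order
    return values, mask, dense.shape

def bitmask_decompress(values: torch.Tensor, mask: torch.Tensor, shape) -> torch.Tensor:
    """Rebuild the dense tensor by scattering values back into masked slots."""
    dense = torch.zeros(shape, dtype=values.dtype)
    dense[mask] = values
    return dense

weight = torch.tensor([[0.0, 1.5, 0.0], [2.0, 0.0, 0.0]])
vals, mask, shape = bitmask_compress(weight)
assert torch.equal(bitmask_decompress(vals, mask, shape), weight)
```

For highly sparse weights this trades one dense float tensor for a short value list plus a compact mask, which is the storage win the compressor is after.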
SUMMARY:
* `oneshot`
* `compressed-tensors` into `AutoModelForCausalLM` (deprecate `SparseAutoModel`)
* `run-compressed` in `AutoModelForCausalLM`
* `sequential_update=True`
* MoE models end-to-end through `vllm`
* `XXForConditionalLM` and embedding models (`SparseAutoModel`)
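As a sketch of the `AutoModelForCausalLM` roadmap item above: once the integration lands, loading a `compressed-tensors` checkpoint should go through the standard transformers entrypoint with no `SparseAutoModel` import. The repository id below is a placeholder and the exact behavior is an assumption about the planned feature.

```python
from transformers import AutoModelForCausalLM

# Hypothetical: a checkpoint saved in the compressed-tensors format loads
# through the standard entrypoint, replacing the deprecated SparseAutoModel.
model = AutoModelForCausalLM.from_pretrained("org/llm-w4a16-compressed")
```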