
Don't disable uneven k to support more headdims #21

Open
wants to merge 1 commit into main

Conversation

njhill (Member) commented Sep 27, 2024

This was originally disabled by @WoosukKwon in eee8e47.

WoosukKwon commented Sep 30, 2024

Adding the flag increased the wheel size of vllm-flash-attn from 107 MB to 143 MB (I'm not sure whether this is measured before compression).

njhill (Member, Author) commented Sep 30, 2024

Thanks @WoosukKwon, I guess that's fairly significant.

The motivation here is that we currently depend on flash attention for certain features like multi-step scheduling, so there's a significant performance downside for models with other head sizes (in particular, some of the IBM Granite models use head_dim 80).

WoosukKwon commented
@njhill We can just add head size 80 to

HEAD_DIMENSIONS = [32, 64, 96, 128, 160, 192, 224, 256]

Would that work for you?
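For concreteness, here is a minimal sketch of that suggested change, assuming HEAD_DIMENSIONS is the list the build uses to decide which head dimensions get kernels compiled; the helper function and surrounding code are illustrative only, not the actual vllm-flash-attn build script:

```python
# Illustrative sketch only: the real HEAD_DIMENSIONS list lives in the
# vllm-flash-attn build/codegen configuration. The proposed change is simply
# to add 80 to the list.
HEAD_DIMENSIONS = [32, 64, 80, 96, 128, 160, 192, 224, 256]

def supported_head_dim(head_dim: int) -> bool:
    """Hypothetical helper: True if a kernel was compiled for this head dimension."""
    return head_dim in HEAD_DIMENSIONS

assert supported_head_dim(80)  # IBM Granite models (head_dim 80) would now be covered
```

Each additional entry instantiates another set of kernel templates, which is why the wheel size is sensitive to the length of this list.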

njhill (Member, Author) commented Sep 30, 2024

@WoosukKwon yes, I think for our immediate needs that would be great, if you're sure that will be sufficient!

njhill (Member, Author) commented Oct 2, 2024

@WoosukKwon I've opened #22 for this

WoosukKwon commented
@njhill Since the idea above doesn't work, can you check how much this PR affects the vllm wheel size?
