Do Transformers Need Three Projections? Systematic Study of QKV Variants
This ICML 2026 paper introduces “Projective Sharing” in transformer self-attention and shows systematically that sharing the key and value projections (K=V) cuts KV cache memory by 50% with negligible accuracy loss. Combined with Multi-Query Attention, the technique reduces KV cache requirements by up to 96.9%, offering a practical path to efficient long-context and edge-device LLM deployment.
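
The arithmetic behind these two figures can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not the paper's code: the function names and the model configuration (12 layers, 16 heads, head dimension 64, 4096 tokens) are assumptions chosen for the example. The 96.9% figure is consistent with a 16-head model, since storing one shared K=V tensor for a single head instead of separate K and V tensors for 16 heads gives 1 - 1/(2 × 16) = 96.875%, but the summary above does not state the head count.

```python
def kv_cache_elements(num_layers, num_kv_heads, head_dim, seq_len, shared_kv=False):
    """Count the elements held in the KV cache for one sequence.

    A standard cache stores two tensors per layer (keys and values).
    With K=V sharing, a single tensor serves as both, so only one is stored.
    """
    tensors_per_layer = 1 if shared_kv else 2
    return num_layers * tensors_per_layer * num_kv_heads * head_dim * seq_len


def reduction(baseline, variant):
    """Fraction of cache memory saved relative to the baseline."""
    return 1.0 - variant / baseline


# Hypothetical configuration, chosen only for illustration.
NUM_LAYERS, NUM_HEADS, HEAD_DIM, SEQ_LEN = 12, 16, 64, 4096

# Baseline: multi-head attention with separate K and V per head.
baseline = kv_cache_elements(NUM_LAYERS, NUM_HEADS, HEAD_DIM, SEQ_LEN)

# K=V sharing alone: same number of heads, one cached tensor instead of two.
shared = kv_cache_elements(NUM_LAYERS, NUM_HEADS, HEAD_DIM, SEQ_LEN, shared_kv=True)

# K=V sharing plus Multi-Query Attention: one KV head shared by all query heads.
mqa_shared = kv_cache_elements(NUM_LAYERS, 1, HEAD_DIM, SEQ_LEN, shared_kv=True)

kv_shared_reduction = reduction(baseline, shared)
mqa_shared_reduction = reduction(baseline, mqa_shared)

print(f"K=V sharing:       {kv_shared_reduction:.1%}")
print(f"K=V sharing + MQA: {mqa_shared_reduction:.1%}")
```

The ratios depend only on the number of cached tensors and KV heads, so the layer count, head dimension, and sequence length cancel out. The 50% saving from K=V holds for any configuration, while the combined saving grows with the number of heads in the baseline.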













