Do Transformers Need Three Projections? Systematic Study of QKV Variants

This ICML 2026 paper introduces “Projective Sharing” in transformer self-attention and shows systematically that sharing the key and value projections (K=V) reduces KV cache memory by 50% with negligible accuracy loss. Combined with Multi-Query Attention, the technique reduces KV cache requirements by up to 96.9%, offering a practical path to efficient long-context and edge-device LLM deployment.
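
The arithmetic behind these percentages can be illustrated with a rough sizing sketch. The function names and the model configuration below (24 layers, 16 heads, head dimension 64, 8192-token context, 2-byte elements) are hypothetical and not taken from the paper. The head count of 16 is an assumption chosen only because it reproduces the quoted figure: sharing K and V halves the cache, Multi-Query Attention keeps 1 of 16 heads, and 1 - 1/(2 × 16) = 96.875%.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2, shared_kv=False):
    """Approximate KV cache size in bytes for a decoder-only transformer.

    With shared_kv=True a single tensor serves as both key and value,
    so only one tensor per token is cached instead of two.
    """
    tensors_per_token = 1 if shared_kv else 2
    return (n_layers * n_kv_heads * head_dim * seq_len
            * batch * bytes_per_elem * tensors_per_token)


def reduction(baseline, variant):
    """Fractional memory saving of `variant` relative to `baseline`."""
    return 1.0 - variant / baseline


# Hypothetical configuration (assumption, not from the paper).
cfg = dict(n_layers=24, head_dim=64, seq_len=8192)
n_heads = 16

mha = kv_cache_bytes(n_kv_heads=n_heads, **cfg)                      # standard multi-head
mha_shared = kv_cache_bytes(n_kv_heads=n_heads, shared_kv=True, **cfg)  # K=V only
mqa_shared = kv_cache_bytes(n_kv_heads=1, shared_kv=True, **cfg)        # K=V plus MQA

print(f"K=V only:  {reduction(mha, mha_shared):.1%}")   # 50.0%
print(f"K=V + MQA: {reduction(mha, mqa_shared):.1%}")   # 96.9%
```

The saving from K=V alone does not depend on the configuration, while the combined figure depends on the number of attention heads in the baseline, so other head counts would give a different upper bound.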
