Do Transformers Need Three Projections? Systematic Study of QKV Variants?

This ICML 2026 paper introduces “Projective Sharing” in transformer self-attention and systematically shows that sharing the key and value projections (K=V) reduces KV cache memory by 50% with negligible accuracy loss. Combined with Multi-Query Attention, the technique reduces KV cache requirements by up to 96.9%, offering a practical path to efficient long-context and edge-device LLM deployment.
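The memory figures above can be checked with simple arithmetic. The sketch below is not taken from the paper: the function names and the model shape (24 layers, 16 heads, head dimension 64, 4096 tokens, 2 bytes per element) are illustrative assumptions. It counts cached elements per token and assumes that K=V sharing stores one tensor instead of two, and that Multi-Query Attention keeps a single key/value head.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len,
                   shared_kv=False, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    shared_kv=True models K=V sharing: one cached tensor per layer
    instead of separate key and value tensors (an assumption about
    how the sharing maps onto cache storage).
    """
    tensors_per_token = 1 if shared_kv else 2
    return (layers * kv_heads * head_dim * seq_len
            * tensors_per_token * bytes_per_elem)


def reduction(baseline, variant):
    """Fraction of baseline memory saved by the variant."""
    return 1.0 - variant / baseline


# Hypothetical model shape, chosen only for illustration.
LAYERS, HEADS, HEAD_DIM, SEQ_LEN = 24, 16, 64, 4096

mha = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, SEQ_LEN)                  # baseline
mha_shared = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, SEQ_LEN,
                            shared_kv=True)                             # K=V only
mqa_shared = kv_cache_bytes(LAYERS, 1, HEAD_DIM, SEQ_LEN,
                            shared_kv=True)                             # MQA + K=V

print(f"K=V sharing:       {reduction(mha, mha_shared):.3%}")
print(f"MQA + K=V sharing: {reduction(mha, mqa_shared):.3%}")
```

Under these assumptions, K=V sharing alone saves exactly half of the cache, and combining it with a single key/value head over 16 query heads leaves 1/32 of the baseline, a 96.875% saving, which is consistent with the "up to 96.9%" figure. A different head count would give a different combined saving.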

Provenance Networks: End-to-End Exemplar-Based Explainability
The Illusion of Computation: Why LLMs Are Not Universal Turing Machines
Hardwired MLP Architectures for Computing Lp Norms and Distances: A Hardware-Friendly Approach
A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera

