Reducing Transformer Key-Value Cache Size with Cross-Layer Attention. William Brandon, Mayank Mishra, et al. 2024. NeurIPS 2024.
Granite code models: A family of open foundation models for code intelligence. Mayank Mishra, Matthew Stallone, et al. 2024. arXiv.