Christian Ng
Sparse Threshold Attention
Contents
Introduction
Empirical Mongeness and Monotone Structure in Attention Matrices
Exponential Softmax Structure
Scaling Sparsity
Faster Decode on a 4090, H100, and Beyond