1.
A GPU with 4,000 cores can use only 100 of them when computing sequential inner products on 100-dimensional vectors. That idle capacity is why the SVD reparameterization, in theory faster than computing the SVD itself, has never worked in practice. FastH fixes this, speeding up matrix inversion by 2.7× and determinant computation by 3.5× on a single RTX 2080 Ti.
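To make the bottleneck concrete, here is a minimal NumPy sketch of the sequential baseline, not of FastH itself. It assumes the usual form of the SVD reparameterization, W = U Σ Vᵀ, with U and V each stored as a product of Householder reflections. The helper names (`apply_householders`, `svd_inverse_times`) and the toy dimension `d = 6` are illustrative choices, not taken from the original work.

```python
import numpy as np

def apply_householders(vs, X, transpose=False):
    """Return (H_1 H_2 ... H_d) @ X, or its transpose times X.

    Each H_i = I - 2 v_i v_i^T / (v_i^T v_i), with v_i a row of `vs`.
    The loop is inherently sequential: every step needs the previous result,
    and each step is dominated by inner products of d-dimensional vectors.
    """
    out = np.array(X, dtype=float)          # copy so the caller's X is untouched
    # H_1...H_d X applies H_d first; the transpose H_d...H_1 X applies H_1 first.
    order = vs if transpose else vs[::-1]
    for v in order:
        out -= 2.0 * np.outer(v, v @ out) / (v @ v)
    return out

rng = np.random.default_rng(0)
d = 6                                        # toy size (assumption for illustration)
Vu = rng.standard_normal((d, d))             # Householder vectors defining U
Vv = rng.standard_normal((d, d))             # Householder vectors defining V
sigma = rng.uniform(0.5, 2.0, size=d)        # singular values, kept away from zero

I = np.eye(d)
U = apply_householders(Vu, I)
V = apply_householders(Vv, I)
W = U @ np.diag(sigma) @ V.T                 # dense W, built only for checking

def svd_inverse_times(X):
    """Compute W^{-1} X = V Sigma^{-1} U^T X without ever forming W^{-1}."""
    Y = apply_householders(Vu, X, transpose=True)   # U^T X
    Y = Y / sigma[:, None]                          # Sigma^{-1} (U^T X)
    return apply_householders(Vv, Y)                # V (...)

# |det W| is the product of the singular values, since |det U| = |det V| = 1.
log_abs_det = np.sum(np.log(sigma))
```

With the factors in hand, inversion and the log-determinant reduce to Householder multiplications plus elementwise work on `sigma`. The cost is the `for` loop: its d steps cannot run concurrently, which is the idle-core problem described above.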