[curves] Sandy2x

Trevor Perrin trevp at trevp.net
Tue Sep 29 13:29:41 PDT 2015

Tung Chou's "Sandy2x" code for 25519 on Sandy Bridge and Ivy Bridge is
around 10-20% faster than other implementations:


The speedup is attributed to using the 2-way vectorized 32x32->64
multiplier (vpmuludq) instead of the serial 64x64->128 scalar multiplier.
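
To make the two multiplier shapes concrete, here is a rough scalar model in
portable C. It is only a sketch of the instruction semantics, not code from
Sandy2x or the paper: the names vec2x64, u128, mul32x32_2way and mul64x64 are
made up for illustration, and the model says nothing about latency or
throughput. It assumes the 128-bit form of vpmuludq, where each 64-bit lane
multiplies the low 32 bits of its two inputs.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register: two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } vec2x64;

/* Hypothetical 128-bit result of a scalar 64x64 multiply. */
typedef struct { uint64_t lo, hi; } u128;

/* Model of 128-bit vpmuludq: in each lane, the low 32 bits of a and b
   are multiplied into a full 64-bit product. The upper 32 bits of each
   input lane are ignored. Two independent products per instruction. */
static vec2x64 mul32x32_2way(vec2x64 a, vec2x64 b) {
    vec2x64 r;
    for (int i = 0; i < 2; i++) {
        r.lane[i] = (uint64_t)(uint32_t)a.lane[i] * (uint32_t)b.lane[i];
    }
    return r;
}

/* Model of the scalar 64x64->128 multiply, written as four 32x32->64
   partial products to show how the two shapes relate. */
static u128 mul64x64(uint64_t a, uint64_t b) {
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0;
    uint64_t p01 = a0 * b1;
    uint64_t p10 = a1 * b0;
    uint64_t p11 = a1 * b1;

    /* Sum of the terms that land in bits 32..63, with carries kept;
       three values below 2^32 cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    u128 r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```

In this model one scalar multiply does the work of four 32x32 partial
products, while one vpmuludq yields two. Which approach wins in practice
depends on the limb representation and on the multiplier throughput of the
particular microarchitecture, which the model does not capture.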

The paper doesn't say whether this strategy also pays off on Haswell
(which seems to be lagging in 25519 performance?):



