[curves] Sandy2x

Tung Chou blueprint at crypto.tw
Wed Sep 30 06:16:45 PDT 2015

Hi Trevor,

Sandy2x takes 156076 Haswell cycles for X25519 shared-secret
computation, which is very close to its Ivy Bridge cycle count.
Note, however, that the non-vectorized implementation from the
Ed25519 paper performs much better on Haswell than on Ivy Bridge:
161648 cycles versus 182708 cycles.

Armando Faz-Hernández and Julio López have a Latincrypt paper
this year about an X25519 implementation targeting Haswell.
They claim 1565xx Haswell cycles for shared-secret computation.
They use a 4-way vectorized multiplier to perform 2 field
multiplications/squarings at the same time. I think a better
approach would be to find 4 independent multiplications/squarings
in the formula and vectorize across them, but I haven't tried.
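
To make the "vectorize across 4 independent multiplications" idea
concrete, here is a minimal Python model. It is an illustrative sketch
only, not code from Sandy2x or from the Latincrypt paper. It assumes a
radix-2^25.5 representation (10 limbs of alternating 26 and 25 bits)
and models a 4-lane 32x32->64 multiply in the style of vpmuludq with
plain Python lists. All function names (to_limbs, pack, mul4, ...) are
hypothetical, and carry propagation after the multiplication is left
out for brevity.

```python
P = 2**255 - 19
LANES = 4                      # four 64-bit lanes, as in a 256-bit AVX2 register
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
# Bit offset of limb k in radix 2^25.5 is ceil(25.5 * k); index 10 marks bit 255.
OFFSETS = [(51 * k + 1) // 2 for k in range(11)]


def to_limbs(x):
    """Split 0 <= x < 2^255 into 10 limbs of alternating 26 and 25 bits."""
    return [(x >> OFFSETS[k]) & ((1 << (OFFSETS[k + 1] - OFFSETS[k])) - 1)
            for k in range(10)]


def from_limbs(h):
    """Recombine (possibly oversized) limbs into an integer mod P."""
    return sum(h[k] << OFFSETS[k] for k in range(10)) % P


def lane_mul(a, b):
    """Model of a 4-lane 32x32->64 multiply: only the low 32 bits of each lane are used."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]


def lane_add(a, b):
    """Model of a 4-lane 64-bit addition (wraps modulo 2^64)."""
    return [(x + y) & MASK64 for x, y in zip(a, b)]


def pack(xs):
    """Interleave 4 field elements: vector k holds limb k of all 4 elements."""
    limbs = [to_limbs(x) for x in xs]
    return [[limbs[lane][k] for lane in range(LANES)] for k in range(10)]


def unpack(H):
    """Inverse of pack, reducing each lane mod P."""
    return [from_limbs([H[k][lane] for k in range(10)]) for lane in range(LANES)]


def mul4(F, G):
    """Compute 4 independent products f*g mod P, one per lane (no carry step)."""
    G19 = [[19 * x for x in v] for v in G]   # 2^255 = 19 (mod P) handles wrap-around
    F2 = [[2 * x for x in v] for v in F]     # odd*odd limb pairs need a factor of 2
    H = [[0] * LANES for _ in range(10)]
    for i in range(10):
        for j in range(10):
            a = F2[i] if (i % 2 == 1 and j % 2 == 1) else F[i]
            b = G19[j] if i + j >= 10 else G[j]
            k = (i + j) % 10
            H[k] = lane_add(H[k], lane_mul(a, b))
    return H


if __name__ == "__main__":
    xs = [2, 3, 5, 7]
    ys = [11, 13, 17, 19]
    print(unpack(mul4(pack(xs), pack(ys))))
```

In this layout every lane_mul call does useful work for all 4
products at once, so no lane is wasted on shuffling limbs of a single
operand. Whether the X25519 ladder step actually offers 4 independent
multiplications/squarings at each stage is exactly the open question
above.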

Best regards,
Tung Chou

On Tue, Sep 29, 2015 at 10:29 PM, Trevor Perrin <trevp at trevp.net> wrote:

> Tung Chou's "Sandy2x" code for 25519 on Sandy Bridge and Ivy Bridge is
> around 10-20% faster than other implementations:
> https://eprint.iacr.org/2015/943
> Speedup is attributed to using the 2-way 32x32->64 vectorized
> multiplier (vpmuludq) instead of the 64x64->128 serialized multiplier.
> The paper doesn't say whether this strategy also pays off on Haswell
> (which seems to be lagging in 25519 performance?):
> https://docs.google.com/spreadsheets/d/1SO3NGX-EgIZ1slw9uExb5FoeFy5TVkuA2lEutP6roYI/edit#gid=0
> Trevor