Trevor Perrin trevp at trevp.net
Wed Sep 30 16:30:49 PDT 2015

On Wed, Sep 30, 2015 at 6:16 AM, Tung Chou <blueprint at crypto.tw> wrote:
> Hi Trevor,
> Sandy2x takes 156076 Haswell cycles for X25519 shared-secret
> computation.

Thanks, noted!


> Note that,
> however, the non-vectorized implementation from the Ed25519
> paper performs much better on Haswell than on Ivy Bridge:
> 161648 cycles versus 182708 cycles.

Yeah, 156K vs 162K cycles is only a small improvement on the 2011 numbers.

(There was debate earlier about how recent Haswell 25519
implementations change the FourQ:25519 speedup ratio.  The answer
seems to be not much - maybe the speedup is 2.65x instead of 2.75x).
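
For anyone who wants to check the arithmetic, here is a rough sketch using
the two Haswell cycle counts quoted in this thread.  It assumes (my
assumption, not something confirmed here) that the 2.75x figure was
measured against the 161648-cycle implementation, so the ratio simply
scales with the 25519 cycle count.  The function names are made up for
illustration.

```python
# Haswell cycle counts quoted in this thread (X25519 shared-secret computation).
SANDY2X_CYCLES = 156076        # Sandy2x
ED25519_PAPER_CYCLES = 161648  # non-vectorized code from the Ed25519 paper


def relative_saving(new_cycles, old_cycles):
    """Fraction of cycles saved by the newer implementation."""
    return 1 - new_cycles / old_cycles


def rescaled_speedup(old_ratio, new_cycles, old_cycles):
    """Rescale a FourQ:25519 speedup ratio to a faster 25519 baseline.

    Assumes old_ratio was measured against old_cycles and that the
    FourQ cycle count itself is unchanged.
    """
    return old_ratio * new_cycles / old_cycles


saving = relative_saving(SANDY2X_CYCLES, ED25519_PAPER_CYCLES)
adjusted = rescaled_speedup(2.75, SANDY2X_CYCLES, ED25519_PAPER_CYCLES)

print(f"Sandy2x saves {saving:.1%} of the cycles")
print(f"Rescaled FourQ:25519 speedup: about {adjusted:.2f}x")
```

Under that assumption the ratio drops by only a few percent, which is in
line with the rough 2.65x estimate above.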

I'm curious why the 25519 implementations in the above spreadsheet
compare better with the others (Hamburg's 448, Gueron's P-256) on
Sandy Bridge than on Haswell, if anyone knows.


