sneves at dei.uc.pt
Thu Oct 23 05:04:27 PDT 2014
On 22-10-2014 23:22, Trevor Perrin wrote:
> Robert Granger and Michael Scott report a fast E-521 implementation:
> Based on Haswell numbers, its efficiency seems similar to Goldilocks:
> DJB also timed it on Sandy Bridge, though his numbers are worse than
> I'd expect; not sure why:
I have now had the chance to try to reproduce these timings on both microarchitectures. The paper states that the code
is rather fragile with respect to compilers and their switches; I can certainly corroborate that. I got the
best results using either 'g++-4.7 -O3 -fwrapv -fomit-frame-pointer -march=native' or 'g++-4.8 -O2 -fwrapv
-fomit-frame-pointer -march=native'. Both clang and icc were significantly slower regardless of which optimization
flags were enabled.
The Haswell cycle counts mentioned in the paper do not take Turbo Boost into account, and are therefore lower than the
real numbers. Given that a Core i7 4770 was used (3.4 GHz nominal, up to 3.9 GHz with Turbo Boost), the Haswell cycle
count should be ~893000. I have been able to get this down slightly, to ~884000.
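A rough sketch of the Turbo Boost correction above, assuming the reported counts came from a counter that ticks at the
nominal 3.4 GHz while the core actually ran at 3.9 GHz (the paper's measurement method is not restated here, so this is
an assumption; the function names are hypothetical):

```python
BASE_GHZ = 3.4   # nominal clock of the Core i7 4770
TURBO_GHZ = 3.9  # maximum Turbo Boost clock of the same chip


def turbo_corrected_cycles(measured_cycles, base_ghz=BASE_GHZ, turbo_ghz=TURBO_GHZ):
    """Convert a count taken at the nominal tick rate into core cycles at turbo speed."""
    return measured_cycles * turbo_ghz / base_ghz


def implied_measured_cycles(core_cycles, base_ghz=BASE_GHZ, turbo_ghz=TURBO_GHZ):
    """Inverse of the correction: the uncorrected count that a core-cycle figure implies."""
    return core_cycles * base_ghz / turbo_ghz


# The correction factor is simply the ratio of the two clocks (about 1.147).
print(round(TURBO_GHZ / BASE_GHZ, 3))

# Uncorrected count implied by the ~893000 figure, under the assumption above.
print(round(implied_measured_cycles(893000)))
```

Under this assumption, any count taken with Turbo Boost enabled is understated by roughly 15%.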
On Sandy Bridge, I get somewhat better timings than reported by DJB: ~1030000 cycles. According to your spreadsheet,
this changes the score of E-521 to be better on Sandy Bridge than on Haswell (2.29 vs 2.07).