[curves] E-521

Michael Hamburg mike at shiftleft.org
Thu Oct 23 18:38:45 PDT 2014


> On Oct 23, 2014, at 10:05 AM, Trevor Perrin <trevp at trevp.net> wrote:
> 
> On Thu, Oct 23, 2014 at 5:04 AM, Samuel Neves <sneves at dei.uc.pt> wrote:
>> 
>> The Haswell cycle counts mentioned in the paper do not take Turbo Boost into account, and therefore are lower than the
>> real number; taking into account that the Core i7 4770 chip was used (3.4 to 3.9 GHz overclocking), the Haswell cycle
>> count should be ~893000.  I have been able to get this slightly down to ~884000.
>> 
>> On Sandy Bridge, I get somewhat better timings than reported by DJB: ~1030000 cycles.
> 
> Thanks!, updated [1].
> 
> By that scoring, Mike's Goldilocks implementation retains the
> "relative efficiency" crown.  But the E-521 numbers are without ASM
> optimization.  And their 9 limbs / 58-bit radix seems impressive
> (Goldilocks uses 8 limbs / 56-bit radix).
> 
> So this seems pretty close, I wonder what a better-optimized 521 could do...
> 
> 
> Trevor
> 
> 
> [1] https://docs.google.com/a/trevp.net/spreadsheet/ccc?key=0Aiexaz_YjIpddFJuWlNZaDBvVTRFSjVYZDdjakxoRkE&usp=sharing#gid=0

The Goldilocks code is almost ready to support E-521.  As a warm-up on a curve other than Ed448, I took preliminary benchmarks for Ed480-Ridinghood.  From a single benchmark run (not SUPERCOP, etc.), in thousands of cycles (kcy):
        Goldilocks: 178kcy keygen, 536kcy ecdh
        Ridinghood: 193kcy keygen, 617kcy ecdh
Difference: +8% for keygen, +15% for ecdh.
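
The percentages can be rechecked from the four quoted numbers. The following is a minimal sketch in Python; the function name overhead_percent and the dictionary layout are illustrative assumptions, not part of the Goldilocks code:

```python
def overhead_percent(baseline_kcy, other_kcy):
    """Relative slowdown of `other_kcy` versus `baseline_kcy`, in percent."""
    return 100.0 * (other_kcy - baseline_kcy) / baseline_kcy

# Figures quoted above, in thousands of cycles (kcy).
goldilocks = {"keygen": 178, "ecdh": 536}
ridinghood = {"keygen": 193, "ecdh": 617}

for op in ("keygen", "ecdh"):
    pct = overhead_percent(goldilocks[op], ridinghood[op])
    print(f"{op}: +{pct:.1f}%")
# prints "keygen: +8.4%" and "ecdh: +15.1%"
```

Since these come from one benchmark run, the fractional digits should not be read as significant.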

The +15% reflects some sections which aren’t optimized yet, along the lines of
        if (EDWARDS_D > 0) { do something slow; }
or
        if (Mike hasn’t calculated the carry handling limits yet) { reduce just to be safe; }

I also have a 521-bit multiplier which takes 145 Haswell cycles in preliminary benchmarks.  Like Granger-Scott, it uses 9 limbs of 58 bits each.  It’s still using 3-way Chung-Hasan, so it does more multiplies and fewer adds than the Granger-Scott technique.  Its speed advantage, if it actually has one, probably comes from tighter tuning.  If the 145-cycle figure holds up, the multiplier may be about as fast as what Granger and Scott quoted, but measured properly, with Turbo Boost off.
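
To make the 9 x 58-bit layout concrete, here is a minimal sketch in Python of multiplication modulo 2^521 - 1 in that representation. It uses plain schoolbook multiplication, not 3-way Chung-Hasan and not the Granger-Scott technique, and every name in it (to_limbs, from_limbs, mul_mod_p) is an illustrative assumption rather than anything from the Goldilocks code. The point it shows is that 9 * 58 = 522, so a product term that overflows past limb 8 wraps around with a factor of 2^522 mod p = 2.

```python
P = 2**521 - 1          # the E-521 field prime
LIMBS = 9
RADIX_BITS = 58         # 9 * 58 = 522 bits, one more than 521
MASK = (1 << RADIX_BITS) - 1

def to_limbs(x):
    """Split an integer below 2^522 into 9 little-endian 58-bit limbs."""
    return [(x >> (RADIX_BITS * i)) & MASK for i in range(LIMBS)]

def from_limbs(limbs):
    """Recombine limbs (reduced or not) into one integer."""
    return sum(l << (RADIX_BITS * i) for i, l in enumerate(limbs))

def mul_mod_p(a, b):
    """Schoolbook product of two limb vectors, folded back to 9 limbs.

    The result is not carry-propagated: each accumulator may be wider
    than 58 bits, as it would be in a 128-bit register in C.
    """
    acc = [0] * LIMBS
    for i in range(LIMBS):
        for j in range(LIMBS):
            k = i + j
            term = a[i] * b[j]
            if k >= LIMBS:
                # 2^(58*9) = 2^522 = 2 * 2^521, which is 2 mod p.
                acc[k - LIMBS] += 2 * term
            else:
                acc[k] += term
    return acc

x = 2**520 + 12345
y = P - 1
acc = mul_mod_p(to_limbs(x), to_limbs(y))
assert from_limbs(acc) % P == (x * y) % P
# With fully reduced inputs, each accumulator stays below 17 * 2^116.
assert max(acc) < 17 * 2**116
```

The final assertion is the kind of carry-handling limit mentioned earlier: with reduced 58-bit inputs, accumulator 0 collects 1 plain term and 8 doubled terms, so it stays below 17 * 2^116 and fits a 128-bit accumulator with room to spare. How much of that headroom a real implementation can spend on skipped reductions depends on its own bounds, which this sketch does not model.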

— Mike

