[curves] Balancing reduced-radix and full-radix performance for extra-strength primes

Mon Jan 19 22:09:55 PST 2015

On 01/19/2015 08:26 PM, Trevor Perrin wrote:
> So his argument that we should consider full-radix when choosing 
> performance-oriented primes seems reasonable to me. Seems like we 
> should try to get more analysis and timings for full-radix Ted37919, 
> Curve41417, Goldilocks, E-521, etc.
I think it's reasonable.  Here's some analysis for those primes.

The point of the Goldilocks prime is that Karatsuba is a win at that 
size, and Karatsuba is a pain on full-radix.  So even though Ed448 is 
suitably aligned for a full-radix implementation on 32 bits, that would 
take 14^2 = 196 muls vs 16^2 * 3/4 = 192 for Karatsuba.  The extra adds 
for Karatsuba cost less than the carry handling for full-radix unless 
you have some sort of accelerator, and I don't think UMAAL is enough of 
an accelerator to make up that difference. 64-bit is similar except the 
coefficients aren't aligned.  So it's almost always a win to use 
Karatsuba and reduced-radix.

E-521 takes 9 limbs on 64-bit in either full- or reduced-radix, so 
there's no point in doing full-radix.  On 32-bit, it's 1 - 17^2/18^2 ~ 
11% fewer muls to do full-radix, but again you lose the ability to use 
Karatsuba or Chung-Hasan or the Granger-Scott IBDWT, so even if 
carry-handling is free that's not a win unless you want the simplest 
implementation.  For further comparison, MS would rather pick an 
8-or-16-limb prime (on full-radix) and get a 21% reduction in muls, but 
they still pay a carry-handling penalty and lose Karatsuba/CH/IBDWT.

This analysis doesn't support Longa's point, since it shows cases where 
reduced radix is very fast, even faster than full-radix on a 
comparable-sized prime on a machine where that's fast.  I think the 
point is more valid for 2^389-21, since it's just over a power of 2^32.

I don't know about Curve41417 and Ted37919.  It would probably be close 
for Ted37919, since there the cost reduction is 1 - 6^2/7^2 ~ 27% and 
Karatsuba doesn't apply (at least on 64-bit).
>> On Mon, Jan 19, 2015 at 5:03 PM, D. J. Bernstein <djb at cr.yp.to> wrote:
>> There's a long history of hard-to-catch errors in full-radix software.
>> https://cryptojedi.org/papers/#verify25519 is the first step towards
>> confidently getting out of this mess---and it's significantly _easier_
>> for reduced-radix software than for full-radix software.
> That's interesting and not intuitive to me.  I'd think reduced-radix
> was trickier to implement and analyze.  Huh...
The carry chains in full-radix software can be a pain to get right. In 
my experience, it's not so bad when you have some headroom (2^255-19 and 
2^379-19).  But if the prime is very close to a power of the word size 
(2^256-189 or the NIST Solinas primes), then carries can propagate 
really far.  This costs performance, and it's tempting to come up with a 
clever, hard-to-analyze trick to limit the propagation.

-- Mike