[curves] Balancing reduced-radix and full-radix performance for extra-strength primes
mike at shiftleft.org
Mon Jan 19 22:09:55 PST 2015
On 01/19/2015 08:26 PM, Trevor Perrin wrote:
> So his argument that we should consider full-radix when choosing
> performance-oriented primes seems reasonable to me. Seems like we
> should try to get more analysis and timings for full-radix Ted37919,
> Curve41417, Goldilocks, E-521, etc.
I think it's reasonable. Here's some analysis for those primes.
The point of the Goldilocks prime is that Karatsuba is a win at that
size, and Karatsuba is a pain on full-radix. So even though Ed448 is
suitably aligned for a full-radix implementation on 32 bits, that would
take 14^2 = 196 muls vs 16^2 * 3/4 = 192 for Karatsuba. The extra adds
for Karatsuba cost less than the carry handling for full-radix unless
you have some sort of accelerator, and I don't think UMAAL is enough of
an accelerator to make up that difference. 64-bit is similar except the
coefficients aren't aligned. So it's almost always a win to use
Karatsuba and reduced-radix.
E-521 takes 9 limbs on 64-bit in either full- or reduced-radix, so
there's no point in doing full-radix. On 32-bit, it's 1 - 17^2/18^2 ~
11% fewer muls to do full-radix, but again you lose the ability to use
Karatsuba or Chung-Hasan or the Granger-Scott IBDWT, so even if
carry-handling is free that's not a win unless you want the simplest
implementation. For further comparison, MS would rather pick an
8-or-16-limb prime (on full-radix) and get a 21% reduction in muls, but
they still pay a carry-handling penalty and lose Karatsuba/CH/IBDWT.
This analysis doesn't support Longa's point, since it shows cases where
reduced radix is very fast, even faster than full-radix on a
comparable-sized prime on a machine where that's fast. I think the
point is more valid for 2^389-21, since it's just over a power of 2^32.
I don't know about Curve41417 and Ted37919. It would probably be close
for Ted37919, since there the cost reduction is 1 - 6^2/7^2 ~ 27% and
Karatsuba doesn't apply (at least on 64-bit).
>> On Mon, Jan 19, 2015 at 5:03 PM, D. J. Bernstein <djb at cr.yp.to> wrote:
>> There's a long history of hard-to-catch errors in full-radix software.
>> https://cryptojedi.org/papers/#verify25519 is the first step towards
>> confidently getting out of this mess---and it's significantly _easier_
>> for reduced-radix software than for full-radix software.
> That's interesting and not intuitive to me. I'd think reduced-radix
> was trickier to implement and analyze. Huh...
The carry chains in full-radix software can be a pain to get right. In
my experience, it's not so bad when you have some headroom (2^255-19 and
2^379-19). But if the prime is very close to a power of the word size
(2^256-189 or the NIST Solinas primes), then carries can propagate
really far. This costs performance, and it's tempting to come up with a
clever, hard-to-analyze trick to limit the propagation.
More information about the Curves