[curves] Balancing reduced-radix and full-radix performance for extra-strength primes
trevp at trevp.net
Mon Jan 19 20:26:13 PST 2015
On Mon, Jan 19, 2015 at 5:03 PM, D. J. Bernstein <djb at cr.yp.to> wrote:
> http://bench.cr.yp.to/impl-scalarmult/curve25519.html compares speeds of
> various Curve25519 implementations on various platforms. Longa was
> presenting a small extract from this data.
> For "64-bit" platforms the best non-vectorized radix is sometimes 2^64,
> sometimes 2^51, for reasons explained in the Ed25519 paper. Vectorized
> speed records---e.g., speed records on typical smartphones---typically
> use radix 2^25.5.
> Longa seemed to be trying to mislead people into believing that this
> platform variation in _implementation_ techniques creates a platform
> variation in _prime_ choices. It should be obvious that these data
> points say nothing about prime choice: they're all for one prime.
You're disputing his Slide 9 I think ("Comparison of x64
implementations Unsaturated versus Saturated"). That slide may be
simplistic, but his overall point seems correct that platform
variation in implementation techniques might favor different primes.
It seems like you and Mike agree that full-radix might be more
efficient on some platforms.
It also seems true that some primes will be more or less efficient in
a full-radix implementation, For example, I would think primes that
slightly spill over into the next word are less efficient in
full-radix (like 2^389-21, or 2^2521-1).
So his argument that we should consider full-radix when choosing
performance-oriented primes seems reasonable to me. Seems like we
should try to get more analysis and timings for full-radix Ted37919,
Curve41417, Goldilocks, E-521, etc.
> A closer study of implementation techniques shows that choosing mediocre
> primes such as 2^256-189
He was using this argument for 2^379-19, not 2^256-189. Obviously
2^256-189 was chosen based on different criteria, with less emphasis
> In other words, switching to mediocre primes hurts some platforms and
> doesn't help others. To avoid seeing this damage, one has to ignore all
> the reduced-radix implementations and ignore all the platforms where
> those implementations are best---and even within these blinders there's
> no advantage to the mediocre primes. Not A Tough Decision(tm).
> I'm not saying that at _every_ security level there's a unique standout
> cross-platform prime; I'm just saying that Longa wasn't looking at the
> right data for the decision that he claimed to be interested in.
>> 2) Full-radix may be safer and easier to implement, since
>> reduced-radix requires "Bound analysis" to prevent inadvertent word
>> spilling, thus is "error prone, errors are more difficult to catch".
> There's a long history of hard-to-catch errors in full-radix software.
> https://cryptojedi.org/papers/#verify25519 is the first step towards
> confidently getting out of this mess---and it's significantly _easier_
> for reduced-radix software than for full-radix software.
That's interesting and not intuitive to me. I'd think reduced-radix
was trickier to implement and analyze. Huh...
More information about the Curves