[curves] The great debate over point formats
rransom.8774 at gmail.com
Fri Jan 31 01:57:31 PST 2014
On 1/31/14, Mike Hamburg <mike at shiftleft.org> wrote:
> You can access ADC by using __uint128_t on clang and gcc (and possibly on
> other platforms). It's still faster in assembly or intrinsics, mostly due
> to the register allocator barfing on EC numerics code, but it still pretty
> much works in C.
But not in ADC's full generality. As far as I know, there is no way
to use ADC to propagate a carry all the way through a
larger-than-128-bit number from C, even with the 128-bit type.
(ADC is also available in that limited sense in 32-bit mode using the
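
For concreteness, here is a minimal sketch of the limited form of carry handling that the 128-bit type does allow. It assumes gcc or clang on a 64-bit target; the function name add256 and the little-endian limb order are my own choices for illustration. Each limb sum is computed in a 128-bit accumulator and the carry is recovered by shifting, so whether this ever becomes a real ADC chain is left entirely to the compiler.

```c
#include <stdint.h>

/* Hypothetical helper (not from any library): r = a + b over four
 * 64-bit limbs stored least-significant first. Returns the carry out
 * of the top limb. Assumes the compiler provides __uint128_t. */
static uint64_t add256(uint64_t r[4], const uint64_t a[4], const uint64_t b[4])
{
    __uint128_t acc = 0;
    for (int i = 0; i < 4; i++) {
        /* acc holds the incoming carry (0 or 1); the sum fits in 65 bits. */
        acc += (__uint128_t)a[i] + b[i];
        r[i] = (uint64_t)acc;   /* low 64 bits are the result limb */
        acc >>= 64;             /* high bits are the carry into the next limb */
    }
    return (uint64_t)acc;
}
```

The carry is expressed as ordinary data flow through acc rather than through the flags register, which is exactly the gap between this and writing the ADC chain by hand.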
> ADC is also passably fast on most processors, though it works better on AMD
> than Intel, I've heard. At 256 bits, it's not necessarily worth having extra
> limbs to reduce the number of ADC instructions.
The Ed25519 paper reports that ADC can be used only once every two
cycles on then-recent Intel processors, compared with up to three ADDs
per cycle, and that because of this limitation on ADC, five 51-bit
limbs are indeed faster than four 64-bit limbs on those processors.
(But Samuel Neves reports that Intel processors are improving.)
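
To illustrate the reduced-radix idea, here is a rough sketch of addition with five 51-bit limbs for the prime 2^255-19. The type fe51 and the function names are hypothetical, and the carry pass is only a weak reduction: it brings the limbs back near 51 bits but does not produce a fully reduced canonical value. The point is that the addition itself is five independent ADDs with no carry chain, and the carries are handled later, in bulk.

```c
#include <stdint.h>

#define FE51_LIMB_BITS 51
#define FE51_MASK ((UINT64_C(1) << FE51_LIMB_BITS) - 1)

/* Hypothetical field element: value = sum of v[i] * 2^(51*i). */
typedef struct { uint64_t v[5]; } fe51;

/* Limb-wise addition. The 13 spare bits in each 64-bit word absorb
 * the overflow, so no carry is propagated here. */
static void fe51_add(fe51 *r, const fe51 *a, const fe51 *b)
{
    for (int i = 0; i < 5; i++)
        r->v[i] = a->v[i] + b->v[i];
}

/* One carry pass (weak reduction). The carry out of the top limb has
 * weight 2^255, which is congruent to 19 modulo 2^255-19, so it is
 * folded back into the bottom limb multiplied by 19. */
static void fe51_carry(fe51 *r)
{
    uint64_t c;
    for (int i = 0; i < 4; i++) {
        c = r->v[i] >> FE51_LIMB_BITS;
        r->v[i] &= FE51_MASK;
        r->v[i + 1] += c;
    }
    c = r->v[4] >> FE51_LIMB_BITS;
    r->v[4] &= FE51_MASK;
    r->v[0] += 19 * c;
}
```

Several additions can be done back to back before a single carry pass, as long as the limbs stay below 64 bits; that slack is what the extra limb buys.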
> At 384 bits, it may be
> worth going to extra limbs, and also Karatsuba may be profitable. By 448
> bits, you almost definitely want both reduced-radix and Karatsuba. I think
> 2^521-1 probably wants 9x58-bit limbs and 3-way Karatsuba, but I haven't
> tuned my implementation yet.
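
As a small illustration of why reduced radix and Karatsuba go together, here is a sketch of a single two-limb Karatsuba step. The name kmul2 and the 30-bit limb bound are assumptions chosen only so that everything fits in uint64_t; a real implementation for the sizes discussed above would use wider limbs and 128-bit products. With headroom in each limb, the sums a0+a1 and b0+b1 cannot overflow, so the middle coefficient costs one multiplication instead of two.

```c
#include <stdint.h>

/* Hypothetical 2-limb Karatsuba step (illustration only).
 * Inputs: limbs below 2^30, stored in 64-bit words.
 * Output: the three uncarried coefficients of the product
 *   (a0 + a1*X) * (b0 + b1*X) = out[0] + out[1]*X + out[2]*X^2.
 * Uses three multiplications instead of the schoolbook four. */
static void kmul2(uint64_t out[3], const uint64_t a[2], const uint64_t b[2])
{
    uint64_t z0 = a[0] * b[0];                     /* low coefficient  */
    uint64_t z2 = a[1] * b[1];                     /* high coefficient */
    /* Sums are below 2^31, so this product is below 2^62. */
    uint64_t zm = (a[0] + a[1]) * (b[0] + b[1]);
    out[0] = z0;
    out[1] = zm - z0 - z2;                         /* equals a0*b1 + a1*b0 */
    out[2] = z2;
}
```

The output coefficients are left uncarried; a carry pass and reduction modulo the prime would follow in a full multiplication.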
> Diego, have you implemented arithmetic mod the primes in your paper? Do you
> know whether they're fast or not, and with what implementations, and maybe
> even on what platforms, or are you speculating?
I don't know about him, but I'm speculating (though with a few
calculations to support the speculation).
My main interest (for curves for long-term security) is in simple C
implementations, though I do want to choose curves which I know will
be efficiently implementable on NEON vector units.