[curves] The great debate over point formats

Thu Jan 30 06:04:20 PST 2014

On 1/30/14, Paulo S. L. M. Barreto <pbarreto at larc.usp.br> wrote:
> Hello to all,
>
> On Wed Jan 29 18:07:34 PST 2014, Robert Ransom <rransom.8774 at gmail.com>
> wrote:
>
>> My main problem with the ‘Brazil’ curves is that all of them except
>> M-221 (even the E-* curves) have really *ugly* coordinate fields.
>> They make the NSA fields look nice by comparison (and at least those
>> would have the advantage of requiring less extra hardware within a
>> TPM, as someone mentioned on one of the IETF lists
>
> I'm just now coming back from vacations so this message is very brief, but
> you
> got my attention. Could you please state your 'beauty' metric
> quantitatively?

Curve operations modulo a number of the form 2^b - k are easy to
implement efficiently if there is a regular arrangement of limb sizes
small enough that an operation of the form (a+b)*(c+d) can be
performed without carrying the inputs to the multiplication, and with
reduction modulo x^l - k performed during the multiplication, for both
signed 64-bit and unsigned 128-bit multiplication-result lengths.

For example:

* Curve25519 coordinates can be represented using a repeating sequence
  (26, 25, 26, 25, ...) of 10 limb sizes on 32-bit processors or 5
  uniform 51-bit limbs on 64-bit processors.  Considering the 32-bit
  representation: the sum of two coordinates has at most 27 bits per
  limb, k=19 fits in 5 bits, and the number of limbs 10 fits in 4
  bits, so multiplication can produce 27*2 + 5 + 4 = 63-bit limbs,
  which is clearly safe; I don't have to find a more precise estimate
  to justify implementing that multiplication routine.

* Curve1174 coordinates (mod 2^251 - 9) can be represented using a
  sequence of (26, 25, 25, 25, ...)-bit limbs (with no repeating
  pattern) on 32-bit processors, or (51, 50, 50, 50, 50)-bit limbs on
  64-bit processors.  The 64-bit pattern is tolerable, though it
  requires a specialized multiply routine, but I don't see a good way
  to vectorize the 32-bit computation as-is.  Since 9 is a really
  small number, it's possible to add another bit (so the modulus is
  2^252 - 18, and there's a 5-limb pattern of (26,25,25,25,25)-bit
  limbs) to make carries vectorizable in the 32-bit computation, but
  since there is no longer a significant benefit of Curve1174 over
  Curve25519, it's not worth implementing.

* Curve3617 coordinates (mod 2^414 - 17) can be represented in four
  reasonable ways: a uniform sequence of 9 46-bit or 18 23-bit limbs
  (trivial to implement), two repetitions of (52,52,52,51)-bit limbs
  (faster than the 9-limb representation), or two repetitions of
  (26,26,26,26,26,26,26,25)-bit limbs (faster than the 18-limb
  representation, and still (just barely) safe).  Carries are
  vectorizable for all of these representations.

* E-382 coordinates (mod 2^382 - 105) have a vectorizable 16-limb
  representation for 32-bit processors (same shape as that for
  Curve3617, with two fewer bits per limb), but that's as long as for
  Curve3617, and to obtain a lower security level.  Because 105 is so
  large, even a 15-limb representation requires that reduction modulo
  x^15 - 105 be separated from multiplication, and carries do not
  appear to be easily vectorizable.

* M-383 coordinates (mod 2^383 - 187) are worse than E-382: no obvious
  way to vectorize carries for the 16-limb representation, and 187 is
  even bigger than 105.

* E-222 coordinates (mod 2^222 - 117) just barely allow a 9-limb
  representation (which should be a repeating pattern of (25, 25, 24)
  bits) on 32-bit processors, and allows a 4-limb representation (56,
  55, 56, 55) on 64-bit processors.  M-221 allows 32-bit processors to
  use an 8-limb representation (not obviously vectorizable) for a
  group two bits smaller, and can be expanded to 2^222 - 6 to use the
  same 9-limb pattern as E-222 if that turns out to be faster.  The
  coordinate field of M-221 is strictly better than E-222 now that
  Elligator 2 is available.

The NSA curves' coordinate fields essentially require an
assembly-language implementation, but have two benefits: (a) there is
software already available to operate in those fields, and (b) there
are hardware devices which implement those fields, and it should be
possible for manufacturers to extend future versions of them to
support a Montgomery/Edwards curve over the same field with somewhat
less cost than over a different field.

> Also, the curve over F_{2^521 - 1} (which was, in a different form,
> independently discovered by Mike Hamburg, and then also by Dan Bernstein
> and
> Tanja Lange) is ugly?

I had forgotten about E-521.

Initially, I had thought that 2^521 - 1 would have a non-vectorizable
limb pattern on 32-bit processors (where vectorization currently
matters).  I have since noticed that the modulus could be enlarged to
2^522 - 2, which has a vectorizable 20-limb pattern on 32-bit
processors.  It's still not as implementation-friendly a field as
Curve3617 has, but it's not as ugly as 2^b - k with k>100.

Robert Ransom