<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jan 30, 2014 at 12:04 PM, Robert Ransom <span dir="ltr">&lt;<a href="mailto:rransom.8774@gmail.com" target="_blank">rransom.8774@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">On 1/30/14, Paulo S. L. M. Barreto &lt;<a href="mailto:pbarreto@larc.usp.br">pbarreto@larc.usp.br</a>&gt; wrote:<br>
> Hello to all,<br>
><br>
> On Wed Jan 29 18:07:34 PST 2014, Robert Ransom &lt;<a href="mailto:rransom.8774@gmail.com">rransom.8774@gmail.com</a>&gt;<br>
> wrote:<br>
><br>
>> My main problem with the ‘Brazil’ curves is that all of them except<br>
>> M-221 (even the E-* curves) have really *ugly* coordinate fields.<br>
>> They make the NSA fields look nice by comparison (and at least those<br>
>> would have the advantage of requiring less extra hardware within a<br>
>> TPM, as someone mentioned on one of the IETF lists<br>
><br>
> I'm just now coming back from vacations so this message is very brief, but<br>
> you<br>
> got my attention. Could you please state your 'beauty' metric<br>
> quantitatively?<br>
<br>
</div>Curve operations modulo a number of the form 2^b - k are easy to<br>
implement efficiently if there is a regular arrangement of limb sizes<br>
small enough that an operation of the form (a+b)*(c+d) can be<br>
performed without carrying the inputs to the multiplication, and with<br>
reduction modulo x^l - k performed during the multiplication, for both<br>
signed 64-bit and unsigned 128-bit multiplication-result lengths.<br></blockquote><div><br></div><div>Dear Robert,</div><div><br></div><div>Your comments are very relevant, but let me justify some of our design choices. We picked field sizes similar to those of the NIST curves, aiming for something closer to drop-in replacements. We also considered not only vector and hardware implementations, but also the fast integer multipliers already available to software implementations on many platforms. Of course, these may require specialized assembly-language multiplication routines for optimal performance. Some brief notes follow below.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">* Curve3617 coordinates (mod 2^414 - 17) can be represented in four<br>
reasonable ways: a uniform sequence of 9 46-bit or 18 23-bit limbs<br>
(trivial to implement), two repetitions of (52,52,52,51)-bit limbs<br>
(faster than the 9-limb representation), or two repetitions of<br>
(26,26,26,26,26,26,26,25)-bit limbs (faster than the 18-limb<br>
representation, and still (just barely) safe). Carries are<br>
vectorizable for all of these representations.<br>
<br>
* E-382 coordinates (mod 2^382 - 105) have a vectorizable 16-limb<br>
representation for 32-bit processors (same shape as that for<br>
Curve3617, with two fewer bits per limb), but that's as long as for<br>
Curve3617, and to obtain a lower security level. Because 105 is so<br>
large, even a 15-limb representation requires that reduction modulo<br>
x^15 - 105 be separated from multiplication, and carries do not<br>
appear to be easily vectorizable.<br></blockquote><div><br></div><div>The difference in security level is rather small, only about 16 bits, and it matters less at the 192-bit level. Conventional non-vectorized implementations, however, would be penalized by the extra limb in Curve3617. And if reduction gets too expensive, we have more room for lazy reduction.</div>
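<div><br></div><div>To make the limb counts and headroom concrete, here is a rough Python sketch. It is only an illustration under assumptions that are mine and not stated above: 64-bit machine words with saturated limbs, and security estimated as half the field size in bits (cofactors ignored). The names WORD_BITS, FIELDS, summarize and reduce_once are hypothetical helpers, not part of any library.</div>
<pre>
```python
# Rough comparison of the three fields discussed in this thread.
# Assumptions (not from the original message): 64-bit machine words,
# saturated limbs, and security estimated as half the field size in bits.

WORD_BITS = 64  # assumed word size for a conventional implementation

# name: (b, k) for the prime p = 2^b - k
FIELDS = {
    "Curve3617": (414, 17),
    "E-382": (382, 105),
    "M-383": (383, 187),
}


def summarize(b, k, word_bits=WORD_BITS):
    """Return limb count, spare top bits and a rough security estimate."""
    limbs = -(-b // word_bits)          # ceiling division
    spare_bits = limbs * word_bits - b  # unused bits in the top word
    security = b // 2                   # rough estimate, ignores cofactor
    return {"limbs": limbs, "spare_bits": spare_bits, "security": security}


def reduce_once(x, b, k):
    """Fold the bits of x above position b back in, using 2^b = k (mod p)."""
    high, low = divmod(x, 2 ** b)
    return low + k * high


SUMMARY = {name: summarize(b, k) for name, (b, k) in FIELDS.items()}

if __name__ == "__main__":
    for name, row in SUMMARY.items():
        print(name, row)
```
</pre>
<div>Under these assumptions, Curve3617 takes 7 words against 6 for E-382 and M-383, and the spare bits in the top word are 34, 2 and 1 respectively. The reduce_once helper only shows the folding step that a lazy-reduction strategy would postpone; it is not a constant-time or complete reduction.</div>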
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
* M-383 coordinates (mod 2^383 - 187) are worse than E-382: no obvious<br>
way to vectorize carries for the 16-limb representation, and 187 is<br>
even bigger than 105.<br></blockquote><div><br></div><div>The same remarks apply here, but with less room for lazy reduction.</div><div><br></div><div>Best,</div><div>--<br>Diego de Freitas Aranha<br>Department of Computer Science - University of Brasília<br>
<a href="http://www.cic.unb.br/~dfaranha" target="_blank">http://www.cic.unb.br/~dfaranha</a></div></div></div></div>