[curves] Curve19119: A legacy-level little brother of Curve25519

Tue Aug 1 14:35:52 PDT 2017

Thank you, Mike, for your reply!

 >In addition to FourQ and Curve19119, other fast-ish options include 
2^252-2^232-1 (but maybe not on tiny micros);
 >NIST’s 2^192-2^64-1 (but again maybe not on the M0?); and the 
Goldi-like 2^216-2^108-1. But I agree that you
 >should stick with Curve25519 if you can afford it.

I've looked a bit closer at FourQ in the last days. I came to the 
conclusion that the speedup factor of ~2 reported by Liu, Longa, 
Pereira, Reparaz and Seo [LLPRS] in https://eprint.iacr.org/2017/434.pdf
are realistic for the AVR and MSP430. I moreover expect that this might 
hold also for the M0 target (not reported by LLPRS).

It must, however, be clear that one buys this with significantly larger 
RAM requirements (that is most precious, particularly in the very small 
targets) and I believe that it will also be much harder to protect FourQ 
against side channel attacks, especially regarding fault injection, 
without loosing the speed advantage.

>Elligator 2 works for any elliptic curve (over a field of odd characteristic IIRC) with at least one point of
>order 2.  So you can use it on FourQ.

Thank you for pointing this out. I think that this makes FourQ for these 
targets also an interresting choice for PAKE.

I don't believe, however that the claimed huge speedup factor for the M4 
from LLPRS actually holds. The Curve25519 figures taken for the 
comparision with FourQ for the M4 (~1.4 million cycles cor X25519) are 
in my perception much less optimized than the LLPRS implementation for 
their "complex" composed field.

I expect that actual speed advantage of FourQ on the M4 might in fact 
only be in the range of ~25%. For the M4, the UMAAL instruction gives 
you carry accumulation essentially for free and I know that 
constant-time field arithmetics on the M4 for the highly regular fe25519 
field is actually a factor of ~1.5 faster for Curve25519 than the values 
reported for FourQ by LLPRS. I think that this will eat up most of the 
advantage from the endomorphisms. Actually this might hold also for 
other "larger" targets bigger than the M4, where the regular 255 bit 
field could be implemented ay faster than the more complex "complex" 
field of FourQ?
> In addition to FourQ and Curve19119, other fast-ish options include

>NIST’s 2^192-2^64-1 (but again maybe not on the M0?);

The problem in my opinion is to implement the many conditional additions 
for the solinas prime in constant time. At least, I did try it and I did 
not find a way to implement that efficiently.
> and the Goldi-like 2^216-2^108-1 or 2^252-2^232-1 (but maybe not on tiny micros);
Here I expect that the fact that the field is not really much smaller 
than for 2^255 - 1 will be the reason that prevents significant speedups 
in comparison to Curve25519.

> I realized at some point that “SPAKE2-EE” is actually an existing protocol called “PAK”, which is why I didn’t ever write it up.
>
> On a constrained device, a SPEKE variant (like PACE) is a faster and simpler choice.  Also I thought the SPEKE patent expired, so why not use that?  SPAKE2 (-EE and otherwise) has a stronger security proof, supports augmentation, and is potentially faster with precomputation.
Actually I did mix up EKE and SPEKE regarding the patent that has 
expired this spring. We had chosen PACE predominantly because our first 
products came on the market way earlier than the expiration date. I 
think also that it is a good idea to make both parties contribute 
entropy for generating a session-specific ephemeral generator, such as 
done in PACE.

When using such an ephemeral generator, one will need to mix in the 
entropy from both parties together with the password somehow. I think 
that the method used by PACE for doing this is not so bad. We do want 
predominantly protect the password. Note that regarding side channels it 
might be beneficial to actually use the password only in a symmetric 
encryption primitive (such as in PACE). I expect that it will be much 
easier to protect a symmetric cypher against side channel leakage (e.g. 
by masking) than e.g. a SHA variant function used for mixing the entropies.
> I don’t know if there is an augmented version of SPEKE.
Yes it is possible to augment SPEKE. We have ourselves developed an 
augmented version of PACE for our bluetooth product line. It's not yet 
published but we are working on a paper on that topic.
> It is possible to compute Elligator straight to affine with only one exponentiation, using the inverse square root trick.
>
> You want x = -A/(1+ur^2), or -A-that, so you need to simultaneously divide by 1+ur^2 and compute the Legendre symbol of the putative y^2 = x(x^2+Ax+1).  If you compute that projectively to get y^2 = a/b, then L(a/b) = L(ab), so you don’t need to do the division first.  Let c=ab, d=1+ur^2.  Then you can compute
>
> s := (cd^2) ^ ((p-3)/2)
>
> Then (s^2 * cd^2) = (cd^2)^(p-2) = 1/cd^2, so that 1/d = (s^2 * cd^2) * cd unless c=0 or d=0.  But if c=0 then the point is (0,0) or infinity, and if d=0 then it’s infinity, so in either case you should return 0 anyway.  At the same time, (s * cd^2) = (cd^2)^((p-1)/2) = L(cd^2) = L(c) unless d=0.  So that gives you the inverse and Legendre symbol at the same time.  A similar trick with (cd^2)^((p-5)/8) can give you the square root, but you don’t actually need that here if you’re using an x-only ladder.
Thank you for pointing this out. I did not yet find time to analyze the 
details, but this will indeed give an additional speedup for any SPEKE 
and PACE implementation running on an curve for that Elligator2 is suitable.

Thank you again, Mike, for your valuable feedback.

Yours,

Björn.