[curves] New curve25519/ed25519 library
Mehdi Sotoodeh
mehdisotoodeh at gmail.com
Tue Jun 30 16:42:19 PDT 2015
Hi Peter, Mike,
Thank you both for your comments and valuable suggestions.
>The first aspect is that the Curve25519 implementation uses secretly
>indexed memory access which is a possible source for timing attacks.
>State-of-the-art Curve25519 implementations avoid this by using
>constant-time conditional swaps.
>Similar statements apply to the table lookup in the fixed-basepoint
>scalar multiplication, but those would be much more expensive to
>protect.
The constant time implementation of field operations added approximately
20% to the timings. Applying recommendations regarding indexing has a way
bigger impact on the overall performance. I am trying to find a compromised
solution based on blinding that can justify usage of indexing.
The library currently implements randomization of the starting point as
well as using (a+b)*P + B instead of a*P where b is random and B = -b*P.
Another method that can be built into pre-calculated tables is to replace
base point P with Q = (1/r mod BPO)*P and then use (a*r mod BPO)*Q instead
of a*P. The combination of these two methods is another option: ((a-b)*r
mod BPO)*Q + S, where S = -b*Q.
For embedded applications such as IOT, the randomized Q tables can be
generated and programmed into devices during manufacturing and random
blinding b generated during power ups.
Is this is good enough approach? Does it need to be updated time to time?
Please note that I am a software engineer and relying on crypto experts
like you to evaluate the security level and come up with recommendations.
>The second thing is: It's great to hear about new speed records!
>Are you planning to support the SUPERCOP API ...
I am planning to support SUPERCOP API once I am happy with the library.
>My understanding is that the technique that you call folding is only
>efficient for signing; the verification part needs a precomputation
>which is just way too expensive: it needs 384 point doublings! You will
>probably answer that the ed25519_Verify_Init needs to be done just once
>for many verifications, but then a huge expanded public key is sitting
>around in memory and if I'm not totally mistaken, a sliding-window
>method would still be faster.
The folding technique benefits Signing and Key Gen as well as
Verify(partially) for ed25519. Currently curve25519-DH public key
calculation also uses folding but I am going to switch back to Montgomery
ladder. There is no pre-calculation for these since the tables are part of
the code.
For the verify operation, the pre-calculation needs 191D and 11A
(verify_init function). I do not see how you came up with 384D. The total
(pre+post) is 255D and 2W+11 A's where W is the bit size of each fold.
There are some potential saving on the 2W part due to 8-folds of base point
but I have not figure it out yet.
@Mike,
I checked Pippenger's algorithm and did not see any similarity. I am going
to have a look at other references. Thanks for providing this comprehensive
list.
>3) Using multiple tables (so the inner loop is double; add; add; add)
lets you reduce the number of doubles without making the tables too wide.
Can you elaborate on this? Based on the pre-calculations, it is
double-add(P)-add(Q). Are you considering 8 vs 4 combs here?
Best,
Mehdi.
On Tue, Jun 30, 2015 at 11:41 AM, Michael Hamburg <mike at shiftleft.org>
wrote:
> I should add to this that I haven’t actually seen a complete comb
> implementation for Curve25519 before, though there might be one somewhere.
> I think Donna (including Floodyberry’s Donna fork) uses the older windowed
> technique. So if you make a nicely tuned implementation, you will be the
> first. I’m working on one as well, but it uses Decaf so it won’t be
> directly compatible with curve25519.
>
> To tune your comb implementation, here are a few ideas:
>
> 1) As Peter said, you should probably use constant-time lookups for
> signing, though it isn’t necessary for verification. This will limit you
> to a smaller table.
>
> 2) Signed binary uses half the table space vs unsigned binary, and also is
> half as expensive if you’re using constant-time lookups. You can convert a
> scalar to signed binary by adding a certain constant (2^width-1 IIRC) and
> halving mod the curve order.
>
> 3) Using multiple tables (so the inner loop is double; add; add; add) lets
> you reduce the number of doubles without making the tables too wide.
>
> 4) For any given size, there's a tradeoff between more tables and bigger
> tables. You can optimize that tradeoff with a simple script or spreadsheet.
>
> Cheers,
> — Mike
>
> On Jun 30, 2015, at 10:56 AM, Mike Hamburg <mike at shiftleft.org> wrote:
>
> Hi Mehdi,
>
> I think your "folding" trick has been invented and re-invented many
> times. I came up with it in 2012, and thought it should be called
> "combs". I Googled the term, and found that Pippenger invented it in 1979.
> Lim and Lee reinvented it in 1994. Hankerson-Menezes-Vanstone improved it
> in 2004 (by using multiple tables), as did Hedabou (by using signed
> all-bits-set binary); Feng et al published a different version in 2006
> (signed-MSB-set). The first Ed25519 implementation used a less efficient,
> non-combed variant of the same technique.
>
> Some of the variants are patented, too, but I think your variant is
> similar to Lim-Lee and so should probably be out of patent by now.
>
> I think the best variant these days is to combine the 2004 techniques:
> multiple combs, signed-all-bits-set. It's simpler than signed-MSB-set, and
> I think it's the fastest once you have the table. The table is slightly
> slower to compute than signed-MSB-set, which makes that algorithm better in
> a few limited circumstances, but SMSBS is also patented (by Microsoft) and
> SABS is unpatented to the best of my knowledge.
>
> You're right that fixed-point techniques can be useful for verification
> and not just signing. For example, in a vehicle-to-vehicle setting, you
> will see many signed messages from each car going the same direction as
> you. Verification latency is important, so precomputing a small table on
> another core may be worthwhile if memory permits. A small table can be
> worthwhile even for only 2 signatures, and a larger one for 3.
>
> -- Mike
>
> On 06/30/2015 10:26 AM, Peter Schwabe wrote:
>
> Mehdi Sotoodeh <mehdisotoodeh at gmail.com> <mehdisotoodeh at gmail.com> wrote:
>
> Dear Mehdi,
>
>
> I would like to introduce a remarkable implementation of x25519 and ed25519
> library. The sources are hosted at: https://github.com/msotoodeh/curve25519
>
> The code is experimental but rather stable. It is compact, portable
> and uses simple design logic. On the security front, it employs
> several measures for side-channel security.
>
> I only took a quick look at the software, but two things immediately
> caught my eye.
>
> The first aspect is that the Curve25519 implementation uses secretly
> indexed memory access which is a possible source for timing attacks.
> State-of-the-art Curve25519 implementations avoid this by using
> constant-time conditional swaps.
> Similar statements apply to the table lookup in the fixed-basepoint
> scalar multiplication, but those would be much more expensive to
> protect.
>
>
> But the most remarkable feature is speed. This library sets new speed
> records. It uses a new technique I call it FOLDING for achieving this goal.
> FOLDING chops the scalar multiplier into n pieces (or folds) and operates
> on the folds simultaneously reducing number of point operations by a factor
> of 4 or 8. For example, ed25519 signature takes 31 point doubling and 31
> point additions.
>
> The second thing is: It's great to hear about new speed records!
> Are you planning to support the SUPERCOP API and submit to eBATS so that
> the software can be publicly benchmarked on a large bunch of computers?
> My understanding is that the technique that you call folding is only
> efficient for signing; the verification part needs a precomputation
> which is just way too expensive: it needs 384 point doublings! You will
> probably answer that the ed25519_Verify_Init needs to be done just once
> for many verifications, but then a huge expanded public key is sitting
> around in memory and if I'm not totally mistaken, a sliding-window
> method would still be faster.
>
> Best regards,
>
> Peter
>
>
>
> _______________________________________________
> Curves mailing listCurves at moderncrypto.orghttps://moderncrypto.org/mailman/listinfo/curves
>
>
> _______________________________________________
> Curves mailing list
> Curves at moderncrypto.org
> https://moderncrypto.org/mailman/listinfo/curves
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://moderncrypto.org/mail-archive/curves/attachments/20150630/a55c5242/attachment.html>
More information about the Curves
mailing list