[curves] New curve25519/ed25519 library

Tue Jun 30 16:42:19 PDT 2015

Hi Peter, Mike,

Thank you both for your comments and valuable suggestions.

    >The first aspect is that the Curve25519 implementation uses secretly
    >indexed memory access which is a possible source for timing attacks.
    >State-of-the-art Curve25519 implementations avoid this by using
    >constant-time conditional swaps.
    >Similar statements apply to the table lookup in the fixed-basepoint
    >scalar multiplication, but those would be much more expensive to
    >protect.

The constant time implementation of field operations added approximately
20% to the timings. Applying recommendations regarding indexing  has a way
bigger impact on the overall performance. I am trying to find a compromised
solution based on blinding that can justify usage of indexing.
The library currently implements randomization of the starting point as
well as using (a+b)*P + B instead of a*P where b is random and B = -b*P.
Another method that can be built into pre-calculated tables is to replace
base point P with Q = (1/r mod BPO)*P and then use (a*r mod BPO)*Q instead
of a*P. The combination of these two methods is another option: ((a-b)*r
mod BPO)*Q + S, where S = -b*Q.
For embedded applications such as IOT, the randomized Q tables can be
generated and programmed into devices during manufacturing and random
blinding b generated during power ups.

Is this is good enough approach? Does it need to be updated time to time?
Please note that I am a software engineer and relying on crypto experts
like you to evaluate the security level and come up with recommendations.

    >The second thing is: It's great to hear about new speed records!
    >Are you planning to support the SUPERCOP API ...

I am planning to support SUPERCOP API once I am happy with the library.

    >My understanding is that the technique that you call folding is only
    >efficient for signing; the verification part needs a precomputation
    >which is just way too expensive: it needs 384 point doublings! You will
    >probably answer that the ed25519_Verify_Init needs to be done just once
    >for many verifications, but then a huge expanded public key is sitting
    >around in memory and if I'm not totally mistaken, a sliding-window
    >method would still be faster.

The folding technique benefits Signing and Key Gen as well as
Verify(partially) for ed25519. Currently curve25519-DH public key
calculation also uses folding but I am going to switch back to Montgomery
ladder. There is no pre-calculation for these since the tables are part of
the code.
For the verify operation, the pre-calculation needs 191D and 11A
(verify_init function). I do not see how you came up with 384D. The total
(pre+post) is 255D and 2W+11 A's where W is the bit size of each fold.
There are some potential saving on the 2W part due to 8-folds of base point
but I have not figure it out yet.

@Mike,
I checked Pippenger's algorithm and did not see any similarity. I am going
to have a look at other references. Thanks for providing this comprehensive
list.

    >3) Using multiple tables (so the inner loop is double; add; add; add)
lets you reduce the number of doubles without making the tables too wide.

Can you elaborate on this? Based on the pre-calculations, it is
double-add(P)-add(Q). Are you considering 8 vs 4 combs here?

Best,
Mehdi.

On Tue, Jun 30, 2015 at 11:41 AM, Michael Hamburg <mike at shiftleft.org>
wrote:

> I should add to this that I haven’t actually seen a complete comb
> implementation for Curve25519 before, though there might be one somewhere.
> I think Donna (including Floodyberry’s Donna fork) uses the older windowed
> technique.  So if you make a nicely tuned implementation, you will be the
> first.  I’m working on one as well, but it uses Decaf so it won’t be
> directly compatible with curve25519.
>
> To tune your comb implementation, here are a few ideas:
>
> 1) As Peter said, you should probably use constant-time lookups for
> signing, though it isn’t necessary for verification.  This will limit you
> to a smaller table.
>
> 2) Signed binary uses half the table space vs unsigned binary, and also is
> half as expensive if you’re using constant-time lookups.  You can convert a
> scalar to signed binary by adding a certain constant (2^width-1 IIRC) and
> halving mod the curve order.
>
> 3) Using multiple tables (so the inner loop is double; add; add; add) lets
> you reduce the number of doubles without making the tables too wide.
>
> 4) For any given size, there's a tradeoff between more tables and bigger
> tables.  You can optimize that tradeoff with a simple script or spreadsheet.
>
> Cheers,
> — Mike
>
> On Jun 30, 2015, at 10:56 AM, Mike Hamburg <mike at shiftleft.org> wrote:
>
>  Hi Mehdi,
>
> I think your "folding" trick has been invented and re-invented many
> times.  I came up with it in 2012, and thought it should be called
> "combs".  I Googled the term, and found that Pippenger invented it in 1979.
> Lim and Lee reinvented it in 1994. Hankerson-Menezes-Vanstone improved it
> in 2004 (by using multiple tables), as did Hedabou (by using signed
> all-bits-set binary); Feng et al published a different version in 2006
> (signed-MSB-set).  The first Ed25519 implementation used a less efficient,
> non-combed variant of the same technique.
>
> Some of the variants are patented, too, but I think your variant is
> similar to Lim-Lee and so should probably be out of patent by now.
>
> I think the best variant these days is to combine the 2004 techniques:
> multiple combs, signed-all-bits-set.  It's simpler than signed-MSB-set, and
> I think it's the fastest once you have the table.  The table is slightly
> slower to compute than signed-MSB-set, which makes that algorithm better in
> a few limited circumstances, but SMSBS is also patented (by Microsoft) and
> SABS is unpatented to the best of my knowledge.
>
> You're right that fixed-point techniques can be useful for verification
> and not just signing.  For example, in a vehicle-to-vehicle setting, you
> will see many signed messages from each car going the same direction as
> you.  Verification latency is important, so precomputing a small table on
> another core may be worthwhile if memory permits.  A small table can be
> worthwhile even for only 2 signatures, and a larger one for 3.
>
> -- Mike
>
> On 06/30/2015 10:26 AM, Peter Schwabe wrote:
>
> Mehdi Sotoodeh <mehdisotoodeh at gmail.com> <mehdisotoodeh at gmail.com> wrote:
>
> Dear Mehdi,
>
>
>  I would like to introduce a remarkable implementation of x25519 and ed25519
> library. The sources are hosted at: https://github.com/msotoodeh/curve25519
>
> The code is experimental but rather stable. It is compact, portable
> and uses simple design logic.  On the security front, it employs
> several measures for side-channel security.
>
>  I only took a quick look at the software, but two things immediately
> caught my eye.
>
> The first aspect is that the Curve25519 implementation uses secretly
> indexed memory access which is a possible source for timing attacks.
> State-of-the-art Curve25519 implementations avoid this by using
> constant-time conditional swaps.
> Similar statements apply to the table lookup in the fixed-basepoint
> scalar multiplication, but those would be much more expensive to
> protect.
>
>
>  But the most remarkable feature is speed. This library sets new speed
> records. It uses a new technique I call it FOLDING for achieving this goal.
> FOLDING chops the scalar multiplier into n pieces (or folds) and operates
> on the folds simultaneously reducing number of point operations by a factor
> of 4 or 8. For example, ed25519 signature takes 31 point doubling and 31
> point additions.
>
>  The second thing is: It's great to hear about new speed records!
> Are you planning to support the SUPERCOP API and submit to eBATS so that
> the software can be publicly benchmarked on a large bunch of computers?
> My understanding is that the technique that you call folding is only
> efficient for signing; the verification part needs a precomputation
> which is just way too expensive: it needs 384 point doublings! You will
> probably answer that the ed25519_Verify_Init needs to be done just once
> for many verifications, but then a huge expanded public key is sitting
> around in memory and if I'm not totally mistaken, a sliding-window
> method would still be faster.
>
> Best regards,
>
> Peter
>
>
>
> _______________________________________________
> Curves mailing listCurves at moderncrypto.orghttps://moderncrypto.org/mailman/listinfo/curves
>
>
>  _______________________________________________
> Curves mailing list
> Curves at moderncrypto.org
> https://moderncrypto.org/mailman/listinfo/curves
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://moderncrypto.org/mail-archive/curves/attachments/20150630/a55c5242/attachment.html>