[curves] Compact square root algorithm for annoying primes

Michael Scott mike.scott at miracl.com
Fri Jan 22 04:20:24 PST 2016


Have a look at the references in http://www.google.com/patents/US9148282

Particularly Siguna Mueller's Lucas-function based method..

Not sure about the cost..


Mike


On Thu, Jan 21, 2016 at 11:30 PM, Mike Hamburg <mike at shiftleft.org> wrote:

> Hello Curves,
>
> I was porting an embedded EC library to work with other curves, and I was
> wondering if anyone knew a compact, fast, deterministic algorithm for
> square roots mod NIST p224 (or other annoying primes)?  I’m not worried
> about constant-time operation, since I probably have to blind anyway to
> reduce other side channels, but a deterministic algorithm is preferable
> because of possible real-time constraints.
>
> OpenSSL’s implementation takes something like 4700 multiplications on
> average, which is more than a point*scalar multiply.
>
> Dan’s paper [1] gives a variant of the Tonelli-Shanks algorithm which is
> fast and deterministic, requiring only 364 multiplies.  But it’s very
> heavy, with 1024 precomputed elements.  Reducing the table size makes the
> algorithm considerably slower.
>
> I’ve tried to combine Dan’s Tonelli-Shanks variant with Sutherland’s
> algorithm, below.  A variant with 4 precomputed points, a 64-element
> discrete log table (not as big as it sounds) and 10 or so field elements in
> memory takes 900 multiplies worst-case (~760 average-case).
>
> Cipolla’s algorithm is nondeterministic.
>
> Is Pocklington the way to go?  It’s technically nondeterministic as well,
> but if I read [1] correctly the failure probability is 2^-96.  A
> straightforward implementation probably takes more than 900 multiplies, but
> hopefully it’s simpler and uses less memory, right?
>
> Is there some other algorithm I’m missing?
>
> Thanks,
> — Mike
>
> [1] http://cr.yp.to/papers/sqroot.pdf
>
> ####################################################
>
> p = 2^224 - 2^96 + 1
> F = GF(p)
> qnr = F(11)^(2^128-1)
>
> # 3-element precomputed table with roots of unity
> precmp = dict((
>     (3*2^i, qnr^2^(96-3*2^i))
>     for i in [2,3,4]
> ))
>
> # 64-element discrete log table
> tbits = 6
> tlogger = qnr^2^(96-tbits)
> dlog_table = dict((
>     (tlogger^i,2^tbits-i) for i in xrange(2^tbits)
> ))
>
> def isqrt(x):
>     # inverse square root of x, assuming x is a QR
>     # First, DJB’s addition chain
>     s = x^3
>     s = s^2 * x # 2^3 - 1
>     u = s^2^3 * s # 2^6 - 1
>     s = u^2^6 * u # 2^12 - 1
>     t = s^2^12 * s # 2^24 - 1
>     s = t^2^24 * t # 2^48 - 1
>     s = s^2^48 * s # 2^96 - 1
>     s = s^2^24 * t # 2^120 - 1
>     s = s^2^6 * u # 2^126 - 1
>     s = s^2 * x
>     # here s == x^(2^127-1)
>
>     ret = [s,s^2*x] # working powers of return value
>     rou = [qnr,qnr^2] # working roots of unity
>     stack = [95]
>
>     def leaf(n):
>         dlog = dlog_table[ret.pop()]>>(tbits-n)
>         for i in xrange(len(rou)):
>             for j in xrange(n):
>                 if dlog & (1<<j): ret[i] = ret[i] * rou[i]
>                 rou[i] = rou[i]^2
>
>     # max stack depth is 4, so 5+5 field elements in ret+rou
>     while len(stack):
>         n = stack.pop()
>         ret.append(ret[-1]^2^floor(n/2))
>
>         if n <= 2*tbits:
>             leaf(ceil(n/2))
>             rou.pop()
>             leaf(floor(n/2))
>         else:
>             rou.append(precmp[ceil(n/2)])
>             stack.append(floor(n/2))
>             stack.append(ceil(n/2))
>
>     return ret[0]
>
> _______________________________________________
> Curves mailing list
> Curves at moderncrypto.org
> https://moderncrypto.org/mailman/listinfo/curves
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://moderncrypto.org/mail-archive/curves/attachments/20160122/4b80972b/attachment.html>


More information about the Curves mailing list