[curves] Compact square root algorithm for annoying primes
Michael Scott
mike.scott at miracl.com
Fri Jan 22 04:20:24 PST 2016
Have a look at the references in http://www.google.com/patents/US9148282
Particularly Siguna Mueller's Lucas-function based method..
Not sure about the cost..
Mike
On Thu, Jan 21, 2016 at 11:30 PM, Mike Hamburg <mike at shiftleft.org> wrote:
> Hello Curves,
>
> I was porting an embedded EC library to work with other curves, and I was
> wondering if anyone knew a compact, fast, deterministic algorithm for
> square roots mod NIST p224 (or other annoying primes)? I’m not worried
> about constant-time operation, since I probably have to blind anyway to
> reduce other side channels, but a deterministic algorithm is preferable
> because of possible real-time constraints.
>
> OpenSSL’s implementation takes something like 4700 multiplications on
> average, which is more than a point*scalar multiply.
>
> Dan’s paper [1] gives a variant of the Tonelli-Shanks algorithm which is
> fast and deterministic, requiring only 364 multiplies. But it’s very
> heavy, with 1024 precomputed elements. Reducing the table size makes the
> algorithm considerably slower.
>
> I’ve tried to combine Dan’s Tonelli-Shanks variant with Sutherland’s
> algorithm, below. A variant with 4 precomputed points, a 64-element
> discrete log table (not as big as it sounds) and 10 or so field elements in
> memory takes 900 multiplies worst-case (~760 average-case).
>
> Cipolla’s algorithm is nondeterministic.
>
> Is Pocklington the way to go? It’s technically nondeterministic as well,
> but if I read [1] correctly the failure probability is 2^-96. A
> straightforward implementation probably takes more than 900 multiplies, but
> hopefully it’s simpler and uses less memory, right?
>
> Is there some other algorithm I’m missing?
>
> Thanks,
> — Mike
>
> [1] http://cr.yp.to/papers/sqroot.pdf
>
> ####################################################
>
> p = 2^224 - 2^96 + 1
> F = GF(p)
> qnr = F(11)^(2^128-1)
>
> # 3-element precomputed table with roots of unity
> precmp = dict((
> (3*2^i, qnr^2^(96-3*2^i))
> for i in [2,3,4]
> ))
>
> # 64-element discrete log table
> tbits = 6
> tlogger = qnr^2^(96-tbits)
> dlog_table = dict((
> (tlogger^i,2^tbits-i) for i in xrange(2^tbits)
> ))
>
> def isqrt(x):
> # inverse square root of x, assuming x is a QR
> # First, DJB’s addition chain
> s = x^3
> s = s^2 * x # 2^3 - 1
> u = s^2^3 * s # 2^6 - 1
> s = u^2^6 * u # 2^12 - 1
> t = s^2^12 * s # 2^24 - 1
> s = t^2^24 * t # 2^48 - 1
> s = s^2^48 * s # 2^96 - 1
> s = s^2^24 * t # 2^120 - 1
> s = s^2^6 * u # 2^126 - 1
> s = s^2 * x
> # here s == x^(2^127-1)
>
> ret = [s,s^2*x] # working powers of return value
> rou = [qnr,qnr^2] # working roots of unity
> stack = [95]
>
> def leaf(n):
> dlog = dlog_table[ret.pop()]>>(tbits-n)
> for i in xrange(len(rou)):
> for j in xrange(n):
> if dlog & (1<<j): ret[i] = ret[i] * rou[i]
> rou[i] = rou[i]^2
>
> # max stack depth is 4, so 5+5 field elements in ret+rou
> while len(stack):
> n = stack.pop()
> ret.append(ret[-1]^2^floor(n/2))
>
> if n <= 2*tbits:
> leaf(ceil(n/2))
> rou.pop()
> leaf(floor(n/2))
> else:
> rou.append(precmp[ceil(n/2)])
> stack.append(floor(n/2))
> stack.append(ceil(n/2))
>
> return ret[0]
>
> _______________________________________________
> Curves mailing list
> Curves at moderncrypto.org
> https://moderncrypto.org/mailman/listinfo/curves
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://moderncrypto.org/mail-archive/curves/attachments/20160122/4b80972b/attachment.html>
More information about the Curves
mailing list