[curves] Ed448-Goldilocks SHAKE

David Leon Gil coruus at gmail.com
Wed Jul 16 17:37:43 PDT 2014


(Collision resistance in a separate message, as that requires a longer
explanation.)

> Is this on Sandy Bridge or something else?  What do your benches look like?

I have only run it on my (very ancient) Nehalem laptop so far; the
output of build/bench is attached. (I'll try to get some Sandy Bridge
numbers on AWS at some point.)

(More than fast enough for me, btw.)
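
For comparing against other machines, the attached timings can be turned into rough cycle counts. This is a minimal sketch that assumes the core ran at the nominal 2.66 GHz for the whole run (no turbo or throttling), which the bench output does not confirm. The helper name is made up for illustration; the timings are copied from the first run in the attachment.

```python
# Convert wall-clock timings to approximate cycle counts.
# Assumption: the clock stays at the nominal 2.66 GHz throughout.
CLOCK_GHZ = 2.66

def ns_to_cycles(ns, clock_ghz=CLOCK_GHZ):
    """Cycles = nanoseconds * (cycles per nanosecond)."""
    return ns * clock_ghz

# A few figures copied from the first run in the attachment.
timings_ns = {
    "mul": 76.3,
    "sqr": 51.5,
    "SHAKE256 1blk": 777.6,
    "ecdh": 310.3e3,  # 310.3 µs expressed in ns
}

for name, ns in timings_ns.items():
    print(f"{name:14s} {ns_to_cycles(ns):12.0f} cycles")
```

Treat the results as order-of-magnitude figures only; a laptop of this vintage may well change frequency during a run.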

> Is it fairly expensive?  I didn’t see a significant difference in benchmarks, between doing that step and leaving it out.

No, you're right! I thought it was, but it appears to be within noise.

> Keccak is good stuff, but I think SHA-512 is currently more conservative and respected, and it’s definitely more widely deployed.  Blake2 just hasn’t had enough time in the spotlight.  Of course, in your fork you can do whatever you want :-)

(My problem with SHA2-512: essentially the same strategies used
against SHA-1 are believed to work against it too, but they are
computationally costly: too expensive for academic cryptographers,
yet well within potential adversaries' budgets. As a result, I think
that Keccak has seen more (public) study by this point than SHA2 has.
But de gustibus. Agreed on BLAKE2; it is only a good choice if
performance is really critical.)

> Hardware Keccak may be somewhat difficult to come by.  You can’t just make an instruction which does one round of it on a vector register, because the state is 1600 bits.  Even for SHA, only two shipping processors that I know of (Apple A7 and VIA’s chips) have instructions for SHA2, and that’s only SHA256.  Having a separate one-per-chip accelerator core is a pain because of context switches and such.

For Intel, we're not likely to see it until after AVX-512; it will
need to use multiple registers, but this isn't problematic. (Intel is
shipping SHA-1 and SHA-2 extensions soonish, btw.)

(Alternatively, one can have an instruction that operates on memory
and occupies hardware registers (rather than the virtual, named ones)
to ship data to the execution units; as I understand it, this is
essentially what's done at present for some instructions that produce
microcode loops.)
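
To put a number on the "multiple registers" point: the Keccak-f[1600] state is 25 lanes of 64 bits. The sketch below just counts how many vector registers it takes to hold those lanes at a few register widths, assuming whole lanes are packed into each register. It says nothing about how a real instruction would lay out the state, and the function name is made up for illustration.

```python
import math

KECCAK_STATE_BITS = 1600  # Keccak-f[1600]: 25 lanes of 64 bits
LANE_BITS = 64

def registers_needed(vector_bits):
    """Minimum number of registers holding all lanes, packing whole lanes."""
    lanes = KECCAK_STATE_BITS // LANE_BITS          # 25
    lanes_per_register = vector_bits // LANE_BITS   # e.g. 8 for 512-bit
    return math.ceil(lanes / lanes_per_register)

for name, width in [("128-bit", 128), ("256-bit", 256), ("512-bit", 512)]:
    print(f"{name:8s} {registers_needed(width)} registers")
```

Even at 512 bits the state does not fit in three registers (3 x 512 = 1536 < 1600), so a fourth is needed.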
-------------- next part --------------
Nehalem 2.66 GHz

mul:          76.3ns
sqr:          51.5ns
mul dep:      71.6ns
mulw:         20.6ns
rand448:     191.3ns
SHAKE256 1blk: 777.6ns
SHAKE256 blk:  832.3ns (153.79 MB/s)
isr auto:     24.5µs
elligator:    25.5µs
decompress:   25.4µs
compress:     25.5µs
barrett red: 282.5ns
barrett mac: 1162.3ns
exti+niels:  530.4ns
exti+pniels: 600.6ns
exti dbl:    452.4ns
i->a isog:   444.6ns
a->i isog:   455.7ns
monty step:  587.1ns
full ladder: 302.1µs
edwards smz: 288.6µs
edwards svl: 263.8µs
edwards smc: 309.2µs
edwards vtm: 251.4µs
wnaf6 pre:   106.3µs
edwards vt6: 224.8µs
wnaf4 pre:    47.0µs
edwards vt4: 234.5µs
wnaf5 pre:    62.1µs
edwards vt5: 225.7µs
vt vf combo: 289.0µs
edwards sm:  342.4µs
pre(5,5,18): 348.2µs
pre(3,5,30): 290.1µs
pre(5,3,30): 249.6µs
pre(15,3,10):324.1µs
pre(8,4,14): 321.1µs
com(5,5,18):  73.3µs
com(3,5,30):  77.7µs
com(8,4,14):  76.7µs
com(5,3,30): 102.0µs
com(15,3,10): 92.4µs

Goldilocks:
keygen:       99.2µs
ecdh:        310.3µs
sign:        104.1µs
verify:      341.3µs
precompute:  374.7µs
verify pre:  137.7µs
ecdh pre:    102.0µs



mul:          74.5ns
sqr:          50.0ns
mul dep:      69.4ns
mulw:         20.4ns
rand448:     214.0ns
SHAKE256 1blk: 843.5ns
SHAKE256 blk:  964.5ns (132.71 MB/s)
isr auto:     24.3µs
elligator:    25.6µs
decompress:   24.9µs
compress:     24.4µs
barrett red: 284.0ns
barrett mac: 1207.6ns
exti+niels:  521.0ns
exti+pniels: 584.9ns
exti dbl:    445.4ns
i->a isog:   450.5ns
a->i isog:   449.3ns
monty step:  599.0ns
full ladder: 295.5µs
edwards smz: 268.4µs
edwards svl: 255.5µs
edwards smc: 293.4µs
edwards vtm: 241.9µs
wnaf6 pre:   103.5µs
edwards vt6: 217.8µs
wnaf4 pre:    42.5µs
edwards vt4: 223.8µs
wnaf5 pre:    64.0µs
edwards vt5: 216.9µs
vt vf combo: 288.4µs
edwards sm:  338.9µs
pre(5,5,18): 314.0µs
pre(3,5,30): 281.9µs
pre(5,3,30): 247.1µs
pre(15,3,10):316.9µs
pre(8,4,14): 317.5µs
com(5,5,18):  72.1µs
com(3,5,30):  77.1µs
com(8,4,14):  77.1µs
com(5,3,30): 101.6µs
com(15,3,10): 92.0µs

Goldilocks:
keygen:       97.7µs
ecdh:        306.1µs
sign:        102.6µs
verify:      328.5µs
precompute:  362.7µs
verify pre:  135.1µs
ecdh pre:     99.9µs

Testing...

