[curves] Ed448-Goldilocks SHAKE

Wed Jul 16 17:47:30 PDT 2014

On Jul 16, 2014, at 5:37 PM, David Leon Gil <coruus at gmail.com> wrote:

> (Collision resistance in a separate message, as that requires a longer
> explanation.)
> 
>> Is this on Sandy Bridge or something else?  What do your benches look like?
> 
> I have only run it on my (very ancient) Nehalem laptop thus far; output
> of build/bench attached. (I'll try to get some Sandy Bridge numbers on
> AWS at some point.)
> 
> (More than fast enough for me, btw.)

OK, thanks.  So it looks like Nehalem takes 20-30% more cycles than Sandy Bridge (SBR), which in turn takes 20-30% more cycles than Haswell.  That sounds about right.

I have SBR numbers, at least for slightly older versions (I sold my SBR machine), so don’t sweat it if getting them is annoying.

>> Is it fairly expensive?  I didn’t see a significant difference in benchmarks, between doing that step and leaving it out.
> 
> No, you're right! I thought it was, but it appears to be within noise.
> 
>> Keccak is good stuff, but I think SHA-512 is currently more conservative and respected, and it’s definitely more widely deployed.  Blake2 just hasn’t had enough time in the spotlight.  Of course, in your fork you can do whatever you want :-)
> 
> (My problem with SHA2-512: essentially the same strategies that work
> against SHA-1 are believed to work against it, but are computationally
> costly: too expensive for academic cryptographers, yet well within
> potential adversaries' budgets. As a result, I think that Keccak has
> seen more (public) study by this point than SHA2 has. But de gustibus.
> Agreed on BLAKE2; only a good choice if performance is really critical.)

Makes sense to me.

>> Hardware Keccak may be somewhat difficult to come by.  You can’t just make an instruction which does one round of it on a vector register, because the state is 1600 bits.  Even for SHA, only two shipping processors that I know of (Apple A7 and VIA’s chips) have instructions for SHA2, and that’s only SHA256.  Having a separate one-per-chip accelerator core is a pain because of context switches and such.
> 
> For Intel, we're not likely to see it until after AVX-512; it will need
> to use multiple registers, but this isn't problematic. (Intel is
> shipping SHA-1 and SHA-2 extensions soonish, btw.)
> 
> (Alternatively, one can have an instruction that operates on memory and
> occupies hardware (rather than the virtual, named) registers to ship
> data to execution units; as I understand it, this is essentially what's
> done for some instructions that produce microcode loops at present.)
> <bench.txt>

Sure.  It would be like VIA’s “rep montmul” and “rep xcrypt aes” instructions.
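
To make the register-count point concrete, here is a rough sketch in Python (not from the thread; the helper name registers_needed and the list of register widths are illustrative assumptions). It only does the arithmetic for how many vector registers of a given width it takes to hold the 1600-bit Keccak state, viewed as 25 lanes of 64 bits each:

```python
# Arithmetic sketch only: how many vector registers does the Keccak-f[1600]
# state occupy? Names and the chosen widths are illustrative assumptions.

KECCAK_STATE_BITS = 1600  # 5 x 5 lanes of 64 bits
LANE_BITS = 64


def registers_needed(register_bits, state_bits=KECCAK_STATE_BITS):
    """Smallest number of registers of the given width that hold the state."""
    # Ceiling division without floating point.
    return -(-state_bits // register_bits)


if __name__ == "__main__":
    for name, bits in [("128-bit", 128), ("256-bit", 256), ("512-bit", 512)]:
        lanes_per_register = bits // LANE_BITS
        print(
            f"{name} registers: {lanes_per_register} lanes each, "
            f"{registers_needed(bits)} registers for the full state"
        )
```

Under these assumptions the state needs 13 registers at 128 bits, 7 at 256 bits, and 4 at 512 bits, so no single-register "one round" instruction is possible at any of those widths, and even a 512-bit design has to spread the state over several registers.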

Cheers,
— Mike