[noise] Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s first message benchmarks
trevp at trevp.net
Tue May 23 09:03:38 PDT 2017
On Tue, May 23, 2017 at 12:58 PM, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
> Hey folks,
> [Noise-related, but CCing curves@, since this essentially amounts to a
> benchmark of 25519.]
Only replying to Noise:
> I added multi-core handshake processing to WireGuard this afternoon.
> With that in place, I decided to run some tests on how many real life
> network packets could be handled. To do this, I simply replayed the
> same valid initiation packet over and over, from localhost, which
> means the processing of the packet went all the way through up to the
> timestamp/counter in the payload, when it then saw it was a replay and
> discarded. This means that pretty much all the Noise calculations were
> being executed. Measurements below are in kilo-packets per second;
> each packet requires 2 ECDH() calls and a bunch of hashing.
> Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz
> AVX-accelerated ChaCha20Poly1305, Blake2s, Curve25519 (sandy2x):
> multi-core: 48k/second
> single-core: 10k/second
If we assume all the compute cost is 25519, the single-core number
works out to 140K cycles/op (2.8 GHz / 10k packets per second / 2
ECDH calls per packet), and a 10% speedup between Sandy2x and Donna.
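
For anyone who wants to redo the arithmetic with their own numbers, here
is a rough sketch. It assumes the core runs at the nominal 2.80 GHz (no
TurboBoost) and that ECDH is the only cost, so the result is an upper
bound on cycles per ECDH call. The function name and parameters are
made up for illustration.

```python
def cycles_per_ecdh(clock_hz, packets_per_second, ecdh_per_packet=2):
    """Upper bound on cycles per ECDH call.

    Assumes every cycle spent on a packet goes into ECDH, which
    ignores hashing, AEAD and packet-handling overhead.
    """
    cycles_per_packet = clock_hz / packets_per_second
    return cycles_per_packet / ecdh_per_packet


# Single-core figure from the quoted benchmark: 10k packets/second
# at a nominal 2.80 GHz, 2 ECDH calls per packet.
single_core = cycles_per_ecdh(2.8e9, 10_000)
print(f"{single_core:,.0f} cycles/op")  # prints "140,000 cycles/op"
```

If the core was actually running above its nominal clock during the
test, the real cycles/op would be correspondingly higher.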
I don't know about Skylake, but I recall Sandy2x was citing ~160K
cycles on Sandy Bridge / Ivy Bridge without TurboBoost (and maybe
Haswell too?), and also ~10% speedup vs Floodyberry's version of Donna
(which might not be exactly what you're using, I'm not sure the
So this seems consistent with most time going into a well-optimized
25519, though you'd have to profile in more detail to be sure.