[curves] Curve448 implementation questions

Fri Nov 13 05:44:02 PST 2015

Hi,

I have some questions about implementing Curve448, having implemented
it for mbedTLS in a pull request here:
https://github.com/ARMmbed/mbedtls/pull/348

I've been following the discussion around the IRTF standard, but I'm
still not quite sure what the recommended behaviour is for validating
public points.

In the latest standard, regarding public u-values:

"When receiving such an array, implementations of X25519 (but not
X448) MUST mask the most-significant bit in the final byte."
(https://tools.ietf.org/html/draft-irtf-cfrg-curves-11#section-4.2

This suggests, but doesn't state, that implementations shouldn't do
any masking for Curve448, but should instead just reduce the public
value mod P448 (or issue an error if it's not in canonical form,
probably my preferred implementation choice). Is that correct?

Secondly, I have a question about the implementation of the arithmetic
itself. I've had a hunt for Mike's various papers and presentations on
Ed448-Goldilocks, and I think I understand the rationale for the
choice of prime. What I can't find though is a simple do-this-do-that
guide for implementers, like NIST publishes for their primes.

In the end, I've come up with my own modular reduction formula after
playing around with the equations, and it seems to be reasonable
(three 448-sized additions), but I wondered if I'm missing any tricks
that should make it even quicker.

I'm aware that for 32-bit machines, you can split the numbers up into
32-bit limbs and write q=2^32, p = q^14 - q^7 - 1 and come up with a
big complex formula to explictly do your reduction in terms of the
limb values.

I'm more interested in optimising for 64-bit though, so I've chosen to
split the input into four "limbs" of 2^224 and do a few bignum
additions on those limbs, which will use the bignum library's 64-bit
CPU operations, and may come out quicker. The code is certainly
simpler, and I don't think anyone deploying Curve448 really cares
about performance as long as it's "good enough".

Let N = A0 + 2^448 A1, let A1 = B0 + 2^224 B1. Then N (or N+P) mod P
is A0+A1+B1+(B0+B1)*2^224. This works on paper, and produces the
expected output on the test vectors from the IRTF spec.

So my question is whether this is the expected reduction formula, or
whether there's some method which is simpler still.

Thanks for your help,
Nick

----
Nicholas Wilson: nicholas at nicholaswilson.me.uk