[curves] Generalizing EdDSA

Thu Sep 7 13:50:04 PDT 2017

Hi,

This is tangentially related to the topic. I was thinking about 
adopting Ristretto lately, and these two ways of doing
Schnorr signature came to my mind:

Form (a):

P is a pubkey,
(R,s) is a signature (point R, scalar s),
verifier checks that point (s*B) equals (R + Hash(msg || P || R)*P).

Form (b):

P is a pubkey,
(k,s) is a signature (scalars k, s),
verifier checks that Hash(msg || P || s*B - k*P) is equal to k.

From the performance perspective, both seem to be equivalent, 
because point comparison in (a) is typically done by encoding
both points to compressed form and compare the resulting bytes. 
In fact, maybe (b) is slightly faster because you have encode 
just one point instead of two.

However, (a) might be actually faster in case Ristretto points are 
used. This is because point equality is defined by Ristretto without
the need to fully encode two points, saving a ton of computational 
overhead. In fact, no points need to be encoded at all, only decoded 
(for R and P).

The only conceptual drawback of method (a) is that it seems to be 
incompatible with ring schnorr signatures. If form (b) is used instead,
it is naturally extended to ring version of the signature by computing 
k1, k2 etc from the signature consisting of (k0, s0, s1, s2...) and 
in the end checking that k_n equals k0.

Are there some performance considerations that I missed that make (a)
more performant (maybe allowing batch verification?) in existing EdDSA?

> On 27 Jun 2017, at 09:40, Trevor Perrin <trevp at trevp.net> wrote:
> 
> Hi all,
> 
> There are different generalizations of Ed25519, e.g. "EdDSA" from the
> Ed25519 authors, "EdDSA" from RFC 8032, XEdDSA/VXEdDSA, and SchnorrQ.
> 
> Despite all this I think there's room for improvement.  So I'm
> thinking of factoring the XEdDSA spec into a more generalized EdDSA
> framework, with a few ideas:
> 
> (A) "Synthetic nonces" based on hashing the message, private key, and
> some randomness, to combine the security benefits of randomization
> with "deterministic"-style nonces.
> 
> (B) A signing function that takes a user-specified private scalar
> (instead of Ed25519-style key derivation) to support extensions like
> XEdDSA where signing uses an existing X25519 private key; or Bitcoin's
> Hierarchical Deterministic key derivation.
> 
> (C) Features to help extension to other Schnorr signature variants (like VRFs).
> 
> (D) A naming scheme and simple rules for combining curves and hashes, e.g
>   - EdDSA_25519_SHA512 = compatible with Ed25519
>   - VEdDSA_25519_SHA512 = VRF extension
>   - EdDSA_448_BLAKE2b
>   - VEdDSA_P384_SHA3/512, etc
> 
> (E) XEdDSA would be a thin wrapper around this generalized EdDSA,
> e.g. "XEdDSA_25519_SHA512" would just be "EdDSA_25519_SHA512" verified
> with an X25519 public key, and created with an X25519 private key.
> 
> I'm looking for feedback, and wondering if anyone else would use this.
> Below I'll quickly sketch the algorithms, then give more rationale.
> 
> 
> Algorithm sketch
> =================
> 
> Recall Ed25519:
> 
> # (K,k) : Signer's (public key, private scalar)
> # X : Signer's master secret key, byte sequence of length 32
> # M : Message, byte sequence of arbitrary length
> # N : Nonce derivation key, byte sequence of length 32
> # (R,r) : Nonce (R = public value aka "commitment", r = private value)
> # B : Elliptic curve base point of order q
> # q : Subgroup order
> # h : Schnorr challenge
> # hash() : SHA-512
> 
> ed25519_sign(K, X, M):
>    k || N = hash(X)
>    r = hash(N || M) (mod q)
>    R = r*B
>    h = hash(R || K || M) (mod q)
>    s = r + (k*h) (mod q)
>    return R || s
> 
> 
> As a first step we'll turn this into what I'll call a "synthetic
> nonce" EdDSA signing algorithm, for goals (A) and (B):
> 
> # hash() : Any cryptographic hash with HASHLEN at least 64 bits larger
> than bitlen(q).
> # Z : Output from an RNG, byte sequence of length 32
> # empty_labelset : byte sequence of length 3 = (0x02, 0x00, 0x00),
> explained later
> # pad1, pad2: padding filled with zero bytes to align the next hash
> input with a hash block (128-byte alignment for SHA-512).
> 
> synthetic_eddsa_sign((K,k), Z, M):
>    r = hash(B || empty_labelset || Z || pad1 || k || pad2 ||
> empty_labelset || K || M) (mod q)
>    R = r*B
>    h = hash(R || K || M) (mod q)
>    s = r + (k*h) (mod q)
>    return R || s
> 
> 
> So this is the same as Ed25519 except it takes a private scalar k as
> argument, and performs a different nonce derivation.  To roughly
> explain the nonce derivation (more later):
> 
> * The secret nonce r must be uniformly random to an attacker, and
> must take a different value whenever h does.  For security redundancy,
> this is done in two ways: first, by hashing random data (Z); second,
> by hashing the same inputs that are hashed into h (K and M) plus the
> secret key k.  We'll discuss the value of this redundancy later.
> 
> * B is hashed at the beginning to provide an indepent random oracle
> from the challenge hash h=hash(R...).  Also explained more later.
> 
> * The labelset is used to provide independent random oracles in more
> complex protocols.  It can contain identifiers for the protocol, a
> user-specified customization label, and identifiers for the hash
> instance.  For collision-resilience we hash the labelset a second time
> after the secret values (k and Z) have been mixed in.
> 
> * k is hashed in its own hash block (due to padding) to provide a
> "cascade" PRF as in HMAC or AMAC: If Z is a random secret value, then
> r is the output of a PRF keyed by Z.  If Z is public (e.g. a bad RNG)
> then r is the output of a PRF keyed by k.
> 
> To make this more extensible, we'll rewrite it and expand the "labelset" idea:
> 
> 
> generalized_eddsa_sign((K,k), Z, M, customization_label):
>    labelset = new_labelset("", customization_label)
>    R, r   = commit(labelset, "", (K,k), Z, M)
>    h      = challenge(labelset, "", R, K, M)
>    s      = prove(r, k, h)
>    return R || s
> 
> commit(labelset, extra, (K,k), Z, M):
>    r = hash(B || labelset || Z || pad1 || k || pad2 || labelset || K
> || extra || M) (mod q)
>    R = r*B
>    return (R, r)
> 
> challenge(labelset, extra, R, K, M):
>    if is_labelset_empty(labelset):
>        return hash(R || K || M) (mod q)
>    else:
>        return hash(B || labelset || R || labelset || K || extra || M) (mod q)
> 
> prove(r, k, h):
>    return r + (k*h) (mod q)
> 
> new_labelset(protocol_name, customization_label):
>    labelset[0] = 2
>    labelset = labelset || len(protocol_name) || protocol_name
>    labelset = labelset || len(customization_label) || customization_label
> 
> is_labelset_empty(labelset):
>    return len(labelset) == 3
> 
> 
> A labelset contains a sequence of byte-strings, with the "empty"
> labelset containing two zero-length byte strings for the protocol name
> and a customization label.
> 
> Generalized EdDSA would use an empty protocol name and customization
> label by default, thus would be compatible with Ed25519, since the
> challenge would be h=hash(R || K || M).
> 
> A non-empty labelset modifies the challenge hash:  The labelset is
> hashed following B for domain separation, and is repeated following R
> for collision resilience.  Thus even if an attacker finds 2 colliding
> labelsets where hash(B || labelset1) == hash(B || labelset2), a
> signature under labelset1 won't verify under labelset2.
> 
> Non-empty labelsets also indicate a more complex protocol that might
> bind "extra" data into the Schnorr challenge.  For example, below is a
> "VRF" extension ("VEdDSA") which provides a signature with the same
> security properties as EdDSA, but where every signature is associated
> with a VRF output which is an unpredictable but verifiably-unique
> function of the message and keypair.
> 
> # c = cofactor (i.e. 8 for Ed25519)
> # v = VRF output, byte sequence of length 32
> # Elligator2() maps a 256-bit value to an Ed25519 point
> 
> generalized_veddsa_sign((K,k), Z, M, customization_label):
>    labelset = new_labelset("VEdDSA_25519_SHA512_Elligator2",
> customization_label)
>    labelset1 = add_label(labelset, "1")
>    labelset2 = add_label(labelset, "2")
>    labelset3 = add_label(labelset, "3")
>    labelset4 = add_label(labelset, "4")
>    Bv = c*Elligator2(hash(B || labelset1 || K || M)[...32])
>    Kv = k * Bv
>    R, r = commit(labelset2, (Bv || Kv), (K,k), Z, M)
>    Rv = r * Bv
>    h = challenge(labelset3, (Bv || Kv || Rv), R, K, M)
>    s = prove(r, k, h)
>    v = hash(B || labelset4 || c*Kv)[...32]
>    return (Kv || h || s), v
> 
> add_label(labelset, new_label):
>    labelset[0]++
>    return labelset || len(new_label) || new_label
> 
> 
> Note that the added hashes for Bv and v use the form hash(B ||
> labelset || ...), which allows them to be analyzed as independent
> random oracles.
> 
> Note also that we're careful to bind the new values Bv, Kv, and Rv as
> "extra" data in the Schnorr challenge.  We also include Bv and Kv as
> "extra" data in nonce derivation, following the "deterministic"
> strategy of making the nonce depend on the same inputs as the
> challenge hash.  We could use extra_data for additional extensions,
> e.g. adding a trapdoor commitment to add a "designated verifier"
> property.
> 
> 
> To extend all this another way:  XEdDSA and "XVEdDSA" signatures could
> be seen as EdDSA and VEdDSA signatures that are created and verified
> with Montgomery-form keys (X25519 or X448):
> 
> generalized_xeddsa_sign(k_mont, Z, M, customization_label):
>    K, k = calculate_key_pair(k_mont)
>    return generalized_eddsa_sign((K,k), Z, M, customization_label)
> 
> generalized_xveddsa_sign(k_mont, Z, M, customization_label):
>    K, k = calculate_key_pair(k_mont)
>    return generalized_veddsa_sign((K,k), Z, M, customization_label)
> 
> 
> I've skipped over a lot of things, so more rationales below:
> 
> 
> Rationales
> ===========
> 
> Nonce randomization
> --------------------
> Traditionally discrete-log signatures choose r from an RNG.  (By
> "discrete-log signatures" meaning both Schnorr signatures like EdDSA,
> and DSA/ECDSA which calculate s differently).
> 
> "Deterministic" signatures have been popularized by Ed25519 for
> Schnorr, and RFC 6979 for DSA/ECDSA.  In a deterministic signature the
> nonce is a hash or PRF of the message and some secret key which is
> unique for each signer (e.g. "N" in Ed25519, or the signer's private
> scalar (our "k") in RFC 6979).
> 
> Deterministic signatures are intended to be more robust, since they're
> not vulnerable to RNG failures.  Even a small RNG failure could be
> catastrophic:
> (a) If a nonce r becomes known to an attacker, the signer's private
> scalar is solvable as k = (s-r)/h.
> (b) If r repeats with different h, the private scalar is solvable as
> k = (s1-s2)/(h1-h2).
> (c) If the attacker has a few bits of information about many nonces,
> the private scalar might be solved with more advanced techniques
> [NONCE].
> 
> RNG failures happen, so making the nonce a function of the message and
> a secret key is a good idea.  However randomization *also* provides
> benefits:
> 
> (1) Faulty computations could be caused by attackers glitching clock
> or power lines, or other tampering.  Without randomization, any fault
> in the calculation R=r*B, or h=hash(...), while signing the same
> message repeatedly, would give multiple signatures with the same r but
> different h.  This would leak the private scalar as in (b), provided
> the attacker can calculate or guess the faulty h [FAULT].  With
> randomization, I believe fault attacks are less effective, for
> example:
>   - Flipping one bit in the private scalar to leak the private scalar
> in hundreds of faulty signatures [FAULT1]
>   - Randomizing one byte of the private scalar to leak the private
> scalar in thousands of faulty signatures [FAULT2]
>   - Skipping scalarmult instructions to leak the private scalar in
> tens of faulty ECDSA signatures [FAULT3]
> 
> (2) Without randomization, the algorithm is fragile to modifications
> and misuse.  In particular, modifying it to add an extra input to
> h=... without also adding the input to r=... would leak the private
> scalar if the same message is signed with a different extra input.  So
> would signing a message twice, once passing in an incorrect public key
> K (though the synthetic-nonce algorithm fixes this separately by
> hashing K into r).
> 
> (3) Without randomization, an attacker can trigger repeated
> calculations using the same inputs.  Consider a sidechannel attacker
> recording traces of the signer's power consumption or EM radiation.
> This attacker might request repeated signing of the same message in an
> attempt to average out noise and get information about secret inputs
> in the calculations of r, R, or s.
> 
> Of course, there are other ways besides randomization to mitigate
> fault and sidechannel risk.  Verifying a new signature before
> returning it, or performing the signature twice and comparing, would
> help against faults, but makes the signing operation a few times
> slower.  The R = r*B calculation could be "blinded" in different ways
> to reduce sidechannel risk, but this probably involves either a
> larger, slower scalar, e.g. R = (r+qj)B for random j; or calculating
> new blinding / unblinding factors frequently.  So random nonces appear
> to be a simple and efficient way to increase protection.
> 
> Given this, I'll argue that randomized vs. deterministic is a false
> dichotomy here.  For private-key protection it's not important that
> nonces be deterministic based on the message; nonces just have to
> differ for different "h" values.  Mixing (key, message) into the nonce
> can provide this when an RNG fails, and an RNG can provide this when
> mixing (key, message) fails.  Combining these approaches yields what
> we could call a "synthetic nonce" (by analogy with "synthetic IV"
> authenticated encryption).
> 
> Combining these approaches isn't novel.  Adam Langley switched
> OpenSSL's DSA/ECDSA to combined hashing of RNG output and private
> scalar in 2013, and this remains in OpenSSL plus the BoringSSL and
> LibreSSL forks [OPENSSL].
> 
> Let's consider two counter-arguments:
> 
> * DJB has pointed out that a malicious RNG might inspect memory and
> choose RNG outputs to bias the nonce and leak the private key as in
> (c) above [ENTROPY].  I'd argue that's a far-fetched threat which is
> outweighed by the above risks.
> 
> * It could be argued that determinism is valuable apart from
> private-key protection, e.g. to help testing.  The above signing
> functions specify exactly how the random input Z is used, so are
> deterministic and testable.
> 
> 
> Hashing private scalar into nonce
> ----------------------------------
> Having decided to mix a function of (secret key, message) into the
> nonce, what secret key to use?
> * The signer's private scalar k as in OpenSSL, XEdDSA, RFC 6979, and
> libsecp256k1?
> * A separate key N, as in Ed25519?
> 
> Using N has a clearer security analysis:  We can consider nonce
> derivation as a PRF with key N, so there's not the "circularity" of
> deriving r and h from k, then using r to mask s=r+kh.
> 
> However dealing with N complicates the API, since the signer has to
> either store and pass N to the signing function, or pass in a single
> "master key" which derives both k and N.  Ed25519 uses such a master
> key.  However, that means Ed25519 can't be used with an external
> private scalar, such as an X25519 scalar for XEdDSA, or a derived
> scalar as in Bitcoin's BIP32 Hierarchical Deterministic key
> derivation.
> 
> For a generalized EdDSA I think a simple and flexible API which
> supports external scalars is best, thus the signing function mixes in
> k.  In practice, I think this has security equivalent to using N:
> 
> * The main Schnorr security proof is the Pointcheval-Stern proof in
> the Random Oracle Model.  In the ROM it doesn't matter whether we
> query the Random Oracle with k or N.
> 
> * Hashing the scalar has seen wide deployment for DSA/ECDSA (e.g.
> OpenSSL, and Bitcoin's libsecp256k1).  While DSA/ECDSA isn't identical
> to Schnorr, the main relevant difference seems to be lack of a ROM
> proof for DSA/ECDSA.  If people are comfortable with this construct in
> DSA/ECDSA (without a proof), I think they should be comfortable in the
> Schnorr case (with a proof).
> 
> * With synthetic nonces, nonce derivation is a PRF keyed by the
> random input Z.  Thus mixing k (or N) can be viewed as a backup
> strategy, which is only relevant if the RNG fails.
> 
> * I'm finding it hard to come up with a plausible hash weakness that
> is likely to be insecure with k but secure with N.  (Anyone?)
> 
> If someone prefers using a nonce derivation key N, that's still easy
> with a pre-processing step:
> 
>  Z' = hash(N || pad || Z)[...32]
>  generalized_eddsa_sign(..., Z', ...)
> 
> For example, here's a synthetic-nonce Ed25519 which uses an Ed25519
> "master key" X:
> 
> generalized_ed25519_sign(K, X, Z, M):
>  k || N = hash(X)
>  Z' = hash(N || pad || Z)[...32]
>  return generalized_eddsa_sign((K,k), Z', M, "")
> 
> 
> Choice of hash
> ---------------
> Ed25519 and the RFC 8032 variants tie each curve to a single hash
> function (e.g. SHA-512 for Ed25519, and SHAKE256 for Ed448).  They
> also specify hash outputs of double the public key size, requiring an
> unnecessary and unusual 912-bit hash for Ed448 signatures.
> 
> I think it would be better to allow flexible combination of hashes and
> EC groups (and even non-EC groups, such as DSA domain parameters).
> The hash would be required to have output length at least 64 bits
> larger than the subgroup order q.  The extra 64 bits would ensure that
> any bias in r and h, after the hash output is reduced by q, is very
> small.  This would also match the nonce-derivation requirement in FIPS
> 186-4 section B.5.1.  Finally, this would allow common 512-bit hash
> functions with Curve448 and smaller curves.
> 
> 
> Naming
> -------
> The "EdDSA" name should probably be kept as part of a generalized
> naming scheme.  This name has become associated with Schnorr plus
> hashing the message into nonce.
> 
> Specifying the group and hash in the name makes the combinations
> explicit.  Consider the below group and hash names, and examples.
> 
> "25519" = Edwards encoding of Curve25519 from RFC 7448
> "448" = Edwards encoding of Curve448 from RFC 7448
> "Ed448" = Edwards encoding of Edwards448 from RFC 7448
> "Ed448decaf" = Decaf encoding of Ed448
> "FourQ" = Compressed encoding of MSR's FourQ
> "secp256k1" = Compressed encoding of Bitcoin's secp256k1
> "P256" = Compressed encoding of NIST P-256
> "P384" = Compressed encoding of NIST P-384
> "P521" = Compressed encoding of NIST P-521
> 
> "SHA512" = SHA-512
> "BLAKE2b" = BLAKE2b with output 512 bits
> "SHA3/512" = SHA3-512
> "SHAKE256/N" = SHAKE256 with output N bits
> 
> Examples:
> 
> EdDSA_25519_SHA512 (compatible with Ed25519)
> EdDSA_FourQ_BLAKE2b
> 
> XEdDSA_448_SHA512
> VEdDSA_Ed448decaf_SHAKE256/512
> XVEdDSA_25519_BLAKE2b
> 
> EdDSA_P384_SHA512
> VEdDSA_P521_SHAKE256/640
> 
> 
> Labelsets
> ----------
> Labelsets have a few uses:
> 
> * A user might want to create EdDSA, VEdDSA, and other types of
> signatures from the same key pair.  These different variants should
> use hashes that can be modelled as independent random oracles, so that
> signatures of one type can't be interpreted as, or affect the security
> of, other types.
> 
> * A protocol might involve several hash calls which should be
> independent random oracles (as in the VRF case).
> 
> * A user might want to customize signatures so they are bound to a
> specific application context.
> 
> The "customization labels" use (3rd bullet) has debatable value.  RFC
> 8032 supports them ("contexts"), but I'm not aware of protocols
> planning to use them.  Because we'd have a label mechanism it would be
> easy to experiment with customization labels, and leave them empty if
> they don't prove useful.
> 
> Labelsets aren't compatible with RFC 8032, but I don't think the RFC
> 8032 "context" mechanism is that good:
> 
> * RFC 8032 contexts are inconsistent between Ed25519 and Ed448:
> Ed25519 doesn't support contexts; Ed448 does, and is analogous to
> Ed25519ctx (not Ed25519), except with Ed25519ctx one "SHOULD NOT" use
> zero-length contexts.  Also, while Ed25519ctx and Ed448 both use
> special prefixes for domain separation in the hash function, the
> details are different.
> 
> * The RFC 8032 mechanism doesn't provide a space for protocol names
> or additional labels.
> 
> * RFC 8032 contexts have weaker security than message contents
> because they are hashed before R, and thus a hash collision between a
> pair of contexts (C1, C2) would allow a message signed under context
> C1 to be verified under context C2.  Schnorr is normally more
> resilient to hash collisions due to R randomizing the hash
> calculation.
> 
> 
> Using hash(B || labelset || ...) with a unique labelset provides an
> independent random oracle, because:
> 
> * Uniqueness of labelsets provides domain-separation between hashes
> of the form hash(B || labelset || ...).
> 
> * Assuming hash(B || labelset || ...) is used for all new hashing in
> these signatures, the only non-domain-separated case is the EdDSA
> challenge hash(R || K || M), and hash(B || labelset || ...) if B=R.
> These ROM hash queries can be answered by the latter (non-empty
> labelset) oracles without affecting the EdDSA ROM security proof for
> hash(R || K || M), because:
>   - A simulator for the EdDSA signing oracle won't need to program
> the hash oracle for the case R=B, since the simulator can choose R at
> random.
>   - An extractor for the EdDSA private key will also never need to
> program this case, since a forgery with R=B allows the private key k
> to be extracted directly via:
>  R = sB - hK   (verification equation)
>  B = sB - hK   (R = B)
>  1 = s - hk
>  k = (s-1)/h
> 
> 
> Conclusion
> ===========
> I'd love any feedback on why people would or wouldn't use this,
> thoughts, ideas for improvement, etc.
> 
> Thanks to feedback from this list, in particular Brian Smith, for
> ideas and motivation about refactoring.
> 
> Thanks also to David Wu, Henry Corrigan-Gibbs, and Samuel Neves, for
> feedback and discussion.
> 
> 
> References
> ===========
> [NONCE]
> https://eprint.iacr.org/2014/434.pdf
> https://eprint.iacr.org/2013/346.pdf
> 
> [FAULT]
> Observation due to Benedikt Schmdit, 2016, private conversation.
> Andrey Jivsov had earlier made a similar, though not identical,
> observation:
> 
> https://www.ietf.org/mail-archive/web/cfrg/current/msg06759.html
> 
> Another paper in 2016 made a similar observation, but mistakenly
> thought Ed25519 was resistant since the nonce key N wouldn't leak:
> 
> https://link.springer.com/chapter/10.1007%2F978-3-319-44524-3_11
> 
> [FAULT1]
> https://www.researchgate.net/publication/221291537_Breaking_Public_Key_Cryptosystems_on_Tamper_Resistant_Devices_in_the_Presence_of_Transient_Faults
> 
> [FAULT2]
> "Fault attacks on Signature Schemes"
> ftp://nozdr.ru/biblio/kolxo3/Cs/CsLn/Information%20Security%20and%20Privacy,%209%20conf.,%20ACISP%202004(LNCS3108,%20Springer,%202004)(ISBN%203540223797)(504s).pdf#page=489
> 
> [FAULT3]
> "A Fault Attack on ECDSA"
> https://dl.acm.org/citation.cfm?id=1731211
> 
> 
> [OPENSSL]
> https://github.com/openssl/openssl/commit/8a99cb29d1f0013243a532bccc1dc70ed678eebe
> https://github.com/openssl/openssl/commit/190c615d4398cc6c8b61eb7881d7409314529a75
> 
> [ENTROPY] https://blog.cr.yp.to/20140205-entropy.html
> _______________________________________________
> Curves mailing list
> Curves at moderncrypto.org
> https://moderncrypto.org/mailman/listinfo/curves