[messaging] Bounding hash 2d preimage bits (was Re:...Test Data)

Wed Jul 23 23:04:01 PDT 2014

On Wed, Jul 23, 2014 at 5:09 PM, Joseph Bonneau <jbonneau at gmail.com> wrote:
>
> On Wed, Jul 23, 2014 at 1:10 PM, Trevor Perrin <trevp at trevp.net> wrote:
>>
>> That's a good idea, spending several extra seconds during key
>> generation may well be worth a fingerprint that's smaller by
>> 20-something bits.
>>
>> There are a few obvious twists on this:
>>
>> 2) Encode x into the fingerprint itself, e.g. use the first 4 bits to
>> encode the count of zero bytes, allowing for a "scalable" security
>> level.
>
>
> Sounds like a potentially bad idea for usability: can't the attacker just
> set their fingerprint to have no zero bytes?

Yes, if 0000 in the first 4 bits means no prefixed zero bytes.  (There
could instead be a minimum number of prefixed zero bytes, with the
encoded value adding to that.)

But in any case, fingerprints with a different zero-prefix length would
have a different first character (in hex or base32).  Hopefully the user
would notice that.  But you're right that the first character would
carry more of the fingerprint's security.  That's the price of
flexibility.
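
To make the encoding concrete, here is a rough Python sketch.  It is
only an illustration under assumptions that are not part of any
proposal above: SHA-256 as the hash, a hex fingerprint, a counter
appended to a seed standing in for real key generation, and the
hypothetical constants MIN_ZERO_BYTES and TAIL_BYTES.

```python
import hashlib

MIN_ZERO_BYTES = 1   # hypothetical universal minimum (assumption)
TAIL_BYTES = 10      # hypothetical number of hash bytes kept (assumption)

def count_leading_zero_bytes(digest: bytes) -> int:
    """Return how many zero bytes the digest starts with."""
    n = 0
    for b in digest:
        if b != 0:
            break
        n += 1
    return n

def encode(digest: bytes, zero_bytes: int) -> str:
    """First hex character encodes (zero_bytes - MIN_ZERO_BYTES);
    the rest is the hash bytes that follow the zero prefix."""
    nibble = zero_bytes - MIN_ZERO_BYTES
    tail = digest[zero_bytes:zero_bytes + TAIL_BYTES]
    return format(nibble, "x") + tail.hex()

def generate(seed: bytes, zero_bytes: int):
    """Try counters until SHA-256(seed || counter) starts with at least
    `zero_bytes` zero bytes.  The counter stands in for generating a
    fresh key on each attempt."""
    assert MIN_ZERO_BYTES <= zero_bytes <= MIN_ZERO_BYTES + 15
    counter = 0
    while True:
        key = seed + counter.to_bytes(8, "big")
        digest = hashlib.sha256(key).digest()
        if count_leading_zero_bytes(digest) >= zero_bytes:
            return key, encode(digest, zero_bytes)
        counter += 1

def verify(key: bytes, fingerprint: str) -> bool:
    """Recompute the hash and check both the claimed zero prefix and
    the remaining fingerprint characters."""
    zero_bytes = MIN_ZERO_BYTES + int(fingerprint[0], 16)
    digest = hashlib.sha256(key).digest()
    if count_leading_zero_bytes(digest) < zero_bytes:
        return False
    return encode(digest, zero_bytes) == fingerprint
```

A fingerprint generated with 2 zero bytes starts with "1" here, and one
generated at the minimum starts with "0", so the claimed work level is
visible only in that first character.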

> A user doing the comparison
> will probably ignore some extra junk in the middle. This is why I was
> thinking the system needs to impose a universal minimum.
>
>>
>> 3) Instead of searching for a prefix of zero bytes, search for a
>> fingerprint with a high value in some usability metric.  E.g., my
>> "base32 pseudoword" format searches for a base32 fingerprint with high
>> vowel-consonant alternation, which I think makes compact but
>> pronounceable fingerprints, e.g.
>
>
> The idea of pronounceable fingerprints sounds nice, but I would advocate
> separating the work added to make brute-force expensive from the work
> required by some more complicated hash algorithm which makes pronounceable
> fingerprints.

There are benefits to searching directly on the usability metric:

 * The generator can use any metric they want (e.g. you could score
base32 strings with other metrics besides vowel/consonant alternation,
filter for profanities in your local language, etc.).

 * Other parties are oblivious to the generator's metric.  Your
suggestion requires different parties to agree on the most usable
encoding for their ~100 bits of fingerprint hash.

So your approach gives a smaller number of random bits that can be
flexibly encoded in different ways, which parties have to agree on
(which wordlist, which sentence generator, etc).  My approach gives a
larger number of random bits, but lets the generator choose a metric
which doesn't need explicit support by other parties.
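
As a rough Python sketch of searching on a metric: this is not the
actual "base32 pseudoword" code.  SHA-256, the 20-character truncation,
the seed-plus-counter stand-in for key generation, and the scoring
function are all assumptions made for illustration.

```python
import base64
import hashlib

VOWELS = set("aeiou")

def alternation_score(s: str) -> int:
    """Count adjacent letter pairs where exactly one is a vowel.
    Pairs involving a digit (base32 uses 2-7) score nothing."""
    score = 0
    for a, b in zip(s, s[1:]):
        if a.isalpha() and b.isalpha() and (a in VOWELS) != (b in VOWELS):
            score += 1
    return score

def fingerprint(key: bytes, length: int = 20) -> str:
    """Lowercase base32 of SHA-256(key), truncated to `length` chars."""
    digest = hashlib.sha256(key).digest()
    return base64.b32encode(digest).decode("ascii").lower()[:length]

def search(seed: bytes, tries: int, metric=alternation_score):
    """Try `tries` candidate keys and keep the one whose fingerprint
    scores highest under the generator's own metric."""
    best = None
    for counter in range(tries):
        key = seed + counter.to_bytes(8, "big")
        fp = fingerprint(key)
        score = metric(fp)
        if best is None or score > best[0]:
            best = (score, key, fp)
    return best
```

Note that a verifier only calls fingerprint(key) and compares strings.
The metric appears only in search(), so the generator can swap in any
scoring function without other parties needing to know about it.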

I guess there are pros and cons to each.

Trevor