[messaging] How and When less bits do more
Stefan Birgmeier
e0725468 at student.tuwien.ac.at
Thu Feb 13 04:37:04 PST 2014
If this seems long and at the start quite boring and familiar and you
are really in a hurry, skip to the ###. :)
Let us not forget some lower limits to the number of bits needed to
provide sufficient security. The number of bits needed depends on the
number of tries an attacker can make per second. Let's start with 128
bits. A 128 bit HMAC, CMAC,... or fingerprint (i.e. hash) is currently
thought to provide enough security even against a powerful attacker that
can run an attack for years. Now we do not want to use 128 bits because
we know that people are bad at comparing a 128 bit fingerprint. So how
can we shave off bits?
- make the algorithm slow. The 128-bit starting value is based on the
fact that hashes etc. are easy to implement in hardware and can be
calculated quickly. Making the algorithm n times slower than a hash
saves ld(n) (i.e. log2(n)) bits. This also means that the legitimate
user has to invest more time, but they usually do not need to run the
algorithm frequently.
Examples that use this approach are:
- PBKDF2
- scrypt
- fingerprints with leading zeros
- fingerprints that are "nicely readable" in some baseXX (*)
- (add more here)
- only allow a limited number of trials per time unit, or even a limited
total number of trials. This reduces the number of bits dramatically. It
already leans slightly into the realm of special systems, because you
cannot have it in an environment where true offline attacks are possible
(that's my theory, surprise me with a counter-example). This is really
suited for even the most forgetful people (though I must admit that I
was once completely in the dark about my PIN code too... very awkward)
- ATM PIN codes ("secure" hardware)
- SIM card PIN codes (mobile phones) ("secure" hardware)
- any kind of well-implemented remote login. If you have hundreds of
servers that use the same authentication system without sharing
information about failed attempts, that of course factors in. (This
presumes the impossibility of offline attacks, i.e. you do not have the
password's hash or maybe do not even know whether a user exists.)
- PAKE tries it, but as Trevor mentioned, it is not watertight, as
there are viable possibilities for online and offline brute-forcing.
- (add more here)
- restrict yourself to VERY specific cases:
- people must physically meet each other AND
- bring mobile phones / business cards / ...
- make up and remember (or write down, which is mostly bad, because the
value is a secret) bit strings of a certain length, e.g. random
words... how do you know whether your random words have the required
number of bits? This case really is very shady from a security
perspective, even though the procedure is easily understandable, i.e.
usability (as in user-friendliness) is high
- you have an existing secure channel (there should not need to be any
discussion about this case)
- (add more here)
- make a weak assumption
- "voice imitation is difficult" (it is not; there are many people who
can imitate voices, compared to very few who can fake hashes)
- "people can think up random bitstrings of a certain length" (very
unlikely; numerous studies have shown that human beings are not good
RNGs. E.g. if I knew two people met at a conference and made up a
shared secret there, I'd probably try something with *con*, cake,
coffee, napkin... and anything I know about them personally.
Also, people tend to forget these one-time secrets and write them down -
subverting security is then as easy as stealing their wallets)
- (add more here)
- more, better ways to shave off bits?
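The first two bit-shaving approaches above can be sketched numerically. This is only a rough illustration of the arithmetic; the helper names are mine, not from any library:

```python
import math

# Making verification n times slower shrinks the attacker's search by
# the same factor, which is worth ld(n) = log2(n) bits of the secret.
def bits_saved_by_slowdown(n):
    return math.log2(n)

# A trial limit (PIN-style) reduces security to the chance that the
# allowed number of guesses happens to hit the secret.
def success_probability(secret_bits, max_trials):
    return max_trials / 2 ** secret_bits

# 1,000,000 PBKDF2-style iterations instead of a single hash:
print(bits_saved_by_slowdown(10**6))  # ~19.9 bits saved

# A 4-digit PIN (~13.3 bits) with only 3 allowed trials:
print(success_probability(math.log2(10**4), 3))  # 3 in 10,000
```

This also makes visible why trial limits beat slowdowns when they are available: the PIN's 13.3 bits would be hopeless against an offline attacker, but with a hard trial limit the success probability stays fixed no matter how fast the attacker's hardware is.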
Added complications are extra requirements on the value/bitstring,
which basically mean added bits and/or decreased usability, often to the
point where people do not know what to do:
- unlinkability
- (add more here)
Anything non computer-generated with very low bit count is only
applicable in _very_ specific settings, varying depending on what is
assumed about the attacker (i.e. are you a high profile target?).
Personally I am more interested in schemes that are as close to
unconditionally secure as possible, and I do not trust any of the "weak
assumptions" I listed - for me, any scheme that relies on them is just
about as good as no scheme. That does not mean it actually IS that way;
I am just saying that, for me, the difference between 0.0001 and 0 is
negligible.
### Many discussions here have contained several solutions. Some are
solutions to very specific settings, where suddenly, under certain
assumptions connected with that setting, the number of needed bits is
very low. Then the same discussions sometimes also contain solutions
with generalized approaches, that do not rely on anything except the
presumed hardness of certain problems (ECDLP, prime factoring,
"inverting" hashes). More often than not, some solutions from both
realms need a comparable number of bits for the user to handle - it is
obvious that the strong solution is the better one then. I propose
making it very clear which case a certain discussion is about, maybe by
writing down the assumptions about the system (even the obvious ones)
and further requirements (unlinkability etc). Otherwise there can be
discussions containing 5 different schemes at the same time, that are
all secure - under their own very specific assumptions. This might lead
to confusion and it is doubtful that the discussion can arrive at
valuable results if the challenge (system parameters, requirements) is
not clear (maybe it was just me for whom the challenge was sometimes not
completely clear). As an example, requirements might be "no business
cards, no mobile phones" or "users can easily be tricked into accepting
a wrong hash as the right one", both of which would rule out any
discussion about fingerprints or random computer-generated values, and
the discussion would then just be about which scheme provides most
security while being as user-friendly as possible under the given
assumptions. There could be discussions that aim for different settings,
one allowing business cards, the other not - in those cases,
answers should make clear which of the target scenarios they refer to.
These are just ideas :)
(*) Note on the "nicely readable" fingerprints: I am personally not sure
how much you actually gain by doing it. Reading such a fingerprint out
loud to someone else, you (as the listener) might not be able to
distinguish between a's and e's or i's, and since the fingerprint is
basically an alternation between consonants and vowels, there is a lot
of potential for attackers; basically, they just need to get the
consonants right, and even there you can swap c with k etc... There need
to be studies on this. Of course it is already pretty hard just to get
the consonants right by brute force. Basically what I am saying is that
as an attacker you just need to produce something that "sounds right",
which might be much easier than producing something that is close to the
original in a Hamming-distance sense.
The number of bits you save is easily calculated: as mentioned, the
"readable" fingerprint is mostly an alternation between vowels (5 = 2.3
bits) and consonants (21 = 4.4 bits), totaling 6.7 bits per character
pair, i.e. 3.4 bits per character (actually slightly more: 1 more bit
for whether we start with a consonant or a vowel, and the alternation is
not strict).
Comparing that with the theoretical entropy of a base32 string (5 bits
per character) we reduce the entropy to 3.4/5 = 67% (of course it
doesn't scale that way, if you start with a 256 bit hash you will
probably never get close to 67%). A 125 bit value that meets the
"niceness" criteria therefore has around 84bits of entropy. Achieving
the same by looking for a hash with leading zeros means finding a 125
bit hash with 41 leading zeros, occurring on average every 2^41 hashes
(2,199,023,255,552, i.e. ~2,200 billion). However, that hash might be more
valuable (depending on the results of the needed "nicely readable
hash" study) because you can take these remaining 84 bits and encode
them into any shape that is, coding-theoretically speaking, the shortest
one with the most distance by human standards (visual and auditory),
therefore providing the lowest error rate at the shortest word length /
word count. Of course you need to encode the number of leading zeros
somehow, which adds around 6 bits of information to the remaining 84
bits from the hash. Alternatively, find a hash with even more leading
zeros to make up for those 6 bits, getting you 2^47 which is already
very uncomfortable to calculate on standard hardware: one match every
140,737,488,355,328 hashes, i.e. ~140 trillion...
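The arithmetic above can be checked with a short script; the ~84-bit and 2^41 figures fall out directly (the variable names are mine):

```python
import math

# Entropy of a "nicely readable" fingerprint that alternates between
# vowels (5 choices) and consonants (21 choices).
vowel_bits = math.log2(5)                     # ~2.32 bits
consonant_bits = math.log2(21)                # ~4.39 bits
per_char = (vowel_bits + consonant_bits) / 2  # ~3.36 bits per character

base32_bits = 5.0                  # theoretical entropy per base32 char
ratio = per_char / base32_bits     # ~0.67

value_bits = 125
effective = value_bits * ratio     # ~84 bits of entropy remain

# Achieving the same reduction via leading zeros instead: fix the
# "lost" bits to zero and search for a matching hash.
zeros = value_bits - round(effective)   # 41 leading zero bits
tries = 2 ** zeros                      # hashes needed on average

print(f"{per_char:.2f} bits/char, ~{round(effective)} effective bits, "
      f"one match every {tries:,} hashes")
```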
I hope at least parts of this are useful in the discussion.
Stefan