[messaging] Base32

Fri Jan 31 09:20:21 PST 2014

On Fri, Jan 31, 2014 at 3:51 AM, Robert Ransom <rransom.8774 at gmail.com> wrote:
> On 1/30/14, Trevor Perrin <trevp at trevp.net> wrote:
>
>>  * Alphabet selection is an another question.  I like base32, but the
>> RFC 4648 version is what people have in libraries, and the 'l' is an
>> unfortunate lowercase character in a lot of fonts...
>
> Which base32 alphabet do you like (and where can I find it)?

Not sure.  For an alphanumeric base32 I think you choose
capitalization, then remove 4 chars from (26 alphabetic + 10 numeric).

RFC 4648 removes 0189
Crockford removes ILOU and encodes in uppercase
Z-base-32 removes 0lv2 and specifies lowercase

http://www.crockford.com/wrmg/base32.html
http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
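
The "remove 4 chars from 36" framing can be checked with a short Python
sketch.  The helper name removed_chars is made up for illustration; the
three alphabet strings are written out from the respective specs and
lowercased here only so they can be compared with each other.

```python
import string

# All 36 alphanumeric candidates, compared in lowercase.
CANDIDATES = set(string.ascii_lowercase + string.digits)

# Alphabets as published, lowercased here for comparison only.
ALPHABETS = {
    "rfc4648": "abcdefghijklmnopqrstuvwxyz234567",
    "crockford": "0123456789abcdefghjkmnpqrstvwxyz",
    "z-base-32": "ybndrfg8ejkmcpqxot1uwisza345h769",
}

def removed_chars(alphabet):
    """Return the candidates missing from a 32-symbol alphabet, sorted."""
    assert len(set(alphabet)) == 32, "expected 32 distinct symbols"
    return "".join(sorted(CANDIDATES - set(alphabet)))

if __name__ == "__main__":
    for name, alphabet in ALPHABETS.items():
        print(name, "removes", removed_chars(alphabet))
```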

Z-base-32's rationale is interesting:

"Our choice of confusing characters to eliminate is: `0', `l', `v',
and `2'.  Our reasoning is that `0' is potentially mistaken for `o',
that `l' is potentially mistaken for `1' or `i', that `v' is
potentially mistaken for `u' or `r' (especially in handwriting) and
that `2' is potentially mistaken for `z' (especially in handwriting)."

Note that z-base-32 removes one element from each of several confusable
pairs.  So if you can remember that "o", "1", "u", and "z" remain then
z-base-32 will work better.
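
One way to avoid relying on memory is to fold the four removed
characters back onto their remaining look-alikes when reading input.
This is only a sketch: normalize_zbase32 is a hypothetical helper, not
part of the z-base-32 document, and mapping "l" to "1" (rather than "i")
and "v" to "u" (rather than "r") is an assumption based on the note
above.

```python
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"

# Assumed mapping: each excluded character goes to the look-alike that
# stays in the alphabet ("o", "1", "u", "z").
_LOOKALIKES = str.maketrans({"0": "o", "l": "1", "v": "u", "2": "z"})

def normalize_zbase32(text: str) -> str:
    """Lowercase the input and replace the four excluded characters."""
    cleaned = text.lower().translate(_LOOKALIKES)
    bad = sorted(set(cleaned) - set(ZBASE32))
    if bad:
        raise ValueError(f"not z-base-32 characters: {bad}")
    return cleaned
```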

--

Regarding capitalization:  There's evidence that uppercase chars are
individually more legible as they're larger (e.g. [TINKER]), but also
that lowercase *text* is more readable, though I'm not sure whether
this is due to the more varied word shapes produced by lowercase ("bouma"
theory) or simply the "practice effect" of being more used to
lowercase:

http://www.microsoft.com/typography/ctfonts/wordrecognition.aspx

My guess is that the shape of character groups does aid in noticing
where fingerprints differ, so I lean towards lowercase.

Regarding characters:  Font is going to make a big difference in
character confusability, e.g.:

http://psychology.wichita.edu/surl/usabilitynews/81/legibility.asp

So there's not a perfect choice.  However 4648's lowercase "L" is
awful in a lot of fonts, and "8" is pretty unambiguous, so I've
wondered about simply doing 4648, then substituting '8' for 'l'.

But this is all guesswork, to be honest...
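
A minimal Python sketch of the "4648 with '8' in place of 'l'" idea,
using the standard library's base64 module.  The function names are
hypothetical, and dropping the "=" padding is an extra assumption made
here for display purposes.  The swap is reversible because '8' is not
otherwise used by the RFC 4648 alphabet.

```python
import base64

# Hypothetical variant: RFC 4648 base32, lowercased, with 'l' shown as '8'.
_TO_DISPLAY = str.maketrans("l", "8")
_FROM_DISPLAY = str.maketrans("8", "l")

def encode_fingerprint(data: bytes) -> str:
    """Encode bytes as lowercase RFC 4648 base32, no padding, 'l' -> '8'."""
    text = base64.b32encode(data).decode("ascii").lower().rstrip("=")
    return text.translate(_TO_DISPLAY)

def decode_fingerprint(text: str) -> bytes:
    """Reverse encode_fingerprint: '8' -> 'l', restore case and padding."""
    canonical = text.lower().translate(_FROM_DISPLAY).upper()
    padding = "=" * (-len(canonical) % 8)
    return base64.b32decode(canonical + padding)
```

For example, encode_fingerprint(b"X") gives "8a", where plain lowercase
RFC 4648 would give "la".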

Trevor

[TINKER]  "The Relative Legibility of the Letters, the Digits, and of
Certain Mathematical Signs", Miles A. Tinker, 1928.

This reports experiments on legibility of uppercase and lowercase
characters, numbers, and mathematical symbols.

Most misrecognized lowercase alphanumerics: ls1niegu...
Most misrecognized uppercase alphanumerics: I0AEGR5B...

"The consensus of opinion is that, in letters, the maximum of
legibility is represented by the old Roman capitals which are made up
almost entirely of straight lines and sharp angles".

"It is rather obvious that size is a factor influencing legibility.
[...] In this experiment, the length of exposure needed to produce
more than fifty per cent wrong readings was materially shorter for
Series II, which included capitals, than for Series I, including small
letters, though of course other things enter as factors to bring about
this difference. Examination of the more legible third of the series
in the orders as given above indicates plainly that many of the larger
letters and characters are included in this section."

"Capitals, largely because of their size, are for the most part more
legible than small letters."

"The greater relative legibility of the 4 and 1 in Series II
[uppercase] is because in Series I [lowercase], 4 is confused a great
deal with t, and 1 with l."