As someone who always prefers to argue a point on its objective merit, I found myself yet again in a nomenclature
battle over machine names at work - fighting what has often been a losing argument in past experience. I wanted
to name them on a theme (e.g. naming each database machine after one of the Avengers), whereas others wanted to use
names like SERVD01, SERVD02, etc. Besides the obvious shortcoming that this scheme breaks once we pass 99 machines
(not a good argument when one will run out of Avengers, too), I found myself searching for a leg to stand on. I
couldn't just say "physicists do it that way" on the strength of my experience in that space, even though my intuition
says that they probably have a good reason for doing it that way.
Fortunately, this time the other side provided inspiration in the form of an error: two machines with 10+ character
names were differentiated by only one letter, W for webserver and D for database, and in a discussion a colleague
switched the two. The error was not easy to catch in context, and it led to genuine misunderstanding. That got me
thinking about how words are differentiated, and I remembered the Levenshtein distance, which measures the distance
between two words as the minimum number of single-letter changes (insertions, deletions, or substitutions) required to
turn one into the other. The names of the machines in question had a Levenshtein distance of only 1 because only one
letter change was required.
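For the curious, here is a minimal sketch of how that distance can be computed, using the standard dynamic-programming approach. The function name `levenshtein` and the two machine names are made up for illustration (they are not our real hostnames), and the comparison is case-sensitive.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (case-sensitive)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 chars of a and first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i chars of a into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        previous = current
    return previous[-1]


# Hypothetical names that differ only in the W/D role letter.
print(levenshtein("PRODSERVW01", "PRODSERVD01"))  # → 1
```

A single swapped letter is all that separates the two, no matter how long the rest of the name is.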
Naturally I wondered if any psychologists had done research on the minimal Levenshtein distance needed to easily
differentiate sets of names. As it turns out, Yarkoni et al. had done just that. By measuring the average Levenshtein
distance to nearest neighbors in a set and how easily people could tell the items apart when reading or hearing them
(in the auditory experiments, the units changed were phonetic components rather than letters), they were able to
settle on a Levenshtein distance of between 4 and 5 as a minimum for easy human differentiation. Voila! An objective
argument that could be applied to my conundrum!
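To apply that argument to a list of machine names, one could check each name's distance to its nearest neighbor in the set and flag anything under the threshold. The sketch below is my own simplification, not the procedure from the paper: the helper names `nearest_neighbor_distances` and `too_similar`, the default threshold of 4, and the example hostnames are all assumptions for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Case-sensitive edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def nearest_neighbor_distances(names):
    """Map each name to the distance of its closest other name in the set."""
    nearest = {name: None for name in names}
    for a, b in combinations(names, 2):
        d = levenshtein(a, b)
        for name in (a, b):
            if nearest[name] is None or d < nearest[name]:
                nearest[name] = d
    return nearest


def too_similar(names, minimum=4):
    """Names whose nearest neighbor is closer than `minimum` edits.

    The default of 4 is an assumption taken from the low end of the
    4-to-5 range mentioned above.
    """
    nearest = nearest_neighbor_distances(names)
    return sorted(n for n, d in nearest.items() if d is not None and d < minimum)


# Hypothetical numbered names versus a small themed set.
print(too_similar(["SERVD01", "SERVD02", "SERVW01"]))  # → ['SERVD01', 'SERVD02', 'SERVW01']
print(too_similar(["Thor", "Hulk", "Hawkeye"]))        # → []
```

Note that this works on letters only; it says nothing about how names sound when spoken, which is where the phonetic version of the measure would come in.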
Upon further reflection I realized that this explains a problem many of us have encountered: mixing up our kids'
names. For example, the Levenshtein distances between my children's names are:
| Name | Name | Levenshtein distance |
|---|---|---|
| Alora | Brittan | 6 |
| Alora | Maxwell | 7 |
| Alora | Zara | 3 |
| Brittan | Maxwell | 7 |
| Brittan | Zara | 6 |
| Maxwell | Zara | 6 |
Can you guess the two kids' names we're always mixing up? My poor parents made it even worse on themselves: Bradley,
Packie, and Jaime are just 3-4 phonetic changes away from each other. No wonder my mother was always mixing all our names up!