Levenshtein Distance and Kid Names

4 February 2015

As someone who always prefers to argue a point on its objective merits, I found myself yet again in a nomenclature battle over machine names at work, fighting what has often been a losing argument in my experience. I wanted to name machines on a theme (e.g. naming each database machine after one of the Avengers), whereas others wanted to use names like SERVD01, SERVD02, etc. Besides the obvious shortcoming that this scheme fails once we exceed 100 machines (not a good argument, since we would run out of Avengers, too), I found myself searching for a leg to stand on. I couldn't just say "physicists do it that way" on the strength of my own experience in that field, even though my intuition says there is probably a good reason they do it that way.

Fortunately, this time the other side provided inspiration in the form of an error: two machines with 10+ character names differed by only one letter, W for webserver and D for database, and in a discussion one of my colleagues switched the two. The error was not easy to catch in context, and it led to genuine misunderstanding. That got me thinking about how words are differentiated, and I remembered the Levenshtein distance, which measures the distance between two words as the minimum number of single-character edits (insertions, deletions, or substitutions) required to turn one into the other. The names of the machines in question had a Levenshtein distance of only 1, because a single letter change turned one into the other.
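For the curious, here is a minimal Python sketch of the standard dynamic-programming (Wagner–Fischer) algorithm for the Levenshtein distance, keeping only one row of the table in memory at a time. The machine names below are hypothetical stand-ins, since the real ones aren't given here; they simply differ in W versus D the way the pair described above did.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b (one row of the classic DP table).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (free if the letters match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical machine names: identical except W (webserver) vs D (database).
print(levenshtein("PRODAPPW0001", "PRODAPPD0001"))  # 1
print(levenshtein("kitten", "sitting"))             # 3
```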

Naturally, I wondered whether any psychologists had researched the minimum Levenshtein distance needed to easily tell apart the names in a set. As it turns out, Yarkoni et al. had done just that. By measuring the average Levenshtein distance from each word to its nearest neighbors in a set, along with how easily people differentiated the words when reading or hearing them (in the auditory experiments, phonetic components rather than letters were the units being changed), they settled on a Levenshtein distance of 4 to 5 as the minimum for easy human differentiation. Voilà! An objective argument I could apply to my conundrum!
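Applied to machine names, the idea can be sketched as a simple check: find each name's distance to its closest neighbor in the set and flag anything under the threshold. This is a simplified illustration, not the study's method: it looks only at the single nearest neighbor rather than averaging over several, and the default threshold of 4 is an assumption taken from the low end of the 4 to 5 range. The function names are hypothetical, and the Avengers names are just illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # deletion, insertion, substitution (bool adds 0 or 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def nearest_neighbor_distances(names: list[str]) -> dict[str, int]:
    """Distance from each name to the closest *other* name in the list."""
    return {
        name: min(levenshtein(name, other)
                  for j, other in enumerate(names) if j != i)
        for i, name in enumerate(names)
    }


def too_similar(names: list[str], threshold: int = 4) -> list[str]:
    """Names whose nearest neighbor is fewer than `threshold` edits away.

    The default of 4 is an assumed cutoff, not a universal standard.
    """
    nearest = nearest_neighbor_distances(names)
    return [name for name in names if nearest[name] < threshold]


print(too_similar(["SERVD01", "SERVD02", "SERVD03"]))  # ['SERVD01', 'SERVD02', 'SERVD03']
print(too_similar(["Thor", "Hulk", "Hawkeye"]))        # []
```

Sequential names sit exactly one edit apart, so every one of them gets flagged, while the themed names clear the bar.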

Upon further reflection, I realized that this also explains a problem many of us have run into: mixing up our kids' names. For example, the Levenshtein distances between my children's names are:

Name	Name	Distance
Alora	Brittan	6
Alora	Maxwell	7
Alora	Zara	3
Brittan	Maxwell	7
Brittan	Zara	6
Maxwell	Zara	6

Can you guess which two of my kids' names we're always mixing up? My poor parents made it even harder on themselves: Bradley, Packie, and Jaime are just 3-4 phonetic changes away from each other. No wonder my mother was always mixing up all of our names!
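To double-check the table, here is a minimal Python sketch that recomputes every pair with the same dynamic-programming algorithm. It compares characters exactly as written, so the capital "A" in Alora does not match the lowercase "a" in Maxwell. Under that assumption it reproduces all six values; lowercasing the names first would lower the Alora and Maxwell distance from 7 to 6.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


kids = ["Alora", "Brittan", "Maxwell", "Zara"]
# Every unordered pair, in the same order as the table. Comparison is
# case-sensitive, so "A" and "a" count as different letters.
distances = {(a, b): levenshtein(a, b) for a, b in combinations(kids, 2)}
for (a, b), d in distances.items():
    print(f"{a}\t{b}\t{d}")
```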


Last change was on 4 March 2015 by Bradley James Wogsland. Copyright © 2015 Bradley James Wogsland. All rights reserved.
