Misspelled

6 - Performance improvements to editDistance function

Based on my work on LFMonitor (http://wow.curseforge.com/addons/lfmonitor/), where I had to adapt the editDistance() function from Misspelled to create the wordSimilarity() function, I've discovered 3 ways the editDistance function can be greatly optimized for performance.

First, it indexes arrays starting with 0, which forces Lua to make them hash tables rather than faster simple arrays. I've changed it to increase all the array indices by 1.

Second, it creates new arrays each time, which is unnecessarily slow, especially since the first row and column are always the same. My wordSimilarity function instead uses a file-local array. While I specified it as a literal, you may have to build it at login time to ensure it's large enough.

Finally, it helps a lot to pre-sort the dictionary by length, and then iterate editDistance over only words that are close in length to the misspelled word. I tried adding a length check at the start of wordSimilarity, but this didn't save nearly as much time as pre-sorting. (I suspect the reason for this is either that string.len is slow or that function calls have a large overhead.)

User When Change
nrpieper Sep 19, 2011 at 08:56 UTC Changed assigned to from None to nrpieper
nrpieper Sep 19, 2011 at 08:55 UTC Changed assigned to from nrpieper to None
nrpieper Sep 19, 2011 at 08:53 UTC
seahen Sep 19, 2011 at 04:32 UTC Create

You must login to post a comment. Don't have an account? Register to get one!

Facts

Last updated
Sep 19, 2011
Reported
Sep 19, 2011
Status
Accepted - Problem reproduced / Need acknowledged.
Type
Enhancement - A change which is intended to better the project in some way
Priority
Medium - Normal priority.
Votes
0

Reported by

Possible assignees