Performance improvements to editDistance function #6


  • New
  • Enhancment
Open
Assigned to nrpieper
  • promethean1337 created this issue Sep 18, 2011

    Based on my work on LFMonitor (http://wow.curseforge.com/addons/lfmonitor/), where I had to adapt the editDistance() function from Misspelled to create the wordSimilarity() function, I've discovered 3 ways the editDistance function can be greatly optimized for performance.

    First, it indexes arrays starting with 0, which forces Lua to make them hash tables rather than faster simple arrays. I've changed it to increase all the array indices by 1.

    Second, it creates new arrays each time, which is unnecessarily slow, especially since the first row and column are always the same. My wordSimilarity function instead uses a file-local array. While I specified it as a literal, you may have to build it at login time to ensure it's large enough.

    Finally, it helps a lot to pre-sort the dictionary by length, and then iterate editDistance over only words that are close in length to the misspelled word. I tried adding a length check at the start of wordSimilarity, but this didn't save nearly as much time as pre-sorting. (I suspect the reason for this is either that string.len is slow or that function calls have a large overhead.)

  • promethean1337 added the tags New Enhancment Sep 18, 2011
  • nrpieper removed their assignment Sep 19, 2011
  • nrpieper self-assigned this issue Sep 19, 2011

To post a comment, please login or register a new account.