Name abbreviation does not handle UTF-8 multi-byte characters from non-English languages #2070

  • Defect
  • delane33 created this issue May 9, 2024

    Hello, it looks like the current "abbreviateName" function (modules\tags.lua, L369-372) treats the first byte of a name as its first character, which breaks for languages whose characters are encoded as multiple bytes in UTF-8.


    Function snippet:


    -- Name abbreviation
    local function abbreviateName(text)
        -- string.sub indexes bytes, so this keeps only the first byte of the name
        return string.sub(text, 1, 1) .. "."
    end
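
To illustrate the failure, here is a minimal sketch, assuming the name arrives as a UTF-8 encoded string. The sample name is hypothetical and is written with decimal byte escapes so it does not depend on the encoding of the source file.

```lua
-- Hypothetical sample: "Émile", where "É" is encoded as the two bytes 0xC3 0x89.
local name = "\195\137mile"

-- string.sub counts bytes, so only the lead byte 0xC3 is kept.
local firstByteOnly = string.sub(name, 1, 1)

print(#firstByteOnly)              -- 1
print(string.byte(firstByteOnly))  -- 195 (0xC3), a lead byte with no continuation byte
```

A lone lead byte is not valid UTF-8, so the abbreviated name would no longer contain a complete first character.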


    Solution snippet:



    -- Byte length of the first UTF-8 character, determined from its lead byte
    local function utf8len1stchar(str)
        local byte = str:byte(1)
        if byte < 128 then return 1 end -- 1-byte (ASCII) character
        if byte < 192 then return 1 end -- continuation byte of a multi-byte character (invalid as a start)
        if byte < 224 then return 2 end -- start of 2-byte character
        if byte < 240 then return 3 end -- start of 3-byte character
        if byte < 248 then return 4 end -- start of 4-byte character
        return 1 -- invalid
    end

    -- Name abbreviation
    local function abbreviateName(text)
        local lengthOfFirstChar = utf8len1stchar(text)
        return string.sub(text, 1, lengthOfFirstChar) .. "."
    end
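
A slightly more defensive variant is sketched here, in case names can be empty or contain truncated sequences. The helper name `utf8FirstCharLength`, the check of continuation bytes, and the choice to return an empty string for an empty name are assumptions for illustration, not part of the proposed fix. It uses only `string.byte` and `string.sub`.

```lua
-- Hypothetical helper: byte length of the first UTF-8 character in str.
-- Returns 0 for an empty string and 1 for a malformed or truncated sequence.
local function utf8FirstCharLength(str)
    local lead = string.byte(str, 1)
    if not lead then return 0 end            -- empty string
    if lead < 0x80 then return 1 end         -- ASCII
    if lead < 0xC0 then return 1 end         -- stray continuation byte
    local length
    if lead < 0xE0 then
        length = 2
    elseif lead < 0xF0 then
        length = 3
    elseif lead < 0xF8 then
        length = 4
    else
        return 1                             -- 0xF8-0xFF never occur in UTF-8
    end
    -- Every following byte must be a continuation byte (0x80-0xBF).
    for i = 2, length do
        local b = string.byte(str, i)
        if not b or b < 0x80 or b > 0xBF then
            return 1
        end
    end
    return length
end

-- Name abbreviation (assumption: an empty name abbreviates to an empty string)
local function abbreviateName(text)
    local length = utf8FirstCharLength(text)
    if length == 0 then return "" end
    return string.sub(text, 1, length) .. "."
end

print(abbreviateName("John"))          -- J.
print(abbreviateName("\195\137mile"))  -- the two bytes of "É" followed by "."
```

Falling back to a length of 1 for malformed input keeps the function from reading past the data it was given; whether that is the right behavior for the project is a judgment call.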

     Solution PR on GH:

  • delane33 added a tag Defect May 9, 2024
