String Handling in Go Using Rune

Introduction

When indexing position n in a Go string, why isn’t the nth character returned?

package main

import "fmt"

func main() {
  foo := "ABC"
  for _, v := range foo {
    fmt.Println(v) // prints 65, 66, 67 on separate lines: v is a rune, not a character
  }
}

In many other languages, traversing a string yields one character at a time. In Go, ranging over a string yields a “Rune” (a Unicode code point, printed as an integer by default), while indexing directly into a string yields a single byte:

package main

import "fmt"

func main() {
  s := "Hello世界"
  fmt.Println(len(s)) // 11 (bytes, not characters)
  fmt.Println(s[0]) // 72 (byte value of H)
  fmt.Println(s[5]) // 228 (first byte of 世)
}
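
Since len counts bytes, counting characters needs a different call. The sketch below uses utf8.RuneCountInString from the standard unicode/utf8 package; the helper name byteAndRuneCount is only illustrative:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount returns the length of s in bytes and in runes.
// The name is illustrative and not part of any library.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := byteAndRuneCount("Hello世界")
	fmt.Println(b, r) // 11 7
}
```

Each of the two Chinese characters takes three bytes in UTF-8, which accounts for the difference between 11 and 7.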

What is a Rune?

A Rune literal is written with single quotes:

r := 'A'

Rune is a built-in type and an alias for int32 (the two are identical in every respect). It is meant to hold a Unicode code point, which lets Go handle characters from any language correctly.
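
As a small illustration of the alias, the following sketch assigns a Rune to an int32 variable without a conversion and prints the same value in three ways. The helper describeRune is a hypothetical name introduced here:

```go
package main

import "fmt"

// describeRune formats r as its character, its decimal value,
// and its Unicode code point notation.
func describeRune(r rune) string {
	return fmt.Sprintf("%c %d %U", r, r, r)
}

func main() {
	r := 'A'
	var n int32 = r // no conversion needed: rune and int32 are the same type
	fmt.Println(n)                  // 65
	fmt.Println(describeRune(r))    // A 65 U+0041
	fmt.Println(describeRune('世')) // 世 19990 U+4E16
}
```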

String vs Rune vs Byte

Byte
- An alias for uint8, representing an 8-bit value (0-255)
- The basic unit that makes up a string
String
- An immutable sequence of bytes
- Conventionally holds UTF-8 encoded text, though it may contain arbitrary bytes
Rune
- An alias for int32
- Designed to hold a Unicode code point
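
The three types can be seen side by side by converting one string both ways. This is a minimal sketch; views is a hypothetical helper, and the "\xff" input is chosen here only to show what happens with bytes that are not valid UTF-8:

```go
package main

import "fmt"

// views returns the byte representation and the rune representation of s.
func views(s string) ([]byte, []rune) {
	return []byte(s), []rune(s)
}

func main() {
	b, r := views("世")
	fmt.Println(b) // [228 184 150]
	fmt.Println(r) // [19990]

	// A string may hold bytes that are not valid UTF-8. Converting to
	// []rune replaces each invalid byte with U+FFFD (65533).
	_, bad := views("\xff")
	fmt.Println(bad) // [65533]
}
```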

Practical Examples

Convert to Rune Slice

package main

import "fmt"

func main() {
  s := "Hello世界"
  runes := []rune(s)

  fmt.Println(runes[5])  // 19990 (世)
  fmt.Printf("%c\n", runes[5])  // 世
  fmt.Println("Number of characters:", len(runes))  // 7
}

Iterate Using Range

package main

import "fmt"

func main() {
  s := "Hello世界"

  for i, r := range s {
    fmt.Printf("Index: %d, Character: %c, Unicode: U+%04X\n", i, r, r)
  }
  // Index: 0, Character: H, Unicode: U+0048
  // Index: 1, Character: e, Unicode: U+0065
  // Index: 2, Character: l, Unicode: U+006C
  // Index: 3, Character: l, Unicode: U+006C
  // Index: 4, Character: o, Unicode: U+006F
  // Index: 5, Character: 世, Unicode: U+4E16
  // Index: 8, Character: 界, Unicode: U+754C
}
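
The jump from index 5 to index 8 is the three-byte width of 世. The same walk can be written by hand with utf8.DecodeRuneInString from the standard library, which returns each Rune together with its width in bytes. This is a sketch of roughly what range does for you; runeOffsets is a hypothetical name:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the byte offset at which each rune of s starts,
// decoding one rune at a time and advancing by its encoded width.
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	fmt.Println(runeOffsets("Hello世界")) // [0 1 2 3 4 5 8]
}
```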

Indexed Capitalization

Given a string of lowercase letters and an array of integer indices, capitalize the letters at the specified indices. Ignore any index that is out of bounds.

"abcdef", [1,2,5]     ==> "aBCdeF"
"abcdef", [1,2,5,100] ==> "aBCdeF" // There is no index 100.

My first idea was to create an empty []rune, iterate through the string st, and check whether the current index appears in arr. If so, append the uppercase character; otherwise, append it unchanged. With n characters and m indices, this is O(n × m).

import "unicode"

func Capitalize(st string, arr []int) string {
  result := []rune{}

  for i, c := range st {
    if contains(arr, i) {
      result = append(result, unicode.ToUpper(c))
    } else {
      result = append(result, c)
    }
  }

  return string(result)
}

func contains(arr []int, val int) bool {
  for _, v := range arr {
    if v == val {
      return true
    }
  }
  return false
}
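
The quadratic cost comes from scanning arr once per character. One possible refinement, sketched here under the same lowercase-letter assumption as the problem statement, copies the indices into a map first so that each lookup is constant time on average. CapitalizeWithSet is a hypothetical name and not part of the original solution:

```go
package main

import (
	"fmt"
	"unicode"
)

// CapitalizeWithSet is a hypothetical variant of Capitalize that stores
// the indices in a set before walking the string. Like the original, it
// compares against the range index, so it assumes single-byte characters.
func CapitalizeWithSet(st string, arr []int) string {
	wanted := make(map[int]struct{}, len(arr))
	for _, idx := range arr {
		wanted[idx] = struct{}{}
	}

	result := make([]rune, 0, len(st))
	for i, c := range st {
		if _, ok := wanted[i]; ok {
			c = unicode.ToUpper(c)
		}
		result = append(result, c)
	}
	return string(result)
}

func main() {
	fmt.Println(CapitalizeWithSet("abcdef", []int{1, 2, 5, 100})) // aBCdeF
}
```

Building the set costs O(m) and the walk costs O(n), at the price of extra memory for the map.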

Additional Note: Byte Index is Not Char Index

Suppose the input is no longer restricted to lowercase letters.

As discussed above, the i produced by range is the byte offset at which each character starts, not the character's position. Because a character may occupy several bytes, these offsets can skip values and no longer match the indices in arr. We should count characters ourselves instead of relying on byte offsets:

func CapitalizeByCharIndex(st string, arr []int) string {
    result := []rune{}
    charIndex := 0  // Maintain character index manually

    for _, c := range st {  // Do not use byte index i
        if contains(arr, charIndex) {
            result = append(result, unicode.ToUpper(c))
        } else {
            result = append(result, c)
        }
        charIndex++  // Increment for each character
    }
    return string(result)
}
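
To make the difference concrete, the sketch below uppercases a single position in two ways: once matching against the byte offset from range, once against a manually maintained character counter. The function upperAt and its byChar flag are illustrative names introduced only for this comparison:

```go
package main

import (
	"fmt"
	"unicode"
)

// upperAt uppercases the rune whose position equals target. With byChar
// false the position is the byte offset reported by range; with byChar
// true it is a character counter incremented once per rune.
func upperAt(st string, target int, byChar bool) string {
	result := make([]rune, 0, len(st))
	charIndex := 0
	for byteIndex, c := range st {
		pos := byteIndex
		if byChar {
			pos = charIndex
		}
		if pos == target {
			c = unicode.ToUpper(c)
		}
		result = append(result, c)
		charIndex++
	}
	return string(result)
}

func main() {
	// 世 and 界 take three bytes each, so runes start at offsets 0, 3, 6, 7, 8.
	// No rune starts at byte offset 2, so the byte-based match never fires.
	fmt.Println(upperAt("世界abc", 2, false)) // 世界abc
	fmt.Println(upperAt("世界abc", 2, true))  // 世界Abc
}
```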

Alternate Approach: Start from the Array of Indices to Capitalize

Converting st to []rune takes one pass over the string. We can then iterate through arr and overwrite only the characters that need capitalizing, which is simpler and runs in O(n + m).

func CapitalizeByChar(st string, arr []int) string {
    runes := []rune(st)

    for _, idx := range arr {
        if idx >= 0 && idx < len(runes) {  // Skip out-of-bounds indices, including negative ones
            runes[idx] = unicode.ToUpper(runes[idx])
        }
    }

    return string(runes)
}
