String Handling in Go Using Rune

Introduction

Why doesn't indexing position n of a Go string return the nth character? Even a simple traversal prints numbers rather than letters:
package main

import "fmt"

func main() {
	foo := "ABC"
	for _, v := range foo {
		fmt.Println(v) // 65, 66, 67 (code points, not "A", "B", "C")
	}
}

In many other languages, traversing a string yields one character at a time. In Go, ranging over a string yields runes (Unicode code points), while indexing directly into a string yields individual bytes:

package main

import "fmt"

func main() {
	s := "Hello世界"
	fmt.Println(len(s)) // 11 (length in bytes, not characters)
	fmt.Println(s[0])   // 72 (byte value of H)
	fmt.Println(s[5])   // 228 (first byte of 世)
}
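To make the gap between byte length and character count concrete, here is a minimal sketch using the standard unicode/utf8 package. The helper name byteAndRuneCounts is my own invention for illustration and is not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts is a hypothetical helper (not from the original text).
// It returns the length of s in bytes and in runes (Unicode code points).
func byteAndRuneCounts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	nBytes, nRunes := byteAndRuneCounts("Hello世界")
	fmt.Println(nBytes, nRunes) // 11 7
}
```

utf8.RuneCountInString counts code points without allocating a new slice, which is usually preferable when only the count is needed.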

What is a Rune?

A rune literal is written with single quotes:

r := 'A'

A rune is a built-in type, an alias for int32 (the two are identical in every respect), that represents a Unicode code point. This is what lets Go handle characters from any language correctly.
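The alias relationship can be checked directly. The sketch below is only an illustration; describeRune is a hypothetical helper name. It shows that a rune can be assigned to an int32 without conversion and that the %T verb reports the type as int32.

```go
package main

import "fmt"

// describeRune is a hypothetical helper (not from the original text).
// It formats r as a character, a decimal value, a code point, and its Go type.
func describeRune(r rune) string {
	return fmt.Sprintf("%c %d %U %T", r, r, r, r)
}

func main() {
	var r rune = 'A'
	var n int32 = r // no conversion needed: rune and int32 are the same type

	fmt.Println(n)               // 65
	fmt.Println(describeRune(r)) // A 65 U+0041 int32
}
```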

String vs Rune vs Byte

  • Byte
    • An alias for uint8, representing an 8-bit value (0-255)
    • The basic unit that makes up a string
  • String
    • An immutable sequence of bytes
    • Conventionally holds UTF-8 encoded text, although it may contain arbitrary bytes
  • Rune
    • An alias for int32
    • Designed to hold a single Unicode code point
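The three types convert into one another explicitly. As a rough sketch (replaceFirstRune is a hypothetical helper introduced only for this example), the code below shows the byte and rune views of the same string, and how an "edit" to an immutable string goes through a []rune copy.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// replaceFirstRune is a hypothetical helper (not from the original text).
// Strings are immutable, so it copies s into a []rune, edits the copy,
// and converts the result back to a new string.
func replaceFirstRune(s string, r rune) string {
	runes := []rune(s)
	if len(runes) == 0 {
		return s
	}
	runes[0] = r
	return string(runes)
}

func main() {
	s := "世界"
	fmt.Println(len([]byte(s)), len([]rune(s))) // 6 2
	fmt.Println(replaceFirstRune(s, 'A'))       // A界
	fmt.Println(s)                              // 世界 (the original is unchanged)

	// A string is just bytes, so it can hold data that is not valid UTF-8.
	fmt.Println(utf8.ValidString("\xff")) // false
}
```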

Practical Examples

Convert to Rune Slice

package main

import "fmt"

func main() {
	s := "Hello世界"
	runes := []rune(s)
	fmt.Println(runes[5])                            // 19990 (code point of 世)
	fmt.Printf("%c\n", runes[5])                     // 世
	fmt.Println("Number of characters:", len(runes)) // 7
}

Iterate Using Range

package main

import "fmt"

func main() {
	s := "Hello世界"
	// i is the byte offset where each rune starts, not the character position.
	for i, r := range s {
		fmt.Printf("Index: %d, Character: %c, Unicode: U+%04X\n", i, r, r)
	}
	// Index: 0, Character: H, Unicode: U+0048
	// Index: 1, Character: e, Unicode: U+0065
	// Index: 2, Character: l, Unicode: U+006C
	// Index: 3, Character: l, Unicode: U+006C
	// Index: 4, Character: o, Unicode: U+006F
	// Index: 5, Character: 世, Unicode: U+4E16
	// Index: 8, Character: 界, Unicode: U+754C
}
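The jump from index 5 to index 8 in the output happens because 世 occupies three bytes in UTF-8. A rough sketch of what range does under the hood, decoding one rune at a time with utf8.DecodeRuneInString, could look like this (runeOffsets is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets is a hypothetical helper (not from the original text).
// It returns the byte offset at which each rune of s begins, which is
// the same sequence of indices a range loop over s would produce.
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		// size is the number of bytes the rune at s[i:] occupies (at least 1).
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	fmt.Println(runeOffsets("Hello世界")) // [0 1 2 3 4 5 8]
}
```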

Indexed capitalization

Given a string consisting of “lowercase letters” and an array of “integer indices”, capitalize all letters at the specified indices. If an index is out of bounds, ignore that index.

"abcdef", [1,2,5] ==> "aBCdeF"
"abcdef", [1,2,5,100] ==> "aBCdeF" // There is no index 100.

My first idea was to start with an empty []rune and iterate through the string st, checking whether the current index appears in arr. If it does, append the uppercase letter; otherwise, append the character unchanged. With n characters in st and m indices in arr, this is O(n × m).

import "unicode"

func Capitalize(st string, arr []int) string {
	result := []rune{}
	for i, c := range st {
		if contains(arr, i) {
			result = append(result, unicode.ToUpper(c))
		} else {
			result = append(result, c)
		}
	}
	return string(result)
}

func contains(arr []int, val int) bool {
	for _, v := range arr {
		if v == val {
			return true
		}
	}
	return false
}
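If the single-pass structure is worth keeping, one possible way to avoid the nested scan is to load the indices into a map first, so each lookup takes expected constant time. This is only a sketch: capitalizeWithSet is a hypothetical name, and it assumes the input really is ASCII lowercase letters as the problem states, so the index produced by range matches the character position.

```go
package main

import (
	"fmt"
	"unicode"
)

// capitalizeWithSet is a hypothetical variant of Capitalize.
// Assumption: st contains only ASCII lowercase letters, so the range
// index i is also the character position.
func capitalizeWithSet(st string, arr []int) string {
	// Build a set of the indices to capitalize: O(m).
	wanted := make(map[int]struct{}, len(arr))
	for _, idx := range arr {
		wanted[idx] = struct{}{}
	}

	// Single pass over the string: O(n) with expected O(1) lookups.
	result := make([]rune, 0, len(st))
	for i, c := range st {
		if _, ok := wanted[i]; ok {
			c = unicode.ToUpper(c)
		}
		result = append(result, c)
	}
	return string(result)
}

func main() {
	fmt.Println(capitalizeWithSet("abcdef", []int{1, 2, 5}))      // aBCdeF
	fmt.Println(capitalizeWithSet("abcdef", []int{1, 2, 5, 100})) // aBCdeF
}
```

The trade-off is extra memory for the map in exchange for expected O(n + m) time.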

Additional Note: Byte Index is Not Char Index

Now suppose the input is no longer restricted to lowercase ASCII letters.

From the string handling discussed above, we can see that i in the range loop is actually the byte offset of each character within the string. Since a character may occupy several bytes, byte offsets and character positions can diverge, which leads to the wrong letter being capitalized (or none at all). We should use character indices instead of byte indices:

func CapitalizeByCharIndex(st string, arr []int) string {
	result := []rune{}
	charIndex := 0 // Maintain character index manually
	for _, c := range st { // Do not use byte index i
		if contains(arr, charIndex) {
			result = append(result, unicode.ToUpper(c))
		} else {
			result = append(result, c)
		}
		charIndex++ // Increment for each character
	}
	return string(result)
}
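To see the difference in practice, the two versions can be run side by side on a string that mixes multi-byte and single-byte characters. The functions are repeated here only so the snippet is self-contained; the input "世界abc" is my own test case.

```go
package main

import (
	"fmt"
	"unicode"
)

func contains(arr []int, val int) bool {
	for _, v := range arr {
		if v == val {
			return true
		}
	}
	return false
}

// Capitalize compares arr against the byte offset i from range.
func Capitalize(st string, arr []int) string {
	result := []rune{}
	for i, c := range st {
		if contains(arr, i) {
			c = unicode.ToUpper(c)
		}
		result = append(result, c)
	}
	return string(result)
}

// CapitalizeByCharIndex compares arr against a manually counted character index.
func CapitalizeByCharIndex(st string, arr []int) string {
	result := []rune{}
	charIndex := 0
	for _, c := range st {
		if contains(arr, charIndex) {
			c = unicode.ToUpper(c)
		}
		result = append(result, c)
		charIndex++
	}
	return string(result)
}

func main() {
	// Runes start at byte offsets 0, 3, 6, 7, 8, so no rune starts at offset 2.
	fmt.Println(Capitalize("世界abc", []int{2}))            // 世界abc (nothing capitalized)
	fmt.Println(CapitalizeByCharIndex("世界abc", []int{2})) // 世界Abc
}
```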

Alternate Approach: Start from the Array of Indices to Capitalize

Alternatively, convert st to a []rune (a single O(n) pass), then iterate through arr and overwrite only the characters that need capitalizing. This is simpler and runs in O(n + m).

func CapitalizeByChar(st string, arr []int) string {
	runes := []rune(st)
	for _, idx := range arr {
		// Skip out-of-range indices; a negative index would otherwise panic.
		if idx >= 0 && idx < len(runes) {
			runes[idx] = unicode.ToUpper(runes[idx])
		}
	}
	return string(runes)
}
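A quick way to exercise this approach against the edge cases is sketched below. capitalizeAt is a hypothetical standalone copy of the same idea, and treating negative indices as out of bounds (rather than as an error) is my assumption, since the problem statement only mentions indices that are too large.

```go
package main

import (
	"fmt"
	"unicode"
)

// capitalizeAt is a hypothetical standalone version of the index-driven approach.
// Assumption: negative indices are ignored just like indices past the end.
func capitalizeAt(st string, arr []int) string {
	runes := []rune(st) // O(n) conversion; indices now refer to characters
	for _, idx := range arr {
		if idx < 0 || idx >= len(runes) {
			continue // out of bounds: ignore this index
		}
		runes[idx] = unicode.ToUpper(runes[idx])
	}
	return string(runes)
}

func main() {
	fmt.Println(capitalizeAt("abcdef", []int{1, 2, 5}))      // aBCdeF
	fmt.Println(capitalizeAt("abcdef", []int{1, 2, 5, 100})) // aBCdeF
	fmt.Println(capitalizeAt("abcdef", []int{-1, 0}))        // Abcdef
	fmt.Println(capitalizeAt("世界abc", []int{2}))             // 世界Abc
}
```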

Further Reading