Introduction
When indexing position n in a Go string, why isn't the nth character returned?
    import "fmt"

    func main() {
        foo := "ABC"
        for _, v := range foo {
            fmt.Println(v) // 65, 66, 67
        }
    }

In many other programming languages, traversing a string yields one character at a time. In Go, ranging over a string yields a rune (printed above as its numeric code point), while indexing directly into a string yields a byte:

    func main() {
        s := "Hello世界"
        fmt.Println(len(s)) // 11 (bytes, not characters)
        fmt.Println(s[0])   // 72 (byte value of H)
        fmt.Println(s[5])   // 228 (first byte of 世)
    }

What is a Rune?
A rune literal is written with single quotes:

    r := 'A'

A rune is a built-in type, an alias for int32 (the two are identical in every respect). It is designed to represent a Unicode code point, which lets Go handle characters from any language correctly.
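A minimal sketch of what the alias means in practice. The helper name describe is hypothetical and exists only for this illustration; it prints the type, numeric value, and character form of a rune.

```go
package main

import "fmt"

// describe is a hypothetical helper for this illustration: it formats the
// type (%T), numeric value (%d) and character form (%c) of a rune.
func describe(r rune) string {
	return fmt.Sprintf("%T %d %c", r, r, r)
}

func main() {
	r := 'A'        // a rune literal; r has type rune
	var n int32 = r // compiles without a conversion because rune is an alias for int32

	fmt.Println(describe(r))   // int32 65 A
	fmt.Println(n == r)        // true
	fmt.Println(string(r + 1)) // B
}
```

Note that %T reports int32 rather than rune: since rune is only an alias, the two names refer to one and the same type.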
String vs Rune vs Byte
- Byte
  - An alias for uint8, representing an 8-bit value (0-255)
  - The basic unit that makes up a string
- String
  - An immutable sequence of bytes
  - Conventionally holds UTF-8 encoded text, although it may contain arbitrary bytes
- Rune
  - An alias for int32
  - Designed to hold a Unicode code point
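The three views in the list above can be compared side by side. This is a small sketch using the standard unicode/utf8 package; the helper name counts is hypothetical and not part of the original article.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper: it returns the number of bytes and the
// number of runes (code points) in s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "Hello世界"
	b, r := counts(s)
	fmt.Println(b, r) // 11 7

	fmt.Println([]byte("世")) // [228 184 150]
	fmt.Println([]rune("世")) // [19990]

	// Strings are immutable: s[0] = 'h' does not compile.
	// Modifications go through a []rune (or []byte) copy instead.
	rs := []rune(s)
	rs[0] = 'h'
	fmt.Println(string(rs), s) // hello世界 Hello世界
}
```

utf8.RuneCountInString counts characters without allocating a new slice, which is preferable to len([]rune(s)) when only the count is needed.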
Practical Examples
Convert to Rune Slice
    import "fmt"

    func main() {
        s := "Hello世界"
        runes := []rune(s)

        fmt.Println(runes[5])                            // 19990 (世)
        fmt.Printf("%c\n", runes[5])                     // 世
        fmt.Println("Number of characters:", len(runes)) // 7
    }

Iterate Using Range
    import "fmt"

    func main() {
        s := "Hello世界"

        // The index i is a byte offset, so it jumps from 5 to 8 after 世.
        for i, r := range s {
            fmt.Printf("Index: %d, Character: %c, Unicode: U+%04X\n", i, r, r)
        }
        // Index: 0, Character: H, Unicode: U+0048
        // Index: 1, Character: e, Unicode: U+0065
        // Index: 2, Character: l, Unicode: U+006C
        // Index: 3, Character: l, Unicode: U+006C
        // Index: 4, Character: o, Unicode: U+006F
        // Index: 5, Character: 世, Unicode: U+4E16
        // Index: 8, Character: 界, Unicode: U+754C
    }

Indexed Capitalization
Given a string consisting of lowercase letters and an array of integer indices, capitalize every letter at the specified indices. If an index is out of bounds, ignore it.

    "abcdef", [1,2,5] ==> "aBCdeF"
    "abcdef", [1,2,5,100] ==> "aBCdeF" // There is no index 100.

My first thought was to create an empty []rune and iterate through the string st, checking whether the current index appears in arr. If it does, append the uppercase character; otherwise, append the character unchanged. With n the length of st and m the length of arr, this is O(n × m).
    import "unicode"

    // Capitalize uppercases the characters of st whose range index appears in arr.
    func Capitalize(st string, arr []int) string {
        result := []rune{}

        for i, c := range st {
            if contains(arr, i) {
                result = append(result, unicode.ToUpper(c))
            } else {
                result = append(result, c)
            }
        }

        return string(result)
    }

    // contains reports whether val occurs in arr (linear scan).
    func contains(arr []int, val int) bool {
        for _, v := range arr {
            if v == val {
                return true
            }
        }
        return false
    }

Additional Note: Byte Index is Not Char Index
Now suppose the input is no longer restricted to lowercase letters.

From what we covered about strings above, the i produced by range is the byte offset at which each character starts, not the character's position. Since a character may span multiple bytes, the two can diverge and the wrong positions get matched. The fix is to track the character index ourselves instead of using the byte offset.
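To make the mismatch concrete, here is a small sketch that collects the indices range yields for an ASCII string and for a non-ASCII one. The helper byteOffsets and the sample input "añb" are my own illustrative choices, not part of the original problem.

```go
package main

import "fmt"

// byteOffsets is a hypothetical helper: it returns the index that
// `for i := range s` yields for each character, i.e. the byte offset
// at which that character starts.
func byteOffsets(s string) []int {
	offsets := []int{}
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	fmt.Println(byteOffsets("abc")) // [0 1 2]
	fmt.Println(byteOffsets("añb")) // [0 1 3]
}
```

In "añb" the letter b is the character at position 2 but starts at byte offset 3, because ñ occupies two bytes. A function that compares arr against the range index would therefore never match a request to capitalize index 2. The corrected function only needs a counter of its own.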
    // CapitalizeByCharIndex uppercases the characters of st whose character
    // position (not byte offset) appears in arr.
    func CapitalizeByCharIndex(st string, arr []int) string {
        result := []rune{}
        charIndex := 0 // Maintain character index manually

        for _, c := range st { // Do not use byte index i
            if contains(arr, charIndex) {
                result = append(result, unicode.ToUpper(c))
            } else {
                result = append(result, c)
            }
            charIndex++ // Increment for each character
        }
        return string(result)
    }

Alternate Approach: Start from the Array of Indices to Capitalize
Converting st to []rune already requires one pass over the string. After that, we iterate through arr and overwrite in place only the characters that need capitalizing, which is simpler and runs in O(n + m).
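    // CapitalizeByChar uppercases the characters of st at the character
    // positions listed in arr, ignoring out-of-bounds indices.
    func CapitalizeByChar(st string, arr []int) string {
        runes := []rune(st)

        for _, idx := range arr {
            // Skip negative indices as well; runes[-1] would panic.
            if idx >= 0 && idx < len(runes) {
                runes[idx] = unicode.ToUpper(runes[idx])
            }
        }

        return string(runes)
    }

Further Reading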
- Characters do not exist in Go: Everything about runes!
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)