Go Walkthrough: strconv

08 Sep 2016

Formatting & parsing primitive values in Go is a common task. You probably first dipped your toes into the fmt package when you started Go. However, there’s a less commonly used package for basic formatting that’s more efficient and preserves compiler type checking.

The strconv package is built for speed. It’s great for when you need to handle primitive value formatting while minimizing allocations and CPU cycles. Understanding the package also gives you a better understanding of how the fmt package itself works.

This post is part of a series of walkthroughs to help you understand the standard library better. While generated documentation provides a wealth of information, it can be difficult to understand packages in a real world context. This series aims to provide context for how standard library packages are used in everyday applications. If you have questions or comments you can reach me at @benbjohnson on Twitter.

The primitive types

There are 4 different types of primitives that strconv works with — booleans, integers, floating-point numbers, & strings. Each type has functions for formatting & parsing.

Some of these have variations for reducing allocations and some have helper functions for common options. Let’s take a look at them one by one.

Integer operations

Computers use a binary representation of numeric values so we have to convert them if we want to work with them in decimal (or other numeral systems).

Go has two kinds of integer types, int & uint, for signed and unsigned representations. The strconv package has different functions for each kind.

Parsing integers

If you have a string and want to convert it to one of Go’s integer types, you can use the ParseInt() or ParseUint() functions:

func ParseInt(s string, base int, bitSize int) (int64, error)
func ParseUint(s string, base int, bitSize int) (uint64, error)

These functions interpret s as a number written in the numeral system given by the base argument. You can specify any base from 2 to 36, or use zero to infer the base from the string itself: a “0x” prefix means base-16, a plain “0” prefix means base-8, and anything else is parsed as decimal. (Since Go 1.13, base zero also recognizes “0b” for binary and “0o” for octal, and allows underscores between digits.)

The bitSize argument restricts the size of the integer parsed. This is important if you need to ensure that your value can fit into a smaller type such as int8, int16, or int32. You can also specify a bitSize of zero to indicate that you want it to fit into the system’s int size (i.e. 32-bit or 64-bit).

When parsing fails

There are a couple of ways that parsing can fail. The most obvious is if your string contains characters outside the range of the numeral system. For example, “9” is not a valid digit when parsing base-8. This will return a NumError with an Err field of ErrSyntax.

Parsing can also fail if the number is too large for the bitSize. For example, parsing the number “300” with a bitSize of 8 is invalid because an int8 has a maximum value of 127. This will return a NumError with an Err field of ErrRange.

Convenient int parsing

Finally, there’s a convenience function called Atoi():

func Atoi(s string) (int, error)

Semantically, this is the same as calling ParseInt() with a base of 10 and a bitSize of 0. Also, note that it returns an int type instead of ParseInt()’s int64 return type.

I typically use the int type for all my integers in my application unless there’s a specific need to use sized integer types (such as efficiency or to ensure 64-bit range on 32-bit systems). It reduces the clutter of having to type convert between the various sizes. Because I mainly use the int type, I primarily use the Atoi() function.
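A small sketch of typical Atoi() use. The parsePort helper and its 1 to 65535 check are hypothetical, included to show that Atoi() only converts; any domain validation is up to you.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper: Atoi does the conversion, and the
// range check is application-level validation that Atoi does not perform.
func parsePort(s string) (int, error) {
	port, err := strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("port %d out of range", port)
	}
	return port, nil
}

func main() {
	fmt.Println(parsePort("8080")) // 8080 <nil>
	fmt.Println(parsePort("http")) // 0 strconv.Atoi: parsing "http": invalid syntax
}
```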

Formatting integers

For encoding your Go integer types into a string, you can use FormatInt() and FormatUint():

func FormatInt(i int64, base int) string
func FormatUint(i uint64, base int) string

These functions convert i into the given base and return the string representation. The base supports anything between 2 and 36.

Formatting integers seems simple from the outside but there are a lot of optimizations internally for base-10 as well as any base which is a power of 2.

Formatting integers (with fewer allocations)

One often overlooked part of the strconv package is its Append functions. The Format functions generally allocate a new string on every call, and allocations are the enemy of performance.

To remove these allocations for each call, we can reuse a single buffer by using the Append functions. Integer formatting provides the AppendInt() and the AppendUint() functions:

func AppendInt(dst []byte, i int64, base int) []byte
func AppendUint(dst []byte, i uint64, base int) []byte

Reusing a buffer is simple. In fact, you can create one on the stack if it’s small by using a fixed size array and then converting to a byte slice:

a := []int16{-80, 100, 362, 32000}

var buf [6]byte
for _, v := range a {
	b := strconv.AppendInt(buf[:0], int64(v), 10)
	// Do something with b
}

In this example, we have a list of int16 values called a. We can determine the buffer size by taking the number of digits of the maximum value of an int16 (which is 32,767) plus 1 extra byte for a possible negative sign. That’s 5 bytes + 1 byte which makes our buffer 6 bytes. That’s the maximum size an int16 can encode into in base-10.

Now we can loop over our list of values and append into our local buffer. AppendInt() takes a byte slice, so we slice our buffer’s byte array using the [:0] notation. This means the slice starts at the beginning of the array but has a length of zero. The capacity of our slice will be 6 so we can append into it without an allocation.

The returned b variable has a new slice header with the appropriate length set for the formatted integer but its data still points to the underlying buf byte array.
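The same buffer-reuse idea scales to building larger output. This sketch, built around a hypothetical appendCSV helper, appends each integer into a shared byte slice and writes it directly, so no intermediate strings are created.

```go
package main

import (
	"os"
	"strconv"
)

// appendCSV is a hypothetical helper: it appends one row of integers to dst as
// comma-separated decimal values followed by a newline.
func appendCSV(dst []byte, row []int64) []byte {
	for i, v := range row {
		if i > 0 {
			dst = append(dst, ',')
		}
		dst = strconv.AppendInt(dst, v, 10)
	}
	return append(dst, '\n')
}

func main() {
	rows := [][]int64{{1, -2, 3}, {400, 500, -600}}

	// buf[:0] resets the length but keeps the capacity, so as long as a row
	// fits in those 64 bytes the same backing array is reused.
	buf := make([]byte, 0, 64)
	for _, row := range rows {
		buf = appendCSV(buf[:0], row)
		os.Stdout.Write(buf)
	}
	// Prints:
	// 1,-2,3
	// 400,500,-600
}
```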

Floating-point operations

There are two floating-point types in Go — float32 & float64. They provide a way to express numbers which are not whole numbers. They provide a much larger range of available values compared to the integer types but they do so by trading off precision. They also allow you to represent NaN and ±Infinity.

Go implements the IEEE-754 specification for floating-point numbers and there are a lot of technical considerations when using it. I’m not going to get into those details here. Wikipedia has a good page on IEEE floating point if you want to read more.

Parsing floats

Unlike integers, floating-point numbers can take on a few different forms:

Integers such as “123”.
Numbers with a fractional part such as “123.45678”.
Numbers with an exponent such as “1.234E+56”.

You can parse these using the ParseFloat() function:

func ParseFloat(s string, bitSize int) (float64, error)

This parses s and returns a value that fits within bitSize (which can be 32 or 64). If the number is too large in magnitude for that size, you’ll receive a NumError with Err set to ErrRange and the value will be +Infinity or -Infinity.

Parsing floating-point numbers in Go involves tons of optimization and bit twiddling so if you’re interested in the low level mechanics I suggest diving into the atof.go file to explore further.

Formatting floats

Encoding floats to strings is a little more complicated than parsing them. For this task we use the FormatFloat() function:

func FormatFloat(f float64, fmt byte, prec, bitSize int) string

This function encodes f as a string. The fmt argument selects how you want to display that float:

‘f’: This encodes your float without any exponent. So 123.45 will print as “123.45”. Easy enough so far.
‘e’, ‘E’: These always encode your float using an exponent. In this case, 123.45 will encode as “1.2345e+02” (or “1.2345E+02” with ‘E’). The case of the fmt character determines whether an “e” or “E” is used in your encoded string.
‘g’, ‘G’: These use the exponent form for large exponents (very large or very small magnitudes) and the plain form otherwise. Where the cutoff falls depends on your prec argument.
‘b’: This one is the most confusing. The other formats use a decimal exponent (e.g. 10ⁿ); this format uses a binary exponent (e.g. 2ⁿ). For example, 64.0 formatted with a bitSize of 32 is “8388608p-17”. You can convert this back to 64 by computing 8388608 × (2^-17). There is probably some fancy math stuff you use this for but I’ve never had to use it.

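Next is the prec argument to specify precision. For the ‘e’, ‘E’, and ‘f’ formats it’s the number of digits after the decimal point; for ‘g’ and ‘G’ it’s the maximum number of significant digits. For example, formatting 3.14159 with the ‘f’ format and a precision of 2 will give you “3.14”. If you pass in -1, it uses the smallest number of digits needed for ParseFloat() to return exactly the same value, which depends on the bitSize.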

Finally, the bitSize specifies whether the formatting should treat f like a float32 or float64. As mentioned before, precision is affected by this.

Boolean operations

Parsing booleans

To parse boolean values, we can use the ParseBool() function:

func ParseBool(str string) (bool, error)

This function has a set list of true and false values for str. True values consist of “1”, “t”, “T”, “true”, “True”, & “TRUE”. False values consist of “0”, “f”, “F”, “false”, “False”, & “FALSE”. Anything else will return an error.

I don’t typically use ParseBool() simply because I like my inputs to be specific (e.g. either “true” or “false”).

Formatting booleans

We can perform the reverse operation and format a boolean using FormatBool():

func FormatBool(b bool) string

This returns “true” or “false” depending on the value of b. Easy peasy so far.

There’s also another option for formatting booleans when you’re using byte slices called AppendBool():

func AppendBool(dst []byte, b bool) []byte

This will append “true” or “false” to the end of dst and return the new byte slice.

Despite this function only being 3 lines, I do find myself using it. No reason to rewrite 3 lines of code that are already in the standard library.

String operations

Oddly enough there is also string encoding for strings. This is used for quoting strings so that control characters and non-printable characters can be displayed. It uses Go’s character escapes so it’s very specific to the Go language itself.

Quoting strings

You can quote strings by using the Quote() function:

func Quote(s string) string

This wraps s in double quotes and escapes tabs, newlines, and unprintable characters using escape sequences such as \t, \n, and \uXXXX. This can be useful when displaying error messages with data, since your data may include weird characters like the Backspace character (U+0008) which is invisible.

If you need to limit your string to ASCII characters only, you can use the QuoteToASCII() function:

func QuoteToASCII(s string) string

This will ensure that fancy Unicode characters will be escaped (e.g. ☃ will be displayed as “\u2603”).

There is also another function called QuoteToGraphic() for printing Unicode Graphic characters instead of escaping them. For example, one Graphic character that gets escaped by Quote() is the Ogham Space Mark (U+1680, whatever that is). Honestly, I had a really hard time figuring out when you would care about the difference. The QuoteToGraphic() function isn’t even used within the standard library.

Efficiently quoting strings

As with other strconv functions, there are also Append functions for appending to a byte slice to reduce allocations:

func AppendQuote(dst []byte, s string) []byte
func AppendQuoteToASCII(dst []byte, s string) []byte
func AppendQuoteToGraphic(dst []byte, s string) []byte

Quoting individual runes

You can also quote runes by using QuoteRune(), QuoteRuneToASCII(), and QuoteRuneToGraphic():

func QuoteRune(r rune) string
func QuoteRuneToASCII(r rune) string
func QuoteRuneToGraphic(r rune) string

One difference with these is that individual runes are quoted with single quotes instead of double quotes.

Efficiently quoting runes

Again, there’s a set of Append functions for each of these:

func AppendQuoteRune(dst []byte, r rune) []byte
func AppendQuoteRuneToASCII(dst []byte, r rune) []byte
func AppendQuoteRuneToGraphic(dst []byte, r rune) []byte

Unquoting strings

If you already have a quoted string value, you can parse it into a Go string by using Unquote():

func Unquote(s string) (string, error)

This will parse not only double-quoted strings but also single-quoted strings (exactly one character, such as 'a') and backtick-quoted raw strings.

Unquoting strings the hard way

If you are a masochist then you can also unquote strings one character at a time using UnquoteChar():

func UnquoteChar(s string, quote byte) (value rune, multibyte bool, tail string, err error)

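This decodes the first character (or escape sequence) of s and returns the rune value along with whether that rune needs a multi-byte UTF-8 encoding. It also returns tail which is the remainder of the string. The quote argument names the surrounding quote character: if it’s a single or double quote, an escaped form of that quote is allowed and an unescaped one is a syntax error; if it’s zero, neither quote may be escaped and both may appear unescaped.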

Conclusion

Converting Go’s boolean, numeric, and string types into human-readable strings is a core component of most software. We need to see our data! While fmt is the go-to package for formatting, it has to inspect its interface{} arguments at runtime, which makes it slower and more allocation-heavy than calling strconv directly.

The strconv package gives us a way to format our primitives quickly and efficiently while providing some basic formatting options. It also preserves strong type checking for its arguments whereas fmt frequently uses interface{}.

Go Walkthrough

Ben Johnson

Freelance Go developer, author of BoltDB
