Type conversion in golang

Today we're talking about an operation that we all do every day but rarely think about in depth - type conversion.

Index of this article

A strange line of code

Type conversion in go

Converting Numeric Types to and from Each Other

Unsafe related conversions

String to byte and rune slice conversion

slice to array

Conversion when the underlying types are the same

What's going on in other languages?

Summary

A strange line of code

It started at the beginning of the year when I was making some changes to the standard library sync.

The change would use atomic.Pointer, which was newly added to the standard library in 1.19. Out of caution I read through its code before making the change, and one line caught my attention:

// A Pointer is an atomic pointer of type *T. The zero value is a nil *T.
type Pointer[T any] struct {
    // Mention *T in a field to disallow conversion between Pointer types.
    // See /issue/56603 for more details.
    // Use *T, not T, to avoid spurious recursive type definition errors.
    _ [0]*T

    _ noCopy
    v unsafe.Pointer
}

It isn't noCopy; I explained that one in detail in "golang pickup: implementing a non-copyable type".

What caught my attention was _ [0]*T. It is an anonymous field, and a zero-length array takes up no memory. It doesn't affect the code I was going to modify, but its purpose piqued my curiosity.

Luckily the field's own comment gives the answer: the field exists to prevent incorrect type conversions. But what kind of type conversion needs an extra field to block it? With that question in mind I followed the issue link and found the following example:

package main

import (
	"math"
	"sync/atomic"
)

type small struct {
	small [64]byte
}

type big struct {
	big [math.MaxUint16 * 10]byte
}

func main() {
	a := atomic.Pointer[small]{}
	a.Store(&small{})

	b := atomic.Pointer[big](a) // type conversion
	big := b.Load()

	for i := range big.big {
		big.big[i] = 1
	}
	}
}

The example program corrupts memory, and on Linux it will very likely end in a segmentation fault. Why? Because big's index range is far larger than small's, while the Pointer actually stores only a single small object, so the final loop writes out of bounds, and Go does not detect it.

Of course, Go is not obliged to detect this kind of out-of-bounds access: once unsafe is involved, type safety and memory safety become the user's responsibility (and atomic.Pointer is precisely a wrapper around unsafe.Pointer).

The fundamental problem is that atomic.Pointer[small] and atomic.Pointer[big] have nothing to do with each other. They should be completely different types with no conversion between them (if in doubt, look up type constructors; the types produced by such generic type constructors usually should not be related), especially since Go is a strongly typed language. Similar code won't compile in C++ and fails at runtime in Python.

But the fact is that this conversion was legal until the field at the top of the struct was added, and it can easily arise with generic types.

You may still be a little foggy at this point, but that's okay: things should clear up after the next section.

Type conversion in go

Go has no implicit type conversions, so the only way to convert a value of one type to another is an expression of the form Type(value). The expression makes a copy of value and converts it to type Type.

The rules are slightly more flexible for untyped constants: they are converted automatically to the appropriate type for their context. See my other post, "Untyped constants in golang", for details.
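As a small sketch of the difference between typed values and untyped constants (the names ratio, scale, and widen are made up for illustration):

```go
package main

import "fmt"

// ratio is an untyped integer constant (hypothetical name).
const ratio = 2

// scale multiplies a float64 by the untyped constant; the constant takes
// the type float64 from context, so no conversion is written.
func scale(f float64) float64 { return f * ratio }

// widen has to convert its typed int32 argument explicitly with int64(n).
func widen(n int32) int64 { return int64(n) * ratio }

func main() {
	fmt.Println(scale(1.5)) // 3
	fmt.Println(widen(21))  // 42

	// var n int32 = 21
	// var w int64 = n // compile error: int32 is not implicitly converted
}
```

The commented-out lines show the case the compiler rejects: a typed variable is never converted for you.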

Constants and cgo aside, golang's type conversions can be divided into several categories, so let's look at some of the more common ones first.

Converting Numeric Types to and from Each Other

This is a fairly common conversion.

There's really not much to say about this one; everyone writes code like this every day:

c := int(a+b)
d := float64(c)

Numeric types can be converted to each other, and integers and floats are converted according to the corresponding rules. Values are wrapped or truncated when necessary.

This conversion is also relatively safe; the only thing to watch out for is overflow.
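A short sketch of what "wrapped/truncated" means in practice. The helper names are invented for this example:

```go
package main

import "fmt"

// narrow converts an int16 to int8, keeping only the low 8 bits.
func narrow(v int16) int8 { return int8(v) }

// truncate converts a float64 to int, discarding the fraction (toward zero).
func truncate(f float64) int { return int(f) }

// toUnsigned converts an int32 to uint32, keeping the same bit pattern.
func toUnsigned(v int32) uint32 { return uint32(v) }

func main() {
	fmt.Println(narrow(300))     // 44 (300 = 0x012C, low byte 0x2C)
	fmt.Println(truncate(-3.99)) // -3
	fmt.Println(toUnsigned(-1))  // 4294967295
}
```

None of these conversions panic or report an error, which is why overflow has to be checked by hand when it matters.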

Unsafe related conversions

unsafe.Pointer and all pointer types can be converted to each other, but converting back from unsafe.Pointer does not guarantee type safety.

unsafe.Pointer and uintptr can also be converted to each other; the latter conversion is mainly needed for some system-level APIs.

These conversions occur frequently in the Go runtime and in code that relies heavily on systems programming. They are dangerous and should not be used unless necessary.
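To make the pointer round trip concrete, here is a minimal sketch that reinterprets the memory of a float64 as a uint64. The function name float64Bits is an assumption for this example; real code should prefer math.Float64bits.

```go
package main

import (
	"fmt"
	"unsafe"
)

// float64Bits converts *float64 to unsafe.Pointer and then to *uint64,
// reading the same 8 bytes as an integer. Nothing checks that the two
// types are compatible; that is the caller's responsibility.
func float64Bits(f float64) uint64 {
	return *(*uint64)(unsafe.Pointer(&f))
}

func main() {
	fmt.Printf("%#x\n", float64Bits(1.0)) // 0x3ff0000000000000
}
```

This works only because both types have the same size; converting to a larger type this way would read memory that does not belong to the value.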

String to byte and rune slice conversion

This conversion is probably second in frequency only to numeric conversion:

fmt.Println([]byte("hello"))                         // [104 101 108 108 111]
fmt.Println(string([]byte{104, 101, 108, 108, 111})) // hello

Go optimizes this conversion quite heavily, so its behavior sometimes differs from ordinary type conversions; for example, the data copy is often optimized away.

I won't give a separate example for rune slices; the code is much the same.
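For completeness, a small sketch showing the byte and rune views of one string, and that the converted slice is (semantically) a copy. The helper names are made up for illustration:

```go
package main

import "fmt"

// counts returns the length of s in bytes and in runes.
func counts(s string) (int, int) {
	return len([]byte(s)), len([]rune(s))
}

// replaceFirst copies s into a byte slice, overwrites the first byte of
// the copy, and converts back. s itself is never modified.
// It assumes s is non-empty.
func replaceFirst(s string, c byte) string {
	b := []byte(s)
	b[0] = c
	return string(b)
}

func main() {
	s := "héllo" // 'é' takes two bytes in UTF-8

	fmt.Println(counts(s))               // 6 5
	fmt.Println(replaceFirst(s, 'H'), s) // Héllo héllo
}
```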

slice to array

Starting with Go 1.20, a slice can be converted to an array. The slice elements that fall within the array's length are copied:

s := []int{1,2,3,4,5}
a := [3]int(s)
a[2] = 100
fmt.Println(s)  // [1 2 3 4 5]
fmt.Println(a)  // [1 2 100]

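If the array is longer than the slice (note: the slice's length, not its capacity), the conversion panics. A slice can also be converted to an array pointer (allowed since Go 1.17) under the same length rule, though in that case nothing is copied: the pointer refers to the slice's backing array.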
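The two rules above can be sketched as follows. The helper toArray3 is hypothetical; it only turns the run-time panic into an error so the example keeps running:

```go
package main

import "fmt"

// toArray3 converts s to [3]int and reports the conversion panic,
// which happens when len(s) < 3, as an error.
func toArray3(s []int) (a [3]int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("conversion failed: %v", r)
		}
	}()
	a = [3]int(s)
	return a, nil
}

func main() {
	s := []int{1, 2, 3, 4, 5}

	// Array pointer: no copy, p aliases the first three elements of s.
	p := (*[3]int)(s)
	p[0] = 100
	fmt.Println(s[0]) // 100

	// Length 2, capacity 10: the capacity does not help.
	short := make([]int, 2, 10)
	_, err := toArray3(short)
	fmt.Println(err != nil) // true
}
```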

Conversion when the underlying types are the same

The conversions discussed above, although common, can be considered special cases: they are limited to specific types, and the compiler recognizes them and generates different code for each.

But go actually allows a broader class of conversions that don't require as much special handling: types with the same underlying type can be converted to each other.

An example:

type A struct {
    a int
    b *string
    c bool
}

type B struct {
    a int
    b *string
    c bool
}

type B1 struct {
    a1 int
    b *string
    c bool
}

type A1 B

type C int
type D int

A and B are completely different types, but both have the underlying type struct{a int; b *string; c bool}. C and D are also completely different types, but their underlying type is int in both cases. A1 is derived from B, so A1 and B have the same underlying type, and therefore so do A1 and A. B1 shares its underlying type with none of them, because one of its fields has a different name.

Put crudely, underlying types are the various built-in types (int, string, slice, map, ...) and struct{...} (where field names, and whether they are exported, are taken into account). The underlying type of a built-in type or a struct{...} is itself.

Types can be converted to each other as long as the underlying types are the same:

func main() {
    text := "hello"
    a := A{1, &text, false}
    a1 := A1(a)
    fmt.Printf("%#v\n", a1) // main.A1{a:1, b:(*string)(0xc000014070), c:false}
}

A1 and B are at least related, but A1 has literally nothing to do with A. Yet the program compiles and runs just fine, as a result of the rule that types with the same underlying type can be converted to each other.

In addition, struct tags are ignored in conversions: as long as the field names and types match, the conversion is allowed whether or not the tags are the same.
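A minimal sketch of the tag rule, using made-up types Row, Payload, and Renamed:

```go
package main

import "fmt"

// Row and Payload differ only in their struct tags.
type Row struct {
	ID   int    `db:"id"`
	Name string `db:"name"`
}

type Payload struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// Renamed has a different field name, so its underlying type differs.
type Renamed struct {
	Ident int
	Name  string
}

// toPayload compiles because tags are ignored when comparing
// underlying types for conversion.
func toPayload(r Row) Payload { return Payload(r) }

func main() {
	p := toPayload(Row{ID: 7, Name: "gopher"})
	fmt.Printf("%+v\n", p) // {ID:7 Name:gopher}

	// _ = Renamed(p) // compile error: field names differ
}
```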

This rule allows some unrelated types to be converted in both directions. At first glance that looks like reckless rule-making, but it is not entirely useless:

type IP []byte

Consider the type IP. An IP address can be represented as a sequence of bytes, as the RFCs state explicitly, so defining it this way makes sense (and in fact that's what everyone does). Since it is a sequence of bytes, we will naturally want to use methods and functions that operate on byte slices with IP values, to reuse code and simplify development.

The problem is that all of this code assumes its parameters and return values are []byte rather than IP. We know an IP is really a []byte, but Go has no implicit type conversion, so in general a value of one type cannot simply be handed to code written for another. (Assignability gives one narrow exception: a value can be assigned directly between IP and the unnamed type []byte. It never applies between two defined types.) Consider what would happen without the rule that types with the same underlying type can be converted to each other: how would we reuse these functions? We would be forced into devious unsafe tricks. Better to allow the conversions []byte(ip) and IP(bytes).
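As an illustration of reusing byte-slice functions through these two conversions, here is a minimal sketch. The helpers hasPrefix and cloneIP are hypothetical; bytes.HasPrefix and bytes.Clone are standard library functions (bytes.Clone needs Go 1.20 or later).

```go
package main

import (
	"bytes"
	"fmt"
)

// IP mirrors the definition in the text.
type IP []byte

// hasPrefix reuses bytes.HasPrefix via the conversion []byte(ip).
func hasPrefix(ip IP, prefix []byte) bool {
	return bytes.HasPrefix([]byte(ip), prefix)
}

// cloneIP reuses bytes.Clone and converts the result back with IP(...).
func cloneIP(ip IP) IP {
	return IP(bytes.Clone([]byte(ip)))
}

func main() {
	ip := IP{192, 168, 1, 10}
	fmt.Println(hasPrefix(ip, []byte{192, 168})) // true

	c := cloneIP(ip)
	c[3] = 1
	fmt.Println(ip[3], c[3]) // 10 1
}
```

Note that []byte(ip) does not copy the data; only the slice header is converted, so the conversion itself is essentially free.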

Why not restrict this to conversions like the one between IP and []byte? Because that would complicate type checking and slow down compilation. What Go values most is a simple compiler and fast builds, so it is naturally unwilling to add such checks. It is simpler and faster to relax the rule outright and let any types with the same underlying type convert to each other.

But this rule is dangerous, and it is exactly what leads to the atomic.Pointer problem described earlier.

Let's look at the first version of the atomic.Pointer code:

type Pointer[T any] struct {
    _ noCopy
    v unsafe.Pointer
}

The type parameter is used only in Store and Load, to convert between unsafe.Pointer and ordinary pointers. This leads to a fatal flaw: every atomic.Pointer instantiation has the same underlying type, struct{_ noCopy; v unsafe.Pointer}.

So whether it's atomic.Pointer[A] and atomic.Pointer[B] or atomic.Pointer[small] and atomic.Pointer[big], they all share one underlying type and can be converted to each other arbitrarily.

This is a complete mess. The user is responsible for what happens with unsafe, but here an obvious error that should never have compiled can show up in code with no warning at all. The average developer does not spend time studying how the standard library is implemented, so they have no idea that atomic.Pointer has anything to do with unsafe.

The Go developers ended up adding _ [0]*T, so that instantiations of atomic.Pointer with different T have different underlying types and the incorrect conversion above is no longer possible. Choosing *T rather than T also keeps self-referential uses from triggering spurious recursive type definition errors at compile time.

By now you should also understand why I say generic types are the most likely place to run into this problem: if your generic type is a struct or other composite type that does not use its type parameter in any field or element type, then all types instantiated from it may have the same underlying type, which allows the completely wrong conversion described in the issue.
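The same effect can be reproduced without unsafe. In this sketch, Loose and Strict are invented types: Loose ignores its type parameter in its fields, while Strict mentions it in a zero-length array field in the style of the fix described above.

```go
package main

import "fmt"

// Loose never mentions T in a field, so every instantiation has the
// underlying type struct{ v any }.
type Loose[T any] struct {
	v any
}

// Strict mentions *T in a zero-size field, so instantiations with
// different T have different underlying types.
type Strict[T any] struct {
	_ [0]*T
	v any
}

// relabel compiles: Loose[int] and Loose[string] are convertible.
func relabel(l Loose[int]) Loose[string] { return Loose[string](l) }

func main() {
	l := Loose[int]{v: 42}
	fmt.Println(relabel(l).v) // 42, now labelled as a Loose[string]

	s := Strict[int]{v: 42}
	fmt.Println(s.v) // 42
	// _ = Strict[string](s) // compile error: cannot convert
}
```

Here the wrong conversion is harmless because v is an interface value, but with an unsafe.Pointer field, as in the first version of atomic.Pointer, it leads straight to memory corruption.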

What's going on in other languages?

In structurally typed languages, allowing conversion between types with the same underlying type, as Go does, is standard practice, and each language relaxes or restricts such conversions as it sees fit. Put bluntly, these languages look only at structure: things with the same structure are considered the same type. So in such languages the problem described in the issue is not even an error, and you have to change the design to avoid similar problems.

In languages with a nominal type system, types with the same name are the same type, and types with different names are different even if they are structurally identical. C++, golang, and Rust all fall into this category. Go's underlying types behave structurally where type conversions and type constraints are concerned, but Go's overall behavior still leans nominal. There is no official statement of which kind of type system Go has, so treat this as my opinion.
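A small sketch of the mixed behavior, reusing the C and D types defined earlier. The constraint Int and the function double are made up for this example:

```go
package main

import "fmt"

type C int
type D int

// Int accepts any type whose underlying type is int: a structural check.
type Int interface{ ~int }

// double works for both C and D because of the ~int constraint.
func double[T Int](v T) T { return v * 2 }

func main() {
	c := C(21)

	// var d D = c // compile error: C and D are different named types
	d := D(c) // explicit conversion is allowed: same underlying type

	fmt.Println(double(c), double(d)) // 42 42
}
```

Assignment follows names (nominal), while conversion and the ~int constraint follow underlying types (structural).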

Fully structurally typed languages are not very common, so we'll take C++, a common nominally typed language, and Python, which uses duck typing, as examples.

In Python we can customize a type's constructor, so conversion logic can live there. If we define neither a custom constructor nor some other class method that returns the new type, then by default no conversion between two types is possible. So Python does not have Go's problem.

C++ is similar to Python: there is no conversion path by default unless the user defines one. Unlike Python, C++ has conversion operators in addition to constructors, and it supports implicit conversions within the limits of its rules. To convert between two different types you must define a converting constructor or conversion operator yourself, within the constraints of the language rules, and whether the conversion is one-way or two-way is up to you, just as in Python. So C++ does not have Go's problem either.

The same goes for Rust, Java, ... I won't list them all.

All in all it's a facet of the simplicity of go - creating problems that are hard to come by in other languages and then fixing them in a simple way.

Summary

We reviewed type conversions in go and stepped on a related pitfall along the way.

A couple suggestions here:

If you want to use generic types without falling into this pit: try to use the type parameter in the struct's fields or in the composite type. Fields like _ [0]*T not only make the code harder to understand but also make initializing the type cumbersome, so I don't recommend them except as a last resort.
If you don't use generics but worry that some other type has the same underlying type as yours: don't be afraid. Just use the type conversion syntax sparingly on your custom types, and if you really need to convert between related custom types, define methods such as toTypeA, so that the conversion is under your control rather than Go's default.
Converting between built-in types and custom types based on them: there is nothing to worry about here, because the two are essentially the same thing. If you still feel uneasy, don't write type T []int; define type T struct { data []int } instead. The cost is that, besides being more verbose, many functions that take slice parameters, as well as range loops, can no longer be used on it directly.
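The second and third suggestions can be sketched like this. All the names (Seconds, Millis, ToMillis, IDs) are hypothetical:

```go
package main

import "fmt"

// Seconds and Millis share the underlying type struct{ n int64 },
// so Go's default conversion between them compiles.
type Seconds struct{ n int64 }
type Millis struct{ n int64 }

// ToMillis is the conversion we control: it rescales the value.
func (s Seconds) ToMillis() Millis { return Millis{n: s.n * 1000} }

// IDs wraps a slice in a struct instead of using `type IDs []int`.
type IDs struct{ data []int }

// Len has to be written by hand; len(ids) no longer works.
func (ids IDs) Len() int { return len(ids.data) }

func main() {
	s := Seconds{n: 2}
	fmt.Println(s.ToMillis().n) // 2000
	fmt.Println(Millis(s).n)    // 2: compiles, but the unit is silently wrong

	ids := IDs{data: []int{1, 2, 3}}
	fmt.Println(ids.Len()) // 3
	// for range ids {} // compile error: cannot range over a struct
}
```

The explicit method costs a few lines, but the default conversion Millis(s) shows what happens when the compiler is left to decide.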

Languages like Go are interesting in that dangers lurk beneath the simple syntax rules, and you might step on a landmine if you get careless.