Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reflect.*Header uses a different representation than in the standard Go implementation #1284

Closed
ribasushi opened this issue Aug 9, 2020 · 18 comments
Labels
enhancement New feature or request

Comments

@ribasushi
Copy link

TLDR
There seems to be an implicit cast in go-proper. See below.

Version
tinygo version 0.14.0 darwin/amd64 (using go version go1.14.7 and LLVM version 10.0.0)

Failure

tinygo run tiny.go 
# github.com/twmb/murmur3
../../../admin/go/pkg/mod/github.com/twmb/[email protected]/murmur.go:69:9: cannot use len(slice) (value of type int) as uintptr value in struct literal

Expectation

go run tiny.go 
Murmursize 16

Testcase

package main

import (
        "fmt"

        "github.com/twmb/murmur3"
)

func main() {
        fmt.Printf("Murmursize %d\n", murmur3.New128().Size())
}
@niaow
Copy link
Member

niaow commented Aug 9, 2020

Same as #691

The layout of reflect.SliceHeader and reflect.StringHeader are not guaranteed to be stable. On some TinyGo platforms, int is bigger than a word, and so we use uintptr for the underlying data type and zero-extend whenever you use the len or cap operators.

That said, we did consider using int on all non-AVR platforms.

@niaow niaow changed the title cannot use len(slice) (value of type int) as uintptr value reflect.*Header uses a different representation than in the standard Go implementation Aug 9, 2020
@ribasushi
Copy link
Author

The layout of reflect.SliceHeader and reflect.StringHeader are not guaranteed to be stable.

Does this imply that this is a bug in murmur and needs an addition here: https://github.com/twmb/murmur3/blob/v1.1.3/murmur.go#L69 ?

@bgould
Copy link
Member

bgould commented Aug 9, 2020

It is not necessarily a bug in that library per se; it is just non-portable code. In general Go code is written to target the standard Go implementation, and in practice the layout of string headers and slice headers in standard Go don't change, so people treat them as stable in their code.

Rob Pike mentions this when explaining one of his "Go proverbs", which is "With the unsafe package there are no guarantees." - https://www.youtube.com/watch?v=PAAkCSZUG1c&t=13m49s

In the library you linked to, it looks to me like that function is just an optimization, and the same result could be achieved in a portable way (or at least provide a portable alternative using build tags etc)

@twmb
Copy link

twmb commented Jun 24, 2021

Note that v1.1.5 avoids the problem this issue is about.

@aykevl
Copy link
Member

aykevl commented Jun 24, 2021

Note that v1.1.5 avoids the problem this issue is about.

Are you sure? It doesn't look like it:

https://github.com/twmb/murmur3/blob/v1.1.5/murmur.go#L69

	var str string
	strhdr := (*reflect.StringHeader)(unsafe.Pointer(&str))
	strhdr.Data = ((*reflect.SliceHeader)(unsafe.Pointer(&slice))).Data
	strhdr.Len = len(slice)
	return str
../../../../../pkg/mod/github.com/twmb/[email protected]/murmur.go:69:15: cannot use len(slice) (value of type int) as uintptr value in assignment

For that reason, I'm using github.com/spaolacci/murmur3 instead.

For reference, this is the TinyGo type:

tinygo/src/reflect/value.go

Lines 756 to 759 in c303266

type StringHeader struct {
Data uintptr
Len uintptr
}

And here the upstream Go type:

// StringHeader is the runtime representation of a string.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type StringHeader struct {
	Data uintptr
	Len  int
}

Note this part:

It cannot be used safely or portably and its representation may change in a later release.

Therefore, if you want portable code you shouldn't use reflect.StringHeader (or at least guard it with a build tag).

@twmb
Copy link

twmb commented Jun 24, 2021

Oh, correction: this follows the unsafe rule (6) in https://golang.org/pkg/unsafe/, follows the rules laid out by @mdempsky in golang/go#40701, and passes https://github.com/jlauinger/go-safer. I misread this issue.

I'm not sure why tinygo's int may be longer than the machine's uintptr, because the purpose of an int is to be the size of a machine word.

The reflect.StringHeader / reflect.SliceHeader optimizations are extremely common. However, I will be switching to unsafe.Slice after the forthcoming Go 1.17 release wherever relevant, but that will not address the conversion from a slice to a string. The original optimization from around the Go 1.0 days of str := *(*string)(unsafe.Pointer(&slice)) is even more un-documented than the reflect.StringHeader approach.

Also fwiw, spaolacci's does an unsafe unaligned conversion from of four slice bytes to an int32, this was originally flagged in go1.14 but Go removed the check (since on any realistic arch, the unaligned read is fine). ref: spaolacci/murmur3#29

@aykevl
Copy link
Member

aykevl commented Jun 25, 2021

I'm not sure why tinygo's int may be longer than the machine's uintptr, because the purpose of an int is to be the size of a machine word.

No. Many people think this, but it's false. The int type is simply a shorthand integer type when you don't care too much about the exact size of an integer. This is what the Go specification says:

There is also a set of predeclared numeric types with implementation-specific sizes:

uint     either 32 or 64 bits
int      same size as uint
uintptr  an unsigned integer large enough to store the uninterpreted bits of a pointer value

So it would be perfectly legal for a Go implementation to implement int as int32 even on a 64-bit system. In fact, this is what Go did in early versions (1.0, maybe 1.1 but this got changed soon after). What would not be legal is implementing int as int16, and it would break many programs that assume int can hold large values.

(Sidenote, the otherwise unrelated type int in C is practically always 32-bit in size even on 64-bit systems like amd64, so it doesn't match the "machine word" either, whatever that is).

Because a string can never be longer than what can be stored in a pointer-sized integer, this is the implementation of the string type in TinyGo:

// The underlying struct for the Go string type.
type _string struct {
ptr *byte
length uintptr
}

As you can see, it uses uintptr, not int or uint. This is so it won't waste space on AVR, which (usually) has 16-bit pointers.

So what to do about reflect.StringHeader? It would clearly be incorrect to define the string length as int on AVR, it has to be an uintptr otherwise the integer types won't match. That leaves all other (32-bit and 64-bit) platforms supported by TinyGo. I've chosen to use uintptr for everything, so that TinyGo is internally consistent.

In summary, if you want to write portable code that uses reflect.StringHeader, you will have to use build tags and implement the code slightly differently for regular Go and TinyGo.

The reflect.StringHeader / reflect.SliceHeader optimizations are extremely common.

Sure, but that doesn't make them portable (as pointed out in the documentation for these two types).

Also fwiw, spaolacci's does an unsafe unaligned conversion from of four slice bytes to an int32, this was originally flagged in go1.14 but Go removed the check (since on any realistic arch, the unaligned read is fine). ref: spaolacci/murmur3#29

TinyGo supports the Cortex-M0 chip, which does not support unaligned accesses. So this conversion would be illegal on some targets that TinyGo supports. I happen to be using a Cortex-M4 so that's probably why I haven't had any issues with it.

Oh and note that x86 also mandates proper alignment in some cases: https://pzemtsov.github.io/2016/11/06/bug-story-alignment-on-x86.html
I don't know whether the Go compiler will take advantage of it, but it is entirely free to do so.

@twmb
Copy link

twmb commented Jun 29, 2021

Those are reasonable points.

While the reflect.SliceHeader docs do say to not rely their portability / representation, I really doubt at this point after a decade that the representation will change, especially considering the likely fallout from how widely they are currently used.

This basically means that tinygo is defining its own custom compiler (as expected!), which means that tinygo is basically expecting any library that wants to use this reflect trick to (a) know of tinygo and its custom formats, and (b) define separate build-tag specified files.

Is it worth it to consider having tinygo's reflect.StringHeader / reflect.SliceHeader mirror the stdlib's, and then perform a translation under the hood from int to uintptr? This would allow tinygo to internally use the space-saving uintptr, while externally support libraries that rely on the int-typed headers. From a library author perspective, I'm failing to see how tinygo's reflect.SliceHeader / reflect.StringHeader are not functionally unusable as is, because I doubt the general dev is going to write code with tinygo in mind?

@aykevl
Copy link
Member

aykevl commented Aug 14, 2021

This basically means that tinygo is defining its own custom compiler (as expected!), which means that tinygo is basically expecting any library that wants to use this reflect trick to (a) know of tinygo and its custom formats, and (b) define separate build-tag specified files.

Yes, unfortunately that would be the consequence. To repeat, this is what the comment says:

// StringHeader is the runtime representation of a string.
// It cannot be used safely or portably and its representation may
// change in a later release.

Code that uses it should expect it to be non portable.

Is it worth it to consider having tinygo's reflect.StringHeader / reflect.SliceHeader mirror the stdlib's, and then perform a translation under the hood from int to uintptr?

Technically, maybe. In practice, this would either be very difficult and/or leave a trail of subtle bugs.
The only thing that would be possible, is to change the type of reflect.StringHeader and reflect.SliceHeader depending on the architecture, so that it uses int on most platforms but uintptr on AVR. But that would introduce an internal inconsistency and I'm not sure I would want that either.

@aykevl
Copy link
Member

aykevl commented Aug 14, 2021

I've made a PR upstream: twmb/murmur3#9

@deadprogram
Copy link
Member

Looks like this was resolved, so now closing. Please reopen if needed. Thanks!

@deadprogram deadprogram added enhancement New feature or request and removed question Question about usage of TinyGo labels Sep 18, 2021
@tschaub
Copy link
Contributor

tschaub commented Aug 31, 2022

The only thing that would be possible, is to change the type of reflect.StringHeader and reflect.SliceHeader depending on the architecture, so that it uses int on most platforms but uintptr on AVR.

Assuming that this remains an unappealing solution for TinyGo to adopt, I'm wondering if there could be additional documentation created with suggested fixes for projects that don't work with TinyGo but might want to.

Maybe @aykevl's contribution in twmb/murmur3#9 could be documented as an alternative to one otherwise unportable pattern.

I was hoping to use TinyGo to build a WASM binary with https://github.com/apache/arrow. This doesn't work due to the use of reflect.SliceHeader fields. E.g. https://github.com/apache/arrow/blob/go/v9.0.0/go/arrow/bitutil/bitutil.go#L158-L159

I see that other related TinyGo tickets that have been opened have been closed as duplicates of this issue, so I didn't want to reopen this or open a new issue. But I would be willing to contribute documentation if someone could help suggest more alternatives to existing uses of reflect.SliceHeader fields.

@aykevl
Copy link
Member

aykevl commented Sep 8, 2022

I have created a docs PR here: tinygo-org/tinygo-site#289

@tschaub
Copy link
Contributor

tschaub commented Sep 8, 2022

Thanks, @aykevl. For reference, here is the draft set of changes for Apache Arrow to work with TinyGo: apache/arrow#14026. Curious if you might have any suggestions for improvement.

@aykevl
Copy link
Member

aykevl commented Sep 8, 2022

Maybe, instead of this:

return unsafe.Slice((*int32)(unsafe.Pointer(h.Data)), cap(b)/Int32SizeBytes)[:len(b)/Int32SizeBytes]

You could do this:

return unsafe.Slice((*int32)(unsafe.Pointer(h.Data)), cap(b)/unsafe.Sizeof(int32(0)))[:len(b)/unsafe.Sizeof(int32(0))]

...but I'm not sure the latter is any easier to read. In any case, I think unsafe.Slice is generally an improvement over reflect.SliceHeader.

Ideally you could do this all in a single function using generics, but I don't know whether you still need to support Go 1.17. When you use generics, you can use unsafe.Sizeof to determine the size of each element in the slice.

Also, just FYI: casting slices like this is not risk-free, even on x86. You could get a segmentation fault if the alignment isn't correct.

@bcmills
Copy link

bcmills commented Apr 20, 2023

Note that as of Go 1.20 (assuming that TinyGo is up to date with the API changes in that release) you can use the Slice, SliceData, String, and StringData functions in the unsafe package instead of the explicitly non-portable reflect headers.

@aykevl
Copy link
Member

aykevl commented Apr 21, 2023

Yes, and I'm thankful for those functions :) they are supported in TinyGo.
We need to update the documentation here: https://tinygo.org/docs/guides/compatibility/#reflectsliceheader-and-reflectstringheader

@aykevl
Copy link
Member

aykevl commented Apr 25, 2023

Here is an update to the documentation that describes how to use these newer and more portable functions: tinygo-org/tinygo-site#346

(They're still not fully portable of course, because casting slices and strings like that can reveal architecture specific details).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

8 participants