Unless I misunderstand the YAML spec's section on characters, all the bytes in our current block identifier sequence are valid in a YAML document:
d3 42 4c 4b
If this is true, then should we consider changing one of these characters to be outside of the YAML valid set? Doing so would allow us to seek through the ASDF file to find the first block without first parsing the YAML section.
The ASDF Standard requires that the tree be encoded in UTF-8:
ASDF is a hybrid text and binary format. The header, tree and block index are text, (specifically, in UTF-8 with DOS or UNIX-style newlines), while the blocks are raw binary.
and the block identifier sequence is in fact invalid UTF-8: 0xD3 is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80..0xBF (see Table 3-7 in the Unicode Standard), but the next byte in the sequence is 0x42.
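As a quick sanity check of that claim, here is a minimal Python sketch that tries to decode the magic bytes as UTF-8. The names `BLOCK_MAGIC` and `is_valid_utf8` are made up for illustration and are not taken from the asdf library.

```python
# Block identifier bytes quoted in this issue: d3 42 4c 4b
BLOCK_MAGIC = bytes([0xD3, 0x42, 0x4C, 0x4B])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xD3 followed by 0x42 is rejected, because 0x42 is not a continuation byte.
magic_is_valid = is_valid_utf8(BLOCK_MAGIC)

# For contrast, 0xD3 followed by a byte in 0x80..0xBF is a well-formed pair.
well_formed_pair_is_valid = is_valid_utf8(b"\xd3\x82")
```

Since 0xD3 can only ever appear as a lead byte in well-formed UTF-8, the pair `d3 42` should not occur anywhere in a valid UTF-8 tree.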
So it should be possible to seek to the first block by looking for this sequence, but maybe we need to better document that fact. I'll change the title of this issue accordingly.
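A rough sketch of what that seek could look like, assuming the text portion of the file is valid UTF-8 as the standard requires. The function name `find_first_block` and the chunked scan are illustrative choices only, not how the asdf library locates blocks.

```python
import io

BLOCK_MAGIC = b"\xd3BLK"  # d3 42 4c 4b


def find_first_block(stream, chunk_size=4096):
    """Scan a binary stream from its current position (assumed to be the
    start of the file) and return the byte offset of the first block magic
    sequence, or -1 if none is found. Hypothetical helper."""
    overlap = len(BLOCK_MAGIC) - 1
    offset = 0      # file offset of the first byte held in `window`
    window = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window += chunk
        pos = window.find(BLOCK_MAGIC)
        if pos != -1:
            return offset + pos
        # Keep the last few bytes so a magic sequence that straddles
        # two chunks is still found on the next iteration.
        keep = window[-overlap:]
        offset += len(window) - len(keep)
        window = keep


# Usage with a fabricated, ASDF-like byte string (not a real ASDF file).
header_and_tree = b"#ASDF 1.0.0\n%YAML 1.1\n--- \nfoo: bar\n...\n"
fake_file = header_and_tree + BLOCK_MAGIC + b"\x00\x30" + b"payload"
first_block_offset = find_first_block(io.BytesIO(fake_file), chunk_size=5)
```

The tiny `chunk_size` in the usage example is deliberate: it forces the magic sequence to span chunk boundaries, which is the case the overlap handling is there for.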
eslavich changed the title from "More convenient block magic sequence" to "Document that the block magic sequence is invalid UTF-8" on Jun 9, 2021