Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A file with junk bytes after the tree leads to a read failure #441

Open
braingram opened this issue Aug 3, 2024 · 4 comments · May be fixed by asdf-format/asdf#1822
Open

A file with junk bytes after the tree leads to a read failure #441

braingram opened this issue Aug 3, 2024 · 4 comments · May be fixed by asdf-format/asdf#1822
Assignees

Comments

@braingram
Copy link
Contributor

Description of the problem

The standard mentions:

There may be an arbitrary amount of unused space between the end of the tree and the first block. To find the beginning of the first block, ASDF parsers should search from the end of the tree for the first occurrence of the block_magic_token. If the file contains no tree, the first block must begin immediately after the header with no padding.

However using the following file:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.0.2.dev112+g143eb2d9}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.0.4.dev187+g1458b25}
    software: !core/software-1.0.0 {name: asdf, version: 3.0.2.dev112+g143eb2d9}
arr: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [42]
...
JUNK
<<<blocks, unfortunately these don't paste well>>>

Results in an error:

OSError: Invalid bytes while reading blocks b'JUNK'

Note that null characters are ok (and don't produce an error).

System information

The error occurs for all asdf version 3.0.0 and newer. 2.15 (and 2.13) were tested and don't produce errors.

@zacharyburnett zacharyburnett linked a pull request Aug 12, 2024 that will close this issue
7 tasks
@zacharyburnett zacharyburnett self-assigned this Aug 13, 2024
@zacharyburnett
Copy link
Member

as we discussed in standup, perhaps this merits changing the ASDF Standard to NOT support junk bytes

@braingram
Copy link
Contributor Author

I think so. Some of it comes down to what the standard means by unused space. If it means 0 bytes we support this:
https://github.com/asdf-format/asdf/blob/dbbc26f5f0c3a4a7775bca0903bd0051a132b2e4/asdf/_block/reader.py#L172-L173
if it means any arbitrary uninitialized bytes (junk) than we don't support this and I'd argue it's dangerous to allow it (since the uninitialized bytes could happen to match the block magic).

What do you think we should do here?

@zacharyburnett
Copy link
Member

making it more explicit will make our lives easier

- There may be an arbitrary amount of unused space between the end of the tree and the first block. 
+ There may be 0 bytes of unused space between the end of the tree and the first block. 

@braingram
Copy link
Contributor Author

Updating the standard sounds good to me.

We can work out the wording in the PR and I think a reference file that contains some 0 bytes between the tree and first block would be useful. I vote for 5 bytes since the block magic is 4 bytes long (to catch possible bugs where the post-tree file contents are parsed in 4 byte chunks).

I'll try to transfer this issue to asdf-standard

@braingram braingram transferred this issue from asdf-format/asdf Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants