Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Use nom-locate to add span information in the AST #129

Closed
wants to merge 1 commit into from
Closed

WIP: Use nom-locate to add span information in the AST #129

wants to merge 1 commit into from

Conversation

alixinne
Copy link
Contributor

The main target for this PR is issue #74. By using nom_locate we can rewrite the parsing functions to operate on input with span information added in. I'll post an update when the functionality is usable (for now I've just changed the parser and tests to use nom_locate::LocatedSpan).

This is flagged as WIP to discuss how to integrate this in the current architecture. One potential side-effect is allowing to parse comments (as requested in #116). We could have a way to associate "non-significant" spans (whitespace and comments) to the relevant syntax nodes in order to parse comments relative to the syntax tree, as suggested in https://www.oilshell.org/blog/2017/02/11.html.

This one is more of a stretch and it might be more sensible to be a second iteration on this feature, but this would allow preserving pre-processing directives in non-top-level contexts in the same way, which is currently broken (see issues #117 and #64).

@hadronized
Copy link
Owner

Thanks! I’ll try to dedicate some time to read this asap!

@alixinne
Copy link
Contributor Author

In the last commit I decided to try implementing a ParseInput type. This allows passing in the LocatedSpan from nom_locate as well as a parsing context. The role of the parsing context is for now (to test this strategy) to record the comments as they are successfully parsed.

This requires quite a lot of changes since this introduces lifetimes everywhere, and the ParseContext structure is using a RefCell to allow the ParseInput type to be Copy using only a shared reference. This might not be the prettiest solution but it does work without having to rewrite the entire parser, and also since many of the nom combinators won't accept an FnMut which would appear if we held an exclusive reference to the parsing context.

Using a parsing context will allow replacing the Node structure with one that only holds an Option<NonZeroUSize> to reference a span by its identifier in the parsing context list of spans (not yet implemented). This will make the memory impact of these changes minimal, and also optional (for example, programmatically-generated syntax trees might not have span information so we can just use None and no context).


let expected = syntax::TranslationUnit(syntax::NonEmpty(vec![block]));

assert_eq!(translation_unit(src), Ok(("", expected)));
assert_eq_parser(translation_unit, src, Ok(("", expected)), &ctx);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently broken on Windows because of how new lines are checked out by git with autocrlf on

@alixinne
Copy link
Contributor Author

Spans are now owned by nodes instead of the parsing context. This simplifies a big part of the code (and probably more when I finish refactoring). I think it's a better idea to have span information working first, and then to optimize how this is handled if it proves necessary.

To avoid having too many changes required on the parsing tests, there is now a NodeContentsEq trait with an associated assert_ceq! macro to compare syntax trees ignoring span information (PartialEq still compares everything in the tree). Its implementation is handled by the proc macro in glsl-impl.

@alixinne
Copy link
Contributor Author

alixinne commented Sep 1, 2020

With the proc-macro in place we can also automatically generate type aliases for OldNodeType -> pub type OldNodeType = Node<RenamedOldNodeType> which allows integrating span information in the AST with minimal breakage to the API (the type names remain the same, except when explicitely matching on enum variants. This also avoids having Node<T> everywhere in the API. The into_node() method was also removed using this, so into() will convert from node data into a node with no span information.

@alixinne alixinne closed this by deleting the head repository Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants