Error recovery/resilience #68

Open
bluetech opened this issue May 24, 2023 · 3 comments

Comments

@bluetech

Currently, all of the parse functions return Result<ast, ParseError>. This means that if the file has a syntax error, we get the error and nothing else. However, there are some use cases in which having access to parts which did parse successfully would be beneficial:

  • Language servers usually want to keep functioning when the file is temporarily broken during editing.
  • Linters might want to skip over broken parts and continue reporting lints, or at least report lints up to the syntax error.

On the other hand, interpreters usually don't have too much use for this AFAIK.

lalrpop seems to have some support for error recovery: https://lalrpop.github.io/lalrpop/tutorial/008_error_recovery.html
Alex Kladov has written about error recovery, most recently here: https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html

Is RustPython/Parser interested in error recovery or is it out of scope?

@youknowone
Member

That sounds really great!
I wish we had it for our parser.

Will the return type be (ast, Vec<ParseError>) in that case?

@MichaReiser
Contributor

Ruff is interested in having error recovery too but I believe that this will also require changes to the AST because errors should be represented in the tree in some way or another.

A possible approach is to introduce a BogusExpression node that the parser uses in places where it can't make sense of the syntax (we may want to have more than that). Another issue is that it could be interesting to synthesize nodes. Let's say we have a + where the right hand side is missing. The parser could synthesize a non-existing name expression with an empty name (it would need to mark the node as synthesized).

TypeScript uses synthesized nodes. I don't remember whether it has BogusExpressions or simply tries to force the syntax to fit into an existing node type. I do know that some nodes have properties that exist only for error recovery, meaning the tree can never be valid if a node has one of those child attributes initialized.
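
To make the two options concrete, here is a rough sketch of a toy parser for `name (+ name)*` over whitespace-separated words. It is not based on RustPython's AST or grammar: `Expr`, `Bogus`, the `synthesized` flag, `ParseError`, and `parse_sum` are all hypothetical names chosen for illustration. A missing operand becomes a synthesized empty name, and a word the parser can't make sense of becomes a bogus node.

```rust
/// Hypothetical expression nodes; not RustPython's real AST.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// `synthesized` is true when the parser invented the node during recovery.
    Name { id: String, synthesized: bool },
    BinOp { left: Box<Expr>, right: Box<Expr> },
    /// Source text the parser could not fit into any other node.
    Bogus { text: String },
}

/// Hypothetical error type; `position` is a word index, not a byte offset.
#[derive(Debug, Clone, PartialEq)]
pub struct ParseError {
    pub message: String,
    pub position: usize,
}

fn is_name(word: &str) -> bool {
    !word.is_empty() && word.chars().all(|c| c.is_alphabetic() || c == '_')
}

fn parse_operand(words: &[&str], pos: &mut usize, errors: &mut Vec<ParseError>) -> Expr {
    match words.get(*pos).copied() {
        Some(word) if is_name(word) => {
            *pos += 1;
            Expr::Name { id: word.to_string(), synthesized: false }
        }
        // Operand is missing: record the error and synthesize an empty name
        // without consuming anything.
        Some("+") | None => {
            errors.push(ParseError {
                message: "expected an operand".to_string(),
                position: *pos,
            });
            Expr::Name { id: String::new(), synthesized: true }
        }
        // Something unrecognizable: consume it and keep it as a bogus node.
        Some(word) => {
            errors.push(ParseError {
                message: format!("unexpected `{}`", word),
                position: *pos,
            });
            *pos += 1;
            Expr::Bogus { text: word.to_string() }
        }
    }
}

/// Always returns a tree, together with every error found along the way.
pub fn parse_sum(source: &str) -> (Expr, Vec<ParseError>) {
    let words: Vec<&str> = source.split_whitespace().collect();
    let mut errors = Vec::new();
    let mut pos = 0;
    let mut left = parse_operand(&words, &mut pos, &mut errors);
    while pos < words.len() {
        if words[pos] == "+" {
            pos += 1;
        } else {
            // Two operands in a row: report it and carry on as if `+` were there.
            errors.push(ParseError {
                message: "expected `+`".to_string(),
                position: pos,
            });
        }
        let right = parse_operand(&words, &mut pos, &mut errors);
        left = Expr::BinOp { left: Box::new(left), right: Box::new(right) };
    }
    (left, errors)
}
```

With this sketch, `parse_sum("a +")` still yields a `BinOp` whose right-hand side is a synthesized empty `Name`, plus one error. A linter could walk the left-hand side as usual and skip any node marked as synthesized or bogus.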

@bluetech
Author

@youknowone

Will the return type be (ast, Vec<ParseError>) in that case?

I think that makes sense, yes. The lalrpop doc seems to suggest this.
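
A minimal sketch of that shape, making no claims about RustPython's actual types: `Parsed`, `into_result`, and this `ParseError` are hypothetical names for illustration. It uses a named struct in place of the tuple, and shows one way callers that only care about success or failure (interpreters, for example) could get the current `Result` shape back.

```rust
/// Hypothetical error type standing in for the parser's real one.
#[derive(Debug, Clone, PartialEq)]
pub struct ParseError {
    pub message: String,
}

/// Outcome of a resilient parse: always a tree, plus every error encountered.
/// Equivalent to the tuple `(ast, Vec<ParseError>)`, just with field names.
#[derive(Debug, Clone, PartialEq)]
pub struct Parsed<T> {
    pub ast: T,
    pub errors: Vec<ParseError>,
}

impl<T> Parsed<T> {
    /// Collapse to the strict shape: the tree if there were no errors,
    /// otherwise the first error (the rest are dropped).
    pub fn into_result(self) -> Result<T, ParseError> {
        let Parsed { ast, errors } = self;
        match errors.into_iter().next() {
            Some(first) => Err(first),
            None => Ok(ast),
        }
    }
}
```

A language server or linter would read `ast` and `errors` directly, while existing callers could keep going through something like `into_result()`.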

I also saw that the person who added the error recovery support to lalrpop implemented a language in rust called gluon, and its grammar uses error recovery so we can have a real example: https://github.com/gluon-lang/gluon/blob/be67982154319dea5e5486dfd1c46f8ca587c82e/parser/src/grammar.lalrpop
gluon returns a vec of parser errors (wrapped in some structs and stuff).

@MichaReiser

Ruff is interested in having error recovery too but I believe that this will also require changes to the AST because errors should be represented in the tree in some way or another.

The errors will definitely need to be represented in the AST; just skipping over them (even if returning errors separately) is not an option IMO.

As for BogusExpression vs. synthesized nodes: synthesized nodes are more useful, but we will need to see how complex it gets.


BTW, I forgot to mention the lexer, but as matklad says, "Lexer itself should be resilient, but that’s easy — produce an Error token for anything which isn’t a valid token.". This can be the first step, though it's mostly orthogonal to parser error recovery.
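
A small sketch of what that could look like, for a toy token set rather than RustPython's real lexer (`Token`, its variants, and `lex` are hypothetical names). The point is only that `lex` returns a plain `Vec<Token>` and never fails; anything it can't recognize is kept as an `Error` token.

```rust
/// Hypothetical token kinds for a tiny expression language.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Name(String),
    Int(i64),
    Plus,
    /// Text that is not a valid token, kept verbatim.
    Error(String),
}

/// Lex `source` without ever failing.
pub fn lex(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = source.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else if c.is_ascii_digit() {
            let mut text = String::new();
            while let Some(&d) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                text.push(d);
                chars.next();
            }
            match text.parse::<i64>() {
                Ok(value) => tokens.push(Token::Int(value)),
                // Too large for i64: still produce a token instead of failing.
                Err(_) => tokens.push(Token::Error(text)),
            }
        } else if c.is_alphabetic() || c == '_' {
            let mut text = String::new();
            while let Some(&d) = chars.peek() {
                if !(d.is_alphanumeric() || d == '_') {
                    break;
                }
                text.push(d);
                chars.next();
            }
            tokens.push(Token::Name(text));
        } else {
            // Not part of any valid token: emit it as an Error token and move on.
            chars.next();
            tokens.push(Token::Error(c.to_string()));
        }
    }
    tokens
}
```

The parser then decides what to do with `Token::Error`, for instance reporting it and skipping it, so lexing problems flow through the same recovery path as syntax errors.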
