Error recovery/resilience #68

bluetech · 2023-05-24T11:58:41Z

Currently, all of the parse functions return Result<ast, ParseError>. This means that if the file has a syntax error, we get the error and nothing else. However, there are some use cases in which having access to parts which did parse successfully would be beneficial:

Language servers usually want to keep functioning when the file is temporarily broken during editing.
Linters might want to skip over broken parts and continue reporting lints, or at least report lints up to the syntax error.

On the other hand, interpreters usually don't have too much use for this AFAIK.

lalrpop seems to have some support for error recovery: https://lalrpop.github.io/lalrpop/tutorial/008_error_recovery.html
Alex Kladov has written about error recovery, most recently here: https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html

Is RustPython/Parser interested in error recovery or is it out of scope?


youknowone · 2023-05-24T12:11:27Z

That sounds really great!
I wish we had it for our parser.

Will the return type be (ast, Vec<ParseError>) in that case?

MichaReiser · 2023-05-24T12:22:28Z

Ruff is interested in having error recovery too but I believe that this will also require changes to the AST because errors should be represented in the tree in some way or another.

A possible approach is to introduce a BogusExpression node that the parser uses in places where it can't make sense of the syntax (we may want to have more than that). Another issue is that it could be interesting to synthesize nodes. Let's say we have a + where the right hand side is missing. The parser could synthesize a non-existing name expression with an empty name (it would need to mark the node as synthesized).

TypeScript uses synthesized nodes. I don't remember if they have BogusExpressions or if they simply try to force the syntax to somehow fit into an existing node type. I do know that some nodes have properties that exist only for error recovery, meaning the tree can never be valid if a node has one of those child attributes initialized.
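To make the two options above concrete, here is a minimal Rust sketch of an expression type that has both a bogus node and a synthesized-name marker. Everything in it is hypothetical (the `Expr` enum, the `synthesized` flag, `parse_add`); it is not the RustPython or Ruff AST, just a toy that handles `operand` and `operand + operand` over whitespace-separated tokens.

```rust
// Hypothetical AST shapes for illustration only; not the RustPython or Ruff AST.
#[derive(Debug, PartialEq)]
pub enum Expr {
    /// An identifier. `synthesized` is true when the parser invented the
    /// node because the source had nothing in that position.
    Name { id: String, synthesized: bool },
    /// A binary `+` expression.
    Add { left: Box<Expr>, right: Box<Expr> },
    /// Source text the parser could not make sense of.
    Bogus { text: String },
}

/// Turns a single token into a name, or into a bogus node if it is not
/// a valid identifier in this toy grammar.
fn name_or_bogus(token: &str) -> Expr {
    let mut chars = token.chars();
    let starts_ok = matches!(chars.next(), Some(c) if c.is_alphabetic() || c == '_');
    if starts_ok && chars.all(|c| c.is_alphanumeric() || c == '_') {
        Expr::Name { id: token.to_string(), synthesized: false }
    } else {
        Expr::Bogus { text: token.to_string() }
    }
}

/// Parses `operand` or `operand + operand`. It never fails: problems are
/// represented inside the returned tree. Tokens after the right operand
/// are ignored in this toy.
pub fn parse_add(source: &str) -> Expr {
    let mut tokens = source.split_whitespace();
    let left = match tokens.next() {
        Some(tok) => name_or_bogus(tok),
        None => return Expr::Bogus { text: String::new() },
    };
    match tokens.next() {
        None => left,
        Some("+") => {
            let right = match tokens.next() {
                Some(tok) => name_or_bogus(tok),
                // Right-hand side is missing: synthesize an empty name.
                None => Expr::Name { id: String::new(), synthesized: true },
            };
            Expr::Add { left: Box::new(left), right: Box::new(right) }
        }
        // Anything other than `+` after the first operand is not understood.
        Some(_) => Expr::Bogus { text: source.to_string() },
    }
}
```

With this sketch, `parse_add("a +")` yields an `Add` whose right side is an empty, synthesized `Name`, while `parse_add("a + 1")` yields an `Add` whose right side is `Bogus`, because the toy grammar has no integer literals.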

bluetech · 2023-05-24T16:24:30Z

@youknowone

Will the return type be (ast, Vec<ParseError>) in that case?

I think that makes sense, yes. The lalrpop doc seems to suggest this.

I also saw that the person who added the error recovery support to lalrpop implemented a language in Rust called gluon, and its grammar uses error recovery, so we have a real example: https://github.com/gluon-lang/gluon/blob/be67982154319dea5e5486dfd1c46f8ca587c82e/parser/src/grammar.lalrpop
gluon returns a vec of parser errors (wrapped in some structs and stuff).
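As a rough illustration of the (ast, Vec<ParseError>) shape, here is a toy Rust sketch. The `Stmt` and `ParseError` types and the `parse_resilient` function are made up for this example and are not the RustPython/Parser API; the "language" is just one integer literal per line.

```rust
// Hypothetical toy types; not the actual RustPython/Parser API.
#[derive(Debug, PartialEq)]
pub enum Stmt {
    /// A line that parsed as an integer literal.
    Int(i64),
    /// Placeholder for a line that could not be parsed.
    Error,
}

#[derive(Debug, PartialEq)]
pub struct ParseError {
    /// 1-based line number of the offending line.
    pub line: usize,
    pub message: String,
}

/// Parses one integer literal per line. Instead of stopping at the first
/// bad line, it records an error, leaves a placeholder in the tree, and
/// keeps going.
pub fn parse_resilient(source: &str) -> (Vec<Stmt>, Vec<ParseError>) {
    let mut stmts = Vec::new();
    let mut errors = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        match line.trim().parse::<i64>() {
            Ok(value) => stmts.push(Stmt::Int(value)),
            Err(err) => {
                stmts.push(Stmt::Error);
                errors.push(ParseError {
                    line: idx + 1,
                    message: err.to_string(),
                });
            }
        }
    }
    (stmts, errors)
}
```

A caller that only wants a valid tree can treat a non-empty error vector as a failure, which recovers the current Result-style behaviour, while a language server or linter can use the partial tree and the errors together.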

@MichaReiser

Ruff is interested in having error recovery too but I believe that this will also require changes to the AST because errors should be represented in the tree in some way or another.

The errors will definitely need to be represented in the AST; just skipping over them (even if returning errors separately) is not an option IMO.

As for BogusExpression vs. synthesized nodes, synthesized nodes are more useful, but we will need to see how complex it gets.

BTW, I forgot to mention the lexer, but as matklad says, "Lexer itself should be resilient, but that’s easy — produce an Error token for anything which isn’t a valid token.". This can be the first step, though it's mostly orthogonal to parser error recovery.
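A resilient lexer along those lines could look roughly like the Rust sketch below. The `Token` enum and `lex` function are hypothetical and cover only a tiny token set (names, integers, `+`); the point is only that an unrecognized character becomes an `Error` token instead of aborting lexing.

```rust
// Hypothetical token set for illustration; not the RustPython lexer.
#[derive(Debug, PartialEq)]
pub enum Token {
    Name(String),
    Int(String),
    Plus,
    /// Any character that does not start a valid token.
    Error(char),
}

/// Lexes the whole input and never fails: unknown characters are emitted
/// as `Token::Error` and lexing continues with the next character.
pub fn lex(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = source.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            // Consume a run of digits.
            let mut text = String::new();
            while let Some(&d) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                text.push(d);
                chars.next();
            }
            tokens.push(Token::Int(text));
        } else if c.is_alphabetic() || c == '_' {
            // Consume an identifier.
            let mut text = String::new();
            while let Some(&d) = chars.peek() {
                if !(d.is_alphanumeric() || d == '_') {
                    break;
                }
                text.push(d);
                chars.next();
            }
            tokens.push(Token::Name(text));
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else {
            // Not a valid token start: emit an error token and move on.
            chars.next();
            tokens.push(Token::Error(c));
        }
    }
    tokens
}
```

The parser then decides what to do with `Error` tokens, for example by wrapping them in a bogus node, which keeps the lexer change independent of the parser recovery work.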
