Commit
Ensure that unparsable text is not lost in the generated output (#1012)
The default ErrorStrategy for ANTLR generated parsers performs a sophisticated error recovery synchronization process for both PLUS_LOOP_BACK and STAR_LOOP_BACK, as well as token manufacture/insertion and single token deletion within token sequences. This culminates in a call to `recover()`, which finds the next token in the followSet for an alt that allows the parser to resume. We intercept `recover()` in order to record where we were unable to parse text.

While errors are reported, the default error strategy does not preserve any discarded input in the generated ParseTree, so that input is lost if we generate output only from those parts of the ParseTree that were successfully parsed.

Here, we implement custom strategies that gather unparsable input and preserve it as custom error nodes in the ParseTree, at strategic insertion points in the higher level rules such as `sqlCommand` (in the case of Snowflake) and `sqlClauses` (in the case of TSQL). The visitors for these rules can then first check for an error node among the children and generate an IR node representing the unparsed text.

For this PR to be usable, we need our PlanParser to no longer stop when syntax errors are discovered, as it is now safe to walk the ParseTree. That improvement is for a separate PR.
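The idea of recording skipped input during recovery can be illustrated with a small, self-contained toy parser. This is only a sketch under stated assumptions: it does not use the ANTLR runtime and is not the code from this commit. The class `RecoverySketch`, the node types `Command` and `UnparsedText`, the `RESUME_AT` set (standing in for the follow set), and the `KEYWORD argument ;` grammar are all hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy model: not the ANTLR API and not the code from this commit.
final class RecoverySketch {
    interface Node {}

    // A command that parsed cleanly: KEYWORD argument ;
    static final class Command implements Node {
        final String keyword;
        final String argument;
        Command(String keyword, String argument) {
            this.keyword = keyword;
            this.argument = argument;
        }
    }

    // Stand-in for the custom error node: keeps the discarded input verbatim.
    static final class UnparsedText implements Node {
        final String text;
        UnparsedText(String text) { this.text = text; }
    }

    // Tokens at which parsing can resume (a stand-in for the follow set).
    private static final Set<String> RESUME_AT = Set.of("SELECT", "DROP");

    private final List<String> tokens;
    private int pos = 0;

    RecoverySketch(List<String> tokens) { this.tokens = tokens; }

    // Top-level rule: a sequence of commands, with error nodes where parsing failed.
    List<Node> parseAll() {
        List<Node> children = new ArrayList<>();
        while (pos < tokens.size()) {
            Node command = tryCommand();
            children.add(command != null ? command : recover());
        }
        return children;
    }

    // Returns a Command and advances, or returns null without consuming anything.
    private Node tryCommand() {
        if (pos + 2 >= tokens.size()) return null;
        String keyword = tokens.get(pos);
        String argument = tokens.get(pos + 1);
        boolean ok = RESUME_AT.contains(keyword)
                && !RESUME_AT.contains(argument) && !argument.equals(";")
                && tokens.get(pos + 2).equals(";");
        if (!ok) return null;
        pos += 3;
        return new Command(keyword, argument);
    }

    // Skips to the next resume token, recording what was skipped instead of dropping it.
    private Node recover() {
        List<String> skipped = new ArrayList<>();
        skipped.add(tokens.get(pos++)); // always consume one token to guarantee progress
        while (pos < tokens.size() && !RESUME_AT.contains(tokens.get(pos))) {
            skipped.add(tokens.get(pos++));
        }
        return new UnparsedText(String.join(" ", skipped));
    }

    // Visitor stand-in: error nodes are checked first and emitted as comments.
    static String generate(List<Node> children) {
        List<String> lines = new ArrayList<>();
        for (Node child : children) {
            if (child instanceof UnparsedText) {
                lines.add("-- unparsed: " + ((UnparsedText) child).text);
            } else {
                Command c = (Command) child;
                lines.add(c.keyword + " " + c.argument + ";");
            }
        }
        return String.join("\n", lines);
    }
}
```

For the token sequence `SELECT a ; SELEC b ; DROP t ;`, `generate(new RecoverySketch(tokens).parseAll())` yields three lines: `SELECT a;`, `-- unparsed: SELEC b ;`, and `DROP t;`. The misspelled command survives in the output as a comment rather than disappearing, which is the property the commit message describes for the real error nodes.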