Consider whether parse-less tokenizing is viable in the long run #589
Comments
In effe659, I've changed the parser to use its own knowledge about the syntax structure, without running the sweet.js token context magic at all, but left that magic in for the case where we're only reading tokens. It's a bit kludgy, but not quite as disruptive as completely dropping the tokenizer-only functionality, so I think it's an okay compromise.
Reopening (and I've reverted the patches in 1a07466). I realized that my alternative approach completely broke independent tokenizing of template strings with interpolated fields, so that's a non-starter unless we decide to drop support for tokenizing entirely. Also, the way it required the parser to drive the tokenizer (via re-tokenizing slashes when at an expression, and setting a flag at the right moment when tokenizing template strings) was pretty awkward in its own right. So the problem described in this issue remains until we come up with some better approach, and I'm reopening #552.
Going to punt on this for version 6.0. Can't find a good solution, and it's better to stick with the existing bad solution than to pivot to a new bad solution.
Since f0cbb35 the parser forces a regexp token when it sees a
A few years ago we adopted the 'sweet.js algorithm', which tries to infer enough syntactic structure from the token stream to disambiguate things like the division operator and the start of a regexp, so that the tokenizer can be run without also running the parser.
New ES versions have increasingly complicated the story here, and neither sweet.js itself nor Esprima, which also implements this, seems really motivated to keep up with that. We've been slowly complicating our algorithm, but it's starting to get shaky, and now PR #575 is the first place where we've reintroduced a dependency of the tokenizer on the parser.
Maybe, instead of putting further energy into this, and dealing with the bugs that come from this complexity, we should just deprecate tokenizing without also running a parser?
(Though that would have consequences like making it impossible to, in the future, reframe our lookahead approach or loose parser in terms of actual token lookahead, so that'd have to be considered carefully.)