add GLR Parser
    - bump version to 2.5.0
    - add lookahead predefined variable to reduce action
    - add README about GLR parser
    - add(parser) %glr syntax
    - invalid non-terminal now returns InvalidTerminalError, no unreachable
    - add(core) Parser and Context trait
    - fix(core) add `allow_conflict` option to grammar builder
ehwan committed Aug 24, 2024
1 parent 03ea7a7 commit da23178
Showing 32 changed files with 2,012 additions and 886 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -8,5 +8,5 @@ members = [
"rusty_lr_executable",
"example/calculator",
"example/calculator_u8",
"example/lrtest",
"example/lrtest", "example/glr",
]
72 changes: 43 additions & 29 deletions README.md
@@ -2,11 +2,11 @@
[![crates.io](https://img.shields.io/crates/v/rusty_lr.svg)](https://crates.io/crates/rusty_lr)
[![docs.rs](https://docs.rs/rusty_lr/badge.svg)](https://docs.rs/rusty_lr)

yacc-like LR(1) and LALR(1) Deterministic Finite Automata (DFA) generator from Context Free Grammar (CFGs).
GLR, LR(1) and LALR(1) parser generator for Rust.

RustyLR provides [procedural macros](#proc-macro) and [buildscript tools](#integrating-with-buildrs) to generate LR(1) and LALR(1) parser.
RustyLR provides [procedural macros](#proc-macro) and [buildscript tools](#integrating-with-buildrs) to generate GLR, LR(1) and LALR(1) parsers.
The generated parser will be a pure Rust code, and the calculation of building DFA will be done at compile time.
Reduce action can be written in Rust code,
Reduce action can be written in Rust,
and the error messages are [readable and detailed](#readable-error-messages-with-codespan).
For huge and complex grammars, it is recommended to use the [buildscript](#integrating-with-buildrs).

@@ -125,7 +125,7 @@ println!("userdata: {}", userdata);
- [Proc-macro](#proc-macro)
- [Integrating with `build.rs`](#integrating-with-buildrs)
- [Start Parsing](#start-parsing)
- [Error Handling](#error-handling)
- [GLR Parser](#glr-parser)
- [Syntax](#syntax)


@@ -234,32 +234,30 @@ for token in input_sequence {
let start_symbol_value = context.accept();
```

## Error Handling
There are two error variants returned from `feed()` function:
- `InvalidTerminal(InvalidTerminalError)` : when invalid terminal symbol is fed
- `ReduceAction(ReduceActionError)` : when the reduce action returns `Err(Error)`

For `ReduceActionError`, the error type can be defined by [`%err`](#error-type-optional) directive. If not defined, `DefaultReduceActionError` will be used.

When printing the error message, there are two ways to get the error message:
- `e.long_message( &parser, &context )` : get the error message as `String`, in a detailed format
- `e as Display` : briefly print the short message through `Display` trait.

The `long_message` function requires the reference to the parser and the context.
It will make a detailed error message of what current state was trying to parse, and what the expected terminal symbols were.
### Example of long_message
## GLR Parser
A GLR (Generalized LR) parser can be generated by adding the `%glr;` directive to the grammar.
```
Invalid Terminal: *. Expected one of: , (, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
>>> In:
M -> M * • M
>>> Backtrace:
M -> M • * M
>>> Backtrace:
A -> A + • A
>>> Backtrace:
A -> A • + A
// generate GLR parser;
// from now on, shift/reduce, reduce/reduce conflicts will not be treated as errors
%glr;
...
```
A GLR parser can handle ambiguous grammars that an LR(1) or LALR(1) parser cannot.
When it encounters a conflict during parsing,
the parser diverges into multiple states and tries every path until it fails.
Exactly one path must remain at the end of parsing (the point where you feed the `eof` token).
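
To see why a conflict forces the parser to fork, consider the ambiguous grammar `E -> E '+' E | E '*' E | digit`. The sketch below is not how RustyLR works internally and does not use the RustyLR API; it is a plain-Rust brute-force enumeration (the helper name `all_parses` is made up for illustration) that evaluates every parse tree of a token sequence. Each distinct tree corresponds to one path a GLR parser could still be carrying before the ambiguity is resolved.

```rust
/// Evaluate every parse of a flat `digit (op digit)*` token sequence under the
/// ambiguous grammar  E -> E '+' E | E '*' E | digit.
/// Illustrative helper only; not part of RustyLR.
fn all_parses(tokens: &[char]) -> Vec<i32> {
    // A single token is a digit: exactly one parse.
    if tokens.len() == 1 {
        return vec![tokens[0] as i32 - '0' as i32];
    }
    let mut results = Vec::new();
    // Operators sit at odd indices; each one can be the root of a parse tree.
    for i in (1..tokens.len()).step_by(2) {
        let lefts = all_parses(&tokens[..i]);
        let rights = all_parses(&tokens[i + 1..]);
        for &l in &lefts {
            for &r in &rights {
                results.push(match tokens[i] {
                    '+' => l + r,
                    '*' => l * r,
                    other => panic!("unexpected operator {other}"),
                });
            }
        }
    }
    results
}
```

Calling `all_parses` on the characters of `1+2*3+4` yields five values, one per parse tree. Something has to prune these alternatives down to a single one before `eof` is fed.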

### Resolving Ambiguities
You can resolve ambiguities through the [reduce action](#reduceaction-optional).
Returning `Result::Err(Error)` from the reduce action discards the current path.
The `Error` type can be defined with the [`%err`](#error-type-optional) directive.
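
The pruning idea can be sketched in plain Rust, without the RustyLR API. Everything here (`Top`, `Val`, `reduce`, `surviving_parses`) is a hypothetical name chosen for illustration, and the brute-force enumeration stands in for the real GLR machinery. The point is only that a reduce action returning `Err` removes a candidate path, and that well-chosen rejections leave exactly one survivor. In this sketch the rejections encode the usual precedence of `*` over `+` and left associativity.

```rust
/// Which rule produced a value (hypothetical bookkeeping for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Top { Digit, Plus, Star }

#[derive(Clone, Copy, Debug, PartialEq)]
struct Val { value: i32, top: Top }

/// Stand-in for a reduce action: returning `Err` discards the current path.
fn reduce(op: char, l: Val, r: Val) -> Result<Val, String> {
    match op {
        '*' => {
            // '*' binds tighter than '+': an operand must not be a bare sum.
            if l.top == Top::Plus || r.top == Top::Plus {
                return Err("sum used as operand of '*'".to_string());
            }
            // Left associativity: the right operand must not be a product.
            if r.top == Top::Star {
                return Err("right-nested '*'".to_string());
            }
            Ok(Val { value: l.value * r.value, top: Top::Star })
        }
        '+' => {
            // Left associativity: the right operand must not be a sum.
            if r.top == Top::Plus {
                return Err("right-nested '+'".to_string());
            }
            Ok(Val { value: l.value + r.value, top: Top::Plus })
        }
        other => Err(format!("unexpected operator {other}")),
    }
}

/// Try every parse tree and keep only those whose reductions all succeed.
fn surviving_parses(tokens: &[char]) -> Vec<Val> {
    if tokens.len() == 1 {
        return vec![Val { value: tokens[0] as i32 - '0' as i32, top: Top::Digit }];
    }
    let mut out = Vec::new();
    for i in (1..tokens.len()).step_by(2) {
        let lefts = surviving_parses(&tokens[..i]);
        let rights = surviving_parses(&tokens[i + 1..]);
        for &l in &lefts {
            for &r in &rights {
                // An `Err` simply drops this candidate path.
                if let Ok(v) = reduce(tokens[i], l, r) {
                    out.push(v);
                }
            }
        }
    }
    out
}
```

For `1+2*3+4`, only the tree `(1+(2*3))+4` survives, so the single remaining value is 11.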

### Note on GLR Parser
- Still in development and not yet thoroughly tested (patches are welcome!).
- Since there are multiple paths, a reduce action can be called multiple times, even if its result is later thrown away.
- Every `RuleType` and `Term` must implement the `Clone` trait.
- The user must be aware of where shift/reduce or reduce/reduce conflicts occur.
Every time the parser diverges, the computation cost increases.

## Syntax
To start writing down a context-free grammar, you need to define necessary directives first.
@@ -301,6 +299,7 @@ Every line in the macro must follow the syntax below.
- [`%err`, `%error`](#error-type-optional)
- [`%derive`](#derive-optional)
- [`%derive`](#derive-optional)
- [`%glr`](#glr-parser-generation)


---
@@ -409,7 +408,8 @@ A(i32): ... ;
### Accessing token data in ReduceAction

**predefined variables** can be used in `ReduceAction`:
- `data` : userdata passed to `feed()` function.
- `data` ( `&mut UserData` ) : userdata passed to the `feed()` function.
- `lookahead` ( `&Term` ) : lookahead token that caused the reduce action.

To access the data of each token, you can directly use the name of the token as a variable.
- For non-terminal symbols, the type of variable is `RuleType`.
@@ -682,4 +682,18 @@ let mut context = parser.begin();

println!( "{:?}", context ); // debug-print context
let cloned_context = context.clone(); // clone context, you can re-feed the input sequence using cloned context
```
```


---

### GLR parser generation
```
%glr;
```
Switch to GLR parser generation.

If you want to generate a GLR parser, add the `%glr;` directive to the grammar.
With this directive, shift/reduce and reduce/reduce conflicts will not be treated as errors.

See [GLR Parser](#glr-parser) section for more details.
7 changes: 7 additions & 0 deletions example/glr/Cargo.toml
@@ -0,0 +1,7 @@
[package]
name = "glr"
version = "0.1.0"
edition = "2021"

[dependencies]
rusty_lr = { path = "../../rusty_lr", features = ["fxhash"] }
44 changes: 44 additions & 0 deletions example/glr/src/main.rs
@@ -0,0 +1,44 @@
pub mod parser;
// pub mod parser_expanded;

fn main() {
    let p = parser::EParser::new();
    let mut c = p.begin();

    let input = "1+2*3+4";
    for ch in input.chars() {
        println!("feed: {}", ch);
        match p.feed(&mut c, ch) {
            Ok(_) => {
                println!("nodes: {}", c.current_nodes.nodes.len());
            }
            Err(e) => {
                println!("Error: {:?}", e);
                break;
            }
        }
    }
    println!("feed eof");
    match p.feed(&mut c, '\0') {
        Ok(_) => {
            println!("nodes: {}", c.current_nodes.nodes.len());
        }
        Err(e) => {
            println!("Error: {:?}", e);
        }
    }

    println!("{}", c.accept());

    // for mut n in c.current_nodes.nodes.into_iter() {
    //     loop {
    //         println!("{}", n.state());
    //         if let Some(par) = n.parent() {
    //             n = std::rc::Rc::clone(par);
    //         } else {
    //             break;
    //         }
    //     }
    //     println!("---");
    // }
}
31 changes: 31 additions & 0 deletions example/glr/src/parser.rs
@@ -0,0 +1,31 @@
use rusty_lr::lr1;

lr1! {
    %glr;
    %tokentype char;
    %start E;
    %eof '\0';

    %left [plus star];

    %token plus '+';
    %token star '*';
    %token zero '0';
    %token one '1';
    %token two '2';
    %token three '3';
    %token four '4';
    %token five '5';
    %token six '6';
    %token seven '7';
    %token eight '8';
    %token nine '9';

    Digit(char): ch=[zero-nine] { println!("Digit: {}", ch); ch };

    E(i32) : E plus e2=E { println!("Plus"); E + e2 }
           | E star e2=E { println!("Star"); E * e2 }
           | Digit { println!("D2E"); Digit as i32 - '0' as i32 }
           ;

}
12 changes: 6 additions & 6 deletions rusty_lr/Cargo.toml
@@ -1,18 +1,18 @@
[package]
name = "rusty_lr"
version = "2.4.0"
version = "2.5.0"
edition = "2021"
license = "MIT"
description = "yacc-like, LR(1) and LALR(1) parser generator with custom reduce action"
description = "GLR, LR(1) and LALR(1) parser generator with custom reduce action"
repository = "https://github.com/ehwan/RustyLR"
readme = "../README.md"
keywords = ["parser", "yacc", "context-free-grammar", "lr", "compiler"]
keywords = ["parser", "bison", "context-free-grammar", "lr", "compiler"]
categories = ["parsing", "compilers", "parser-implementations"]

[dependencies]
rusty_lr_core = "2.4"
rusty_lr_derive = "1.11"
rusty_lr_buildscript = { version = "0.6", optional = true }
rusty_lr_core = "2.5"
rusty_lr_derive = "1.12"
rusty_lr_buildscript = { version = "0.7", optional = true }
# rusty_lr_core = { path = "../rusty_lr_core" }
# rusty_lr_derive = { path = "../rusty_lr_derive" }
# rusty_lr_buildscript = { path = "../rusty_lr_buildscript", optional = true }
65 changes: 34 additions & 31 deletions rusty_lr/src/lib.rs
@@ -1,9 +1,9 @@
//! # RustyLR
//! yacc-like LR(1) and LALR(1) Deterministic Finite Automata (DFA) generator from Context Free Grammar (CFGs).
//! GLR, LR(1) and LALR(1) parser generator for Rust.
//!
//! RustyLR provides [procedural macros](#proc-macro) and [buildscript tools](#integrating-with-buildrs) to generate LR(1) and LALR(1) parser.
//! RustyLR provides [procedural macros](#proc-macro) and [buildscript tools](#integrating-with-buildrs) to generate GLR, LR(1) and LALR(1) parsers.
//! The generated parser will be a pure Rust code, and the calculation of building DFA will be done at compile time.
//! Reduce action can be written in Rust code,
//! Reduce action can be written in Rust,
//! and the error messages are **readable and detailed**.
//! For huge and complex grammars, it is recommended to use the [buildscript](#integrating-with-buildrs).
//!
@@ -27,11 +27,11 @@
//!
//! These macros will generate structs:
//! - `Parser` : contains DFA tables and production rules
//! - [`ParseError`] : type alias for `Error` returned from `feed()`
//! - `ParseError` : type alias for `Error` returned from `feed()`
//! - `Context` : contains current state and data stack
//! - `enum NonTerminals` : a list of non-terminal symbols
//! - [`Rule`](`ProductionRule`) : type alias for production rules
//! - [`State`] : type alias for DFA states
//! - `State` : type alias for DFA states
//!
//! All structs above are prefixed by `<StartSymbol>`.
//! In most cases, what you want is the `Parser` and `ParseError` structs, and the others are used internally.
@@ -125,31 +125,34 @@
//! let start_symbol_value = context.accept();
//! ```
//!
//! ## Error Handling
//! There are two error variants returned from `feed()` function:
//! - `InvalidTerminal(InvalidTerminalError)` : when invalid terminal symbol is fed
//! - `ReduceAction(ReduceActionError)` : when the reduce action returns `Err(Error)`
//!
//! For `ReduceActionError`, the error type can be defined by [`%err`](#error-type-optional) directive. If not defined, `String` will be used.
//!
//! When printing the error message, there are two ways to get the error message:
//! - `e.long_message( &parser, &context )` : get the error message as `String`, in a detailed format
//! - `e as Display` : briefly print the short message through `Display` trait.
//!
//! The `long_message` function requires the reference to the parser and the context.
//! It will make a detailed error message of what current state was trying to parse, and what the expected terminal symbols were.
//! ### Example of long_message
//! ```text
//! Invalid Terminal: *. Expected one of: , (, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
//! >>> In:
//! M -> M * • M
//! >>> Backtrace:
//! M -> M • * M
//! >>> Backtrace:
//! A -> A + • A
//! >>> Backtrace:
//! A -> A • + A
//!
//!
//! ## GLR Parser
//! A GLR (Generalized LR) parser can be generated by adding the `%glr;` directive to the grammar.
//! ```
//! // generate GLR parser;
//! // from now on, shift/reduce, reduce/reduce conflicts will not be treated as errors
//! %glr;
//! ...
//! ```
//! A GLR parser can handle ambiguous grammars that an LR(1) or LALR(1) parser cannot.
//! When it encounters a conflict during parsing,
//! the parser diverges into multiple states and tries every path until it fails.
//! Exactly one path must remain at the end of parsing (the point where you feed the `eof` token).
//!
//! ### Resolving Ambiguities
//! You can resolve ambiguities through the reduce action.
//! Returning `Result::Err(Error)` from the reduce action discards the current path.
//! The `Error` type can be defined with the `%err` directive.
//!
//! ### Note on GLR Parser
//! - Still in development and not yet thoroughly tested (patches are welcome!).
//! - Since there are multiple paths, a reduce action can be called multiple times, even if its result is later thrown away.
//! - Every `RuleType` and `Term` must implement the `Clone` trait.
//! - The user must be aware of where shift/reduce or reduce/reduce conflicts occur.
//! Every time the parser diverges, the computation cost increases.
//!
//!
//!
//! ## Syntax
//! To start writing down a context-free grammar, you need to define necessary directives first.
@@ -168,8 +171,8 @@
//! }
//! ```
//!
//! `lr1!` macro will generate a parser struct with LR(1) DFA tables.
//! If you want to generate LALR(1) parser, use `lalr1!` macro.
//! [`lr1!`] macro will generate a parser struct with LR(1) DFA tables.
//! If you want to generate LALR(1) parser, use [`lalr1!`] macro.
//! Every line in the macro must follow the syntax below.
//!
//! Syntax can be found in [repository](https://github.com/ehwan/RustyLR/tree/main?tab=readme-ov-file#syntax).
6 changes: 3 additions & 3 deletions rusty_lr_buildscript/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rusty_lr_buildscript"
version = "0.6.0"
version = "0.7.0"
edition = "2021"
license = "MIT"
description = "buildscipt tools for rusty_lr"
@@ -14,7 +14,7 @@ categories = ["parsing"]
proc-macro2 = { version = "1.0.86", features = ["span-locations"] }
quote = "1.0"
# rusty_lr_parser = { path = "../rusty_lr_parser" }
rusty_lr_parser = "3.9"
rusty_lr_parser = "3.10"
# rusty_lr_core = { path = "../rusty_lr_core", features = ["fxhash", "builder"] }
rusty_lr_core = { version = "2.4", features = ["fxhash", "builder"] }
rusty_lr_core = { version = "2.5", features = ["fxhash", "builder"] }
codespan-reporting = "0.11"
2 changes: 1 addition & 1 deletion rusty_lr_core/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rusty_lr_core"
version = "2.4.0"
version = "2.5.0"
edition = "2021"
license = "MIT"
description = "core library for rusty_lr"