add GLR Parser

- bump version to 2.5.0
- add lookahead predefined variable to reduce action
- add README about GLR parser
- add(parser) %glr syntax
- invalid non-terminal now returns InvalidTerminalError, no unreachable
- add(core) Parser and Context trait
- fix(core) add `allow_conflict` option to grammar builder
ehwan · Aug 24, 2024 · da23178
1 parent 03ea7a7
commit da23178
Showing 32 changed files with 2,012 additions and 886 deletions.
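
The README text added in this commit says that a GLR parser forks on every conflict, tries every path, and lets a reduce action discard a path by returning `Err`. The following is a minimal standalone sketch of that idea for the ambiguous grammar `E -> E + E | E * E | digit` used in `example/glr`. It does not use the RustyLR API: `Expr`, `Sym`, `Action`, `evaluate`, `with_precedence`, `explore` and `parse_all` are hypothetical names introduced only for illustration, and the sketch forks over a plain symbol stack rather than over LR state tables as a generated parser would.

```rust
// Standalone illustration; none of these types come from rusty_lr.

/// A parsed `E`: its value and the operator at the root of its parse tree.
#[derive(Clone, Debug)]
struct Expr {
    value: i32,
    root_op: Option<char>, // `None` for a single digit
}

/// A symbol on the parse stack.
#[derive(Clone, Debug)]
enum Sym {
    E(Expr),
    Op(char),
}

/// Hypothetical reduce action for `E -> E op E`; `Err` discards the path.
type Action = fn(&Expr, char, &Expr) -> Result<Expr, String>;

fn prec(op: char) -> u8 {
    if op == '*' { 2 } else { 1 }
}

/// Reduce action that never fails: every parse tree survives.
fn evaluate(lhs: &Expr, op: char, rhs: &Expr) -> Result<Expr, String> {
    let value = if op == '*' { lhs.value * rhs.value } else { lhs.value + rhs.value };
    Ok(Expr { value, root_op: Some(op) })
}

/// Reduce action that rejects trees violating precedence or left associativity.
fn with_precedence(lhs: &Expr, op: char, rhs: &Expr) -> Result<Expr, String> {
    if let Some(l) = lhs.root_op {
        // The left operand must not bind looser than `op`.
        if prec(l) < prec(op) {
            return Err(format!("'{}' cannot be the left operand of '{}'", l, op));
        }
    }
    if let Some(r) = rhs.root_op {
        // The right operand must bind strictly tighter than `op`.
        if prec(r) <= prec(op) {
            return Err(format!("'{}' cannot be the right operand of '{}'", r, op));
        }
    }
    evaluate(lhs, op, rhs)
}

/// Explore both sides of every shift/reduce conflict.
fn explore(stack: &[Sym], rest: &[char], action: Action, out: &mut Vec<i32>) {
    // Reduce path: the stack ends in `E op E`.
    if let [head @ .., Sym::E(l), Sym::Op(op), Sym::E(r)] = stack {
        if let Ok(e) = action(l, *op, r) {
            let mut next = head.to_vec();
            next.push(Sym::E(e));
            explore(&next, rest, action, out);
        }
        // On `Err` this path is simply dropped.
    }
    // Shift path, or acceptance at end of input.
    match rest.split_first() {
        Some((&c, tail)) => {
            let mut next = stack.to_vec();
            next.push(match c.to_digit(10) {
                Some(d) => Sym::E(Expr { value: d as i32, root_op: None }),
                None => Sym::Op(c),
            });
            explore(&next, tail, action, out);
        }
        None => {
            // Accept only when a single `E` is left on the stack.
            if let [Sym::E(e)] = stack {
                out.push(e.value);
            }
        }
    }
}

/// Collect the value of every surviving path, sorted.
fn parse_all(input: &str, action: Action) -> Vec<i32> {
    let chars: Vec<char> = input.chars().collect();
    let mut out = Vec::new();
    explore(&[], &chars, action, &mut out);
    out.sort();
    out
}

fn main() {
    println!("all paths: {:?}", parse_all("1+2*3+4", evaluate));
    println!("pruned:    {:?}", parse_all("1+2*3+4", with_precedence));
}
```

With `evaluate`, all five parse trees of `1+2*3+4` survive, so the input stays ambiguous at end of input. With `with_precedence`, four paths are discarded by `Err` and only the conventional reading, `(1+(2*3))+4 = 11`, remains.
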
diff --git a/Cargo.toml b/Cargo.toml
@@ -8,5 +8,5 @@ members = [
   "rusty_lr_executable",
   "example/calculator",
   "example/calculator_u8",
-  "example/lrtest",
+  "example/lrtest", "example/glr",
 ]
diff --git a/README.md b/README.md
@@ -2,11 +2,11 @@
 [![crates.io](https://img.shields.io/crates/v/rusty_lr.svg)](https://crates.io/crates/rusty_lr)
 [![docs.rs](https://docs.rs/rusty_lr/badge.svg)](https://docs.rs/rusty_lr)
 
-yacc-like LR(1) and LALR(1) Deterministic Finite Automata (DFA) generator from Context Free Grammar (CFGs).
+GLR, LR(1) and LALR(1) parser generator for Rust.
 
-RustyLR provides [procedural macros](#proc-macro) and [buildscript tools](#integrating-with-buildrs) to generate LR(1) and LALR(1) parser.
+RustyLR provides [procedural macros](#proc-macro) and [buildscript tools](#integrating-with-buildrs) to generate GLR, LR(1) and LALR(1) parsers.
 The generated parser will be a pure Rust code, and the calculation of building DFA will be done at compile time.
-Reduce action can be written in Rust code,
+Reduce action can be written in Rust,
 and the error messages are [readable and detailed](#readable-error-messages-with-codespan).
 For huge and complex grammars, it is recommended to use the [buildscipt](#integrating-with-buildrs).
 
@@ -125,7 +125,7 @@ println!("userdata: {}", userdata);
  - [Proc-macro](#proc-macro)
  - [Integrating with `build.rs`](#integrating-with-buildrs)
  - [Start Parsing](#start-parsing)
- - [Error Handling](#error-handling)
+ - [GLR Parser](#glr-parser)
  - [Syntax](#syntax)
 
 
@@ -234,32 +234,30 @@ for token in input_sequence {
 let start_symbol_value = context.accept();
 ```
 
-## Error Handling
-There are two error variants returned from `feed()` function:
- - `InvalidTerminal(InvalidTerminalError)` : when invalid terminal symbol is fed
- - `ReduceAction(ReduceActionError)` : when the reduce action returns `Err(Error)`
-
-For `ReduceActionError`, the error type can be defined by [`%err`](#error-type-optional) directive. If not defined, `DefaultReduceActionError` will be used.
-
-When printing the error message, there are two ways to get the error message:
- - `e.long_message( &parser, &context )` : get the error message as `String`, in a detailed format
- - `e as Display` : briefly print the short message through `Display` trait.
-
-The `long_message` function requires the reference to the parser and the context.
-It will make a detailed error message of what current state was trying to parse, and what the expected terminal symbols were.
-### Example of long_message
+## GLR Parser
+A GLR (Generalized LR) parser can be generated by adding the `%glr;` directive to the grammar.
 ```
-Invalid Terminal: *. Expected one of:  , (, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
->>> In:
-	M -> M * • M
->>> Backtrace:
-	M -> M • * M
->>> Backtrace:
-	A -> A + • A
->>> Backtrace:
-	A -> A • + A
+// generate GLR parser;
+// from now on, shift/reduce, reduce/reduce conflicts will not be treated as errors
+%glr; 
+...
 ```
+A GLR parser can handle ambiguous grammars that an LR(1) or LALR(1) parser cannot.
+When it encounters any kind of conflict during parsing,
+the parser diverges into multiple states and tries every path until it fails.
+Of course, exactly one path must remain at the end of parsing (the point where you feed the `eof` token).
 
+### Resolving Ambiguities
+You can resolve ambiguities through the [reduce action](#reduceaction-optional).
+Returning `Result::Err(Error)` from a reduce action discards the current path.
+The `Error` variant type can be defined with the [`%err`](#error-type-optional) directive.
+
+### Note on GLR Parser
+ - Still in development and not yet tested enough (patches are welcome!).
+ - Since there are multiple paths, a reduce action can be called multiple times, even if its result is later thrown away.
+    - Every `RuleType` and `Term` must implement the `Clone` trait.
+ - Users must be aware of the points where shift/reduce or reduce/reduce conflicts occur.
+ Every time the parser diverges, the computation cost increases.
 
 ## Syntax
 To start writing down a context-free grammar, you need to define necessary directives first.
@@ -301,6 +299,7 @@ Every line in the macro must follow the syntax below.
  - [`%err`, `%error`](#error-type-optional)
  - [`%derive`](#derive-optional)
  - [`%derive`](#derive-optional)
+ - [`%glr`](#glr-parser-generation)
 
 
 ---
@@ -409,7 +408,8 @@ A(i32): ... ;
 ### Accessing token data in ReduceAction
 
 **predefined variables** can be used in `ReduceAction`:
- - `data` : userdata passed to `feed()` function.
+ - `data` ( `&mut UserData` ) : userdata passed to the `feed()` function.
+ - `lookahead` ( `&Term` ) : lookahead token that caused the reduce action.
 
 To access the data of each token, you can directly use the name of the token as a variable.
  - For non-terminal symbols, the type of variable is `RuleType`.
@@ -682,4 +682,18 @@ let mut context = parser.begin();
 
 println!( "{:?}", context );          // debug-print context
 let cloned_context = context.clone(); // clone context, you can re-feed the input sequence using cloned context
-```
+```
+
+
+---
+
+### GLR parser generation
+```
+%glr;
+```
+Switch to GLR parser generation.
+
+To generate a GLR parser, add the `%glr;` directive to the grammar.
+With this directive, shift/reduce and reduce/reduce conflicts are no longer treated as errors.
+
+See [GLR Parser](#glr-parser) section for more details.
diff --git a/example/glr/Cargo.toml b/example/glr/Cargo.toml
@@ -0,0 +1,7 @@
+[package]
+name = "glr"
+version = "0.1.0"
+edition = "2021"
+
+[dependencies]
+rusty_lr = { path = "../../rusty_lr", features = ["fxhash"] }
diff --git a/example/glr/src/main.rs b/example/glr/src/main.rs
@@ -0,0 +1,44 @@
+pub mod parser;
+// pub mod parser_expanded;
+
+fn main() {
+    let p = parser::EParser::new();
+    let mut c = p.begin();
+
+    let input = "1+2*3+4";
+    for ch in input.chars() {
+        println!("feed: {}", ch);
+        match p.feed(&mut c, ch) {
+            Ok(_) => {
+                println!("nodes: {}", c.current_nodes.nodes.len());
+            }
+            Err(e) => {
+                println!("Error: {:?}", e);
+                break;
+            }
+        }
+    }
+    println!("feed eof");
+    match p.feed(&mut c, '\0') {
+        Ok(_) => {
+            println!("nodes: {}", c.current_nodes.nodes.len());
+        }
+        Err(e) => {
+            println!("Error: {:?}", e);
+        }
+    }
+
+    println!("{}", c.accept());
+
+    // for mut n in c.current_nodes.nodes.into_iter() {
+    //     loop {
+    //         println!("{}", n.state());
+    //         if let Some(par) = n.parent() {
+    //             n = std::rc::Rc::clone(par);
+    //         } else {
+    //             break;
+    //         }
+    //     }
+    //     println!("---");
+    // }
+}
diff --git a/example/glr/src/parser.rs b/example/glr/src/parser.rs
@@ -0,0 +1,31 @@
+use rusty_lr::lr1;
+
+lr1! {
+    %glr;
+    %tokentype char;
+    %start E;
+    %eof '\0';
+
+    %left [plus star];
+
+    %token plus '+';
+    %token star '*';
+    %token zero '0';
+    %token one '1';
+    %token two '2';
+    %token three '3';
+    %token four '4';
+    %token five '5';
+    %token six '6';
+    %token seven '7';
+    %token eight '8';
+    %token nine '9';
+
+    Digit(char): ch=[zero-nine] { println!("Digit: {}", ch); ch };
+
+    E(i32) : E plus e2=E  {  println!("Plus"); E + e2 }
+           | E star e2=E  {  println!("Star"); E * e2 }
+           | Digit { println!("D2E"); Digit as i32 - '0' as i32 }
+           ;
+
+}
diff --git a/rusty_lr/Cargo.toml b/rusty_lr/Cargo.toml
@@ -1,18 +1,18 @@
 [package]
 name = "rusty_lr"
-version = "2.4.0"
+version = "2.5.0"
 edition = "2021"
 license = "MIT"
-description = "yacc-like, LR(1) and LALR(1) parser generator with custom reduce action"
+description = "GLR, LR(1) and LALR(1) parser generator with custom reduce action"
 repository = "https://github.com/ehwan/RustyLR"
 readme = "../README.md"
-keywords = ["parser", "yacc", "context-free-grammar", "lr", "compiler"]
+keywords = ["parser", "bison", "context-free-grammar", "lr", "compiler"]
 categories = ["parsing", "compilers", "parser-implementations"]
 
 [dependencies]
-rusty_lr_core = "2.4"
-rusty_lr_derive = "1.11"
-rusty_lr_buildscript = { version = "0.6", optional = true }
+rusty_lr_core = "2.5"
+rusty_lr_derive = "1.12"
+rusty_lr_buildscript = { version = "0.7", optional = true }
 # rusty_lr_core = { path = "../rusty_lr_core" }
 # rusty_lr_derive = { path = "../rusty_lr_derive" }
 # rusty_lr_buildscript = { path = "../rusty_lr_buildscript", optional = true }

diff --git a/rusty_lr/src/lib.rs b/rusty_lr/src/lib.rs
@@ -1,9 +1,9 @@
 //! # RustyLR
-//! yacc-like LR(1) and LALR(1) Deterministic Finite Automata (DFA) generator from Context Free Grammar (CFGs).
+//! GLR, LR(1) and LALR(1) parser generator for Rust.
 //!
-//! RustyLR provides [procedural macros](#proc-macro) and [buildscript tools](#integrating-with-buildrs) to generate LR(1) and LALR(1) parser.
+//! RustyLR provides [procedural macros](#proc-macro) and [buildscript tools](#integrating-with-buildrs) to generate GLR, LR(1) and LALR(1) parsers.
 //! The generated parser will be a pure Rust code, and the calculation of building DFA will be done at compile time.
-//! Reduce action can be written in Rust code,
+//! Reduce action can be written in Rust,
 //! and the error messages are **readable and detailed**.
 //! For huge and complex grammars, it is recommended to use the [buildscipt](#integrating-with-buildrs).
 //!
@@ -27,11 +27,11 @@
 //!
 //! These macros will generate structs:
 //!  - `Parser` : contains DFA tables and production rules
-//!  - [`ParseError`] : type alias for `Error` returned from `feed()`
+//!  - `ParseError` : type alias for `Error` returned from `feed()`
 //!  - `Context` : contains current state and data stack
 //!  - `enum NonTerminals` : a list of non-terminal symbols
 //!  - [`Rule`](`ProductionRule`) : type alias for production rules
-//!  - [`State`] : type alias for DFA states
+//!  - `State` : type alias for DFA states
 //!
 //! All structs above are prefixed by `<StartSymbol>`.
 //! In most cases, what you want is the `Parser` and `ParseError` structs, and the others are used internally.
@@ -125,31 +125,34 @@
 //! let start_symbol_value = context.accept();
 //! ```
 //!
-//! ## Error Handling
-//! There are two error variants returned from `feed()` function:
-//!  - `InvalidTerminal(InvalidTerminalError)` : when invalid terminal symbol is fed
-//!  - `ReduceAction(ReduceActionError)` : when the reduce action returns `Err(Error)`
-//!
-//! For `ReduceActionError`, the error type can be defined by [`%err`](#error-type-optional) directive. If not defined, `String` will be used.
-//!
-//! When printing the error message, there are two ways to get the error message:
-//!  - `e.long_message( &parser, &context )` : get the error message as `String`, in a detailed format
-//!  - `e as Display` : briefly print the short message through `Display` trait.
-//!
-//! The `long_message` function requires the reference to the parser and the context.
-//! It will make a detailed error message of what current state was trying to parse, and what the expected terminal symbols were.
-//! ### Example of long_message
-//! ```text
-//! Invalid Terminal: *. Expected one of:  , (, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
-//! >>> In:
-//! 	M -> M * • M
-//! >>> Backtrace:
-//! 	M -> M • * M
-//! >>> Backtrace:
-//! 	A -> A + • A
-//! >>> Backtrace:
-//! 	A -> A • + A
+//!
+//!
+//! ## GLR Parser
+//! A GLR (Generalized LR) parser can be generated by adding the `%glr;` directive to the grammar.
 //! ```
+//! // generate GLR parser;
+//! // from now on, shift/reduce, reduce/reduce conflicts will not be treated as errors
+//! %glr;
+//! ...
+//! ```
+//! A GLR parser can handle ambiguous grammars that an LR(1) or LALR(1) parser cannot.
+//! When it encounters any kind of conflict during parsing,
+//! the parser diverges into multiple states and tries every path until it fails.
+//! Of course, exactly one path must remain at the end of parsing (the point where you feed the `eof` token).
+//!
+//! ### Resolving Ambiguities
+//! You can resolve ambiguities through the reduce action.
+//! Returning `Result::Err(Error)` from a reduce action discards the current path.
+//! The `Error` variant type can be defined with the `%err` directive.
+//!
+//! ### Note on GLR Parser
+//!  - Still in development and not yet tested enough (patches are welcome!).
+//!  - Since there are multiple paths, a reduce action can be called multiple times, even if its result is later thrown away.
+//!     - Every `RuleType` and `Term` must implement the `Clone` trait.
+//!  - Users must be aware of the points where shift/reduce or reduce/reduce conflicts occur.
+//!  Every time the parser diverges, the computation cost increases.
+//!
+//!
 //!
 //! ## Syntax
 //! To start writing down a context-free grammar, you need to define necessary directives first.
@@ -168,8 +171,8 @@
 //! }
 //! ```
 //!
-//! `lr1!` macro will generate a parser struct with LR(1) DFA tables.
-//! If you want to generate LALR(1) parser, use `lalr1!` macro.
+//! [`lr1!`] macro will generate a parser struct with LR(1) DFA tables.
+//! If you want to generate LALR(1) parser, use [`lalr1!`] macro.
 //! Every line in the macro must follow the syntax below.
 //!
 //! Syntax can be found in [repository](https://github.com/ehwan/RustyLR/tree/main?tab=readme-ov-file#syntax).

diff --git a/rusty_lr_buildscript/Cargo.toml b/rusty_lr_buildscript/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "rusty_lr_buildscript"
-version = "0.6.0"
+version = "0.7.0"
 edition = "2021"
 license = "MIT"
 description = "buildscipt tools for rusty_lr"
@@ -14,7 +14,7 @@ categories = ["parsing"]
 proc-macro2 = { version = "1.0.86", features = ["span-locations"] }
 quote = "1.0"
 # rusty_lr_parser = { path = "../rusty_lr_parser" }
-rusty_lr_parser = "3.9"
+rusty_lr_parser = "3.10"
 # rusty_lr_core = { path = "../rusty_lr_core", features = ["fxhash", "builder"] }
-rusty_lr_core = { version = "2.4", features = ["fxhash", "builder"] }
+rusty_lr_core = { version = "2.5", features = ["fxhash", "builder"] }
 codespan-reporting = "0.11"
diff --git a/rusty_lr_core/Cargo.toml b/rusty_lr_core/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "rusty_lr_core"
-version = "2.4.0"
+version = "2.5.0"
 edition = "2021"
 license = "MIT"
 description = "core library for rusty_lr"