Commit
1 parent d634236, commit a5e5146
Showing 12 changed files with 488 additions and 17 deletions.
Binary file not shown.
File renamed without changes. (6 files)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,372 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from IPython.core.display import HTML\n", | ||
"with open('../style.css') as file:\n", | ||
" css = file.read()\n", | ||
"HTML(css)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Testing an <span style=\"font-variant:small-caps;\">Antlr</span> Grammar via `grun`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In order for the examples using <span style=\"font-variant:small-caps;\">Antlr</span> to work, \n", | ||
"we first have to install <span style=\"font-variant:small-caps;\">Antlr</span>. This can be done by executing \n", | ||
"the following commands in an *Anaconda environment* that has been activated:\n", | ||
"```\n", | ||
"conda install -y -c conda-forge antlr4-python3-runtime\n", | ||
"conda install -y -c conda-forge antlr\n", | ||
"```\n", | ||
"Alternatively, you can download https://www.antlr.org/download/antlr-4.13.1-complete.jar. I will assume that this `.jar`file is \n", | ||
"stored in the directory `/usr/local/lib/`. Furthermore, I assume that both a *java runtime*\n", | ||
"and a *java compiler* are available. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!conda install -y -c conda-forge antlr4-python3-runtime" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!conda install -y -c conda-forge antlr" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Our grammar is stored in the file `Expr.g4`. In order to inspect it, we use the command line tool `cat`. This will work with MacOs and Linux. On Windows,\n", | ||
"either use the power shell, which understands `cat`, or use the command `type` instead. The option `-n` of `cat` provides numbered output." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!cat -n Expr.g4" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Note that this grammar does not contain any *embedded actions*. \n", | ||
"Hence we cannot compute anything with it. We will only be able to \n", | ||
"check whether a given string is generated by this grammar. We can generate both the scanner and the parser using the following command:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!antlr4 -Dlanguage=Python3 Expr.g4" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!ls -l" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The files `ExprLexer.py` and `ExprParser.py` contain the generated scanner and parser, respectively.\n", | ||
"If we want to test the parser in this notebook, we have to import these files." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from ExprLexer import ExprLexer\n", | ||
"from ExprParser import ExprParser" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Of course, we also have to import `antlr4`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import antlr4" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now we are able to parse a string. The function `parser_string` takes the string `s` as its argument and checks,\n", | ||
"whether this string can be parsed as an arithmetic expression. This is done in five steps:\n", | ||
"- The string is converted into an `antlr4.InputStream`.\n", | ||
"- The input stream is converted into a lexer.\n", | ||
"- The lexer is converted into an `antlr4.CommonTokenStream`.\n", | ||
"- The token stream is converted into a parser.\n", | ||
"- The parser tries to parse with `start` symbol." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"def parse_string(string): \n", | ||
" inputStream = antlr4.InputStream(string)\n", | ||
" lexer = ExprLexer(inputStream)\n", | ||
" tokenStream = antlr4.CommonTokenStream(lexer)\n", | ||
" parser = ExprParser(tokenStream)\n", | ||
" parser.expr()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"parse_string('1 + 2 * 3 - 4')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"As there is no syntax error, the string `'1 + 2 * 3 - 4'` adheres to the specification given by our grammar.\n", | ||
"Lets try a string that is not generated by our grammar." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"parse_string('1 + 2 * 3 ** 4')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"As the operator `**` is not supported by our grammar, we get a *syntax error* at the \n", | ||
"last occurrence of the character `*` in the given string. \n", | ||
"Note that the column count starts at `0`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"parse_string('1 < 2')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This time we get a *lexical error* as the character `<` is not a legal token." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can also generate a *parse tree* with our grammar. However, for this to work <span style=\"font-variant:small-caps;\">Antlr</span>\n", | ||
"first has to generate a `java` parser. Hence we have to call `antlr4` again, but this time with `Java` as the target language." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!java -jar /usr/local/lib/antlr-4.13.1-complete.jar -Dlanguage=Java Expr.g4 " | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This command has generated some files for us that contain a both a lexer and a parser. \n", | ||
"However, this time these are `.java`-files." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!ls -l *.java" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We have to compile the generated `.java` files. Below, you might have to change the path to \n", | ||
"the file `antlr-4.8-complete.jar` to make this work." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!javac -cp .:/usr/local/lib/antlr-4.13.1-complete.jar *.java" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Next, we can start the so called *TestRig* to generate and display the <em style=\"color:blue\">parse tree</em> for a given string." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!echo \"1+2*3-4\" | java -cp .:/usr/local/lib/antlr-4.13.1-complete.jar org.antlr.v4.gui.TestRig Expr expr -gui" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Let us clean up the working directory." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!ls" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!rm *.py *.tokens *.interp *.java *.class\n", | ||
"!rm -r __pycache__/" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!ls -l" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.5" | ||
}, | ||
"varInspector": { | ||
"cols": { | ||
"lenName": 16, | ||
"lenType": 16, | ||
"lenVar": 40 | ||
}, | ||
"kernels_config": { | ||
"python": { | ||
"delete_cmd_postfix": "", | ||
"delete_cmd_prefix": "del ", | ||
"library": "var_list.py", | ||
"varRefreshCmd": "print(var_dic_list())" | ||
}, | ||
"r": { | ||
"delete_cmd_postfix": ") ", | ||
"delete_cmd_prefix": "rm(", | ||
"library": "var_list.r", | ||
"varRefreshCmd": "cat(var_dic_list()) " | ||
} | ||
}, | ||
"types_to_exclude": [ | ||
"module", | ||
"function", | ||
"builtin_function_or_method", | ||
"instance", | ||
"_Feature" | ||
], | ||
"window_display": false | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
grammar RegularExpressions; | ||
|
||
regExp returns [result] | ||
: e=regExp '+' p=product {$result = ('+', $e.result, $p.result)} | ||
| p=product {$result = $p.result } | ||
; | ||
|
||
product returns [result] | ||
: p=product '⋅' f=factor {$result = ('⋅', $p.result, $f.result)} | ||
| f=factor {$result = $f.result } | ||
; | ||
|
||
factor returns [result] | ||
: f=factor '*' {$result = ('*', $f.result) } | ||
| '(' e=regExp ')' {$result = $e.result } | ||
| c=LETTER {$result = $c.text } | ||
; | ||
|
||
LETTER : [a-zA-Z]; | ||
WS : [ \t\n\r] -> skip; |
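The embedded actions in this grammar build nested Python tuples: `('+', l, r)` for a sum, `('⋅', l, r)` for a product, `('*', r)` for a Kleene star, and a plain string for a letter. To make the shape of these results concrete, here is a hand-written recursive-descent sketch that mirrors the three rules and returns the same kind of tuples. It is not the parser that ANTLR generates and is not part of this commit; the function name `parse_reg_exp` is hypothetical, the left-recursive alternatives are replaced by loops, and, as an extra assumption, the sketch rejects input that is not consumed completely.

```python
def parse_reg_exp(text):
    """Parse `text` and return nested tuples shaped like the grammar's results."""
    # Mimic the lexer: whitespace is skipped, every other character is one token.
    tokens = [ch for ch in text if ch not in " \t\n\r"]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def reg_exp():
        # regExp : regExp '+' product | product   (left-associative)
        result = product()
        while peek() == "+":
            advance()
            result = ("+", result, product())
        return result

    def product():
        # product : product '⋅' factor | factor   (left-associative)
        result = factor()
        while peek() == "⋅":
            advance()
            result = ("⋅", result, factor())
        return result

    def factor():
        # factor : factor '*' | '(' regExp ')' | LETTER
        token = peek()
        if token == "(":
            advance()
            result = reg_exp()
            if peek() != ")":
                raise ValueError(f"expected ')' at token index {pos}")
            advance()
        elif token is not None and token.isascii() and token.isalpha():
            advance()
            result = token
        else:
            raise ValueError(f"unexpected {token!r} at token index {pos}")
        while peek() == "*":  # any number of postfix stars
            advance()
            result = ("*", result)
        return result

    result = reg_exp()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r} at token index {pos}")
    return result
```

For instance, `parse_reg_exp("a⋅b* + c")` yields `('+', ('⋅', 'a', ('*', 'b')), 'c')`, which reflects that `*` binds more tightly than `⋅`, and `⋅` more tightly than `+`.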
File renamed without changes.