Write a parser for a subset of the Graphviz DOT language, which we'll call SimpleDOT. The intention is to parse SimpleDOT input into an in-memory structure for visualization purposes, opening the way for other libraries to perform the actual visualization.
Work in progress. Most of the current work is going into the first step: parsing the input into an intermediate representation that can then be converted into the actual syntax tree.
The full DOT language is defined by the following grammar:
Name | Rule |
---|---|
graph | [ strict ] (graph \| digraph) [ ID ] '{' stmt_list '}' |
stmt_list | [ stmt [ ';' ] stmt_list ] |
stmt | node_stmt \| edge_stmt \| attr_stmt \| ID '=' ID \| subgraph |
attr_stmt | (graph \| node \| edge) attr_list |
attr_list | '[' [ a_list ] ']' [ attr_list ] |
a_list | ID '=' ID [ (';' \| ',') ] [ a_list ] |
edge_stmt | (node_id \| subgraph) edge_rhs [ attr_list ] |
edge_rhs | edgeop (node_id \| subgraph) [ edge_rhs ] |
node_stmt | node_id [ attr_list ] |
node_id | ID [ port ] |
port | ':' ID [ ':' compass_pt ] \| ':' compass_pt |
subgraph | [ subgraph [ ID ] ] '{' stmt_list '}' |
compass_pt | (n \| ne \| e \| se \| s \| sw \| w \| nw \| c \| _) |
Where `edgeop` is `->` in directed graphs and `--` in undirected graphs, and ID is one of the following:
- any string of alphabetic (`[a-zA-Z\200-\377]`) characters, underscores (`'_'`) or digits (`[0-9]`), not beginning with a digit;
- a numeral `[-]?(.[0-9]⁺ | [0-9]⁺(.[0-9]*)?)`;
- any double-quoted string (`"..."`) possibly containing escaped quotes (`\"`)¹;
- an HTML string (`<...>`).
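The ID rules above can be made concrete with a small hand-written recognizer. This is a sketch, not the project's parser. It uses only the Rust standard library rather than nom, and the function names `scan_id` and `scan_numeral` are hypothetical. It treats any non-ASCII character as alphabetic, which approximates the `\200-\377` byte range. It rejects HTML strings outright, as SimpleDOT does. Following the DOT specification, `\"` is the only escape sequence inside quoted strings.

```rust
/// Recognizes a DOT ID at the very start of `input` (no whitespace skipping).
/// Returns the ID text (quotes included) and the remaining input, or None.
fn scan_id(input: &str) -> Option<(&str, &str)> {
    let first = input.chars().next()?;
    let end = if first == '"' {
        // Quoted string: `\"` is the only escape, so any other backslash
        // is an ordinary character.
        let bytes = input.as_bytes();
        let mut i = 1;
        let mut close = None;
        while i < bytes.len() {
            match bytes[i] {
                b'\\' if bytes.get(i + 1) == Some(&b'"') => i += 2, // escaped quote
                b'"' => {
                    close = Some(i + 1); // include the closing quote
                    break;
                }
                _ => i += 1,
            }
        }
        close? // None if the string is unterminated
    } else if first.is_ascii_alphabetic() || first == '_' || !first.is_ascii() {
        // Identifier: letters, '_' and digits; non-ASCII chars count as letters
        // (an approximation of the \200-\377 range).
        input
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || !c.is_ascii()))
            .unwrap_or(input.len())
    } else if first == '-' || first == '.' || first.is_ascii_digit() {
        scan_numeral(input)?
    } else {
        return None; // includes '<', i.e. HTML strings, which SimpleDOT excludes
    };
    Some(input.split_at(end))
}

/// Length of the numeral `[-]?(.[0-9]+ | [0-9]+(.[0-9]*)?)` at the start of `input`.
fn scan_numeral(input: &str) -> Option<usize> {
    let b = input.as_bytes();
    let digits = |from: usize| b[from..].iter().take_while(|c| c.is_ascii_digit()).count();
    let mut i = usize::from(b.first() == Some(&b'-')); // optional leading minus
    if b.get(i) == Some(&b'.') {
        let n = digits(i + 1);
        if n == 0 {
            return None; // "." or "-." without digits
        }
        Some(i + 1 + n)
    } else {
        let n = digits(i);
        if n == 0 {
            return None;
        }
        i += n;
        if b.get(i) == Some(&b'.') {
            i += 1 + digits(i + 1); // fractional digits are optional
        }
        Some(i)
    }
}
```

For example, `scan_id("-3.14;")` should yield the numeral `-3.14` with `;` left over. The function does not skip whitespace, so a tokenizer would need to trim leading whitespace before calling it.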
The SimpleDOT language is a small subset of the full DOT language, intended to support only what is needed to describe fairly simple tree structures. Any graph written in SimpleDOT should be parseable and renderable by standard DOT parsers and renderers. Its grammar is:
Name | Rule |
---|---|
graph | [ strict ] (graph \| digraph) [ ID ] '{' stmt_list '}' |
stmt_list | [ stmt [ ';' ] stmt_list ] |
stmt | node_stmt \| edge_stmt \| attr_stmt \| ID '=' ID |
attr_stmt | (graph \| node \| edge) attr_list |
attr_list | '[' [ a_list ] ']' [ attr_list ] |
a_list | ID '=' ID [ (';' \| ',') ] [ a_list ] |
edge_stmt | node_id edge_rhs [ attr_list ] |
edge_rhs | edgeop node_id [ edge_rhs ] |
node_stmt | node_id [ attr_list ] |
node_id | ID |
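The `attr_list` and `a_list` rules are the fiddliest part of this grammar. Several bracket groups may follow one another, and the `;` or `,` separator is optional. The sketch below implements just those two rules. It uses only the standard library, and `parse_attr_list` and `word` are hypothetical names. To stay short, it assumes IDs are unquoted runs of letters, digits, `_`, `.` or `-`. A real parser would plug in a full ID recognizer instead.

```rust
/// Parses `attr_list : '[' [ a_list ] ']' [ attr_list ]` at the start of `input`.
/// Returns the (name, value) pairs in source order plus the unconsumed input,
/// or None on a syntax error.
fn parse_attr_list(input: &str) -> Option<(Vec<(String, String)>, &str)> {
    let mut rest = input.trim_start();
    if !rest.starts_with('[') {
        return None; // at least one bracket group is required
    }
    let mut attrs = Vec::new();
    // Each outer iteration consumes one '[' [ a_list ] ']' group.
    while let Some(after_bracket) = rest.strip_prefix('[') {
        rest = after_bracket.trim_start();
        // a_list : ID '=' ID [ (';' | ',') ] [ a_list ]
        while !rest.starts_with(']') {
            let (name, after_name) = word(rest)?;
            let after_eq = after_name.trim_start().strip_prefix('=')?;
            let (value, after_value) = word(after_eq.trim_start())?;
            attrs.push((name.to_string(), value.to_string()));
            rest = after_value.trim_start();
            if rest.starts_with(';') || rest.starts_with(',') {
                rest = rest[1..].trim_start(); // the separator is optional
            }
        }
        rest = rest[1..].trim_start(); // skip ']'
    }
    Some((attrs, rest))
}

/// Simplified ID: a non-empty run of ASCII letters, digits, '_', '.' or '-'.
fn word(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-')))
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}
```

Because `a_list` is optional inside the brackets, an empty group such as `[]` is accepted and yields no attributes.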
Where ID is the same as in the full DOT language, except that HTML strings are excluded for simplicity.
Otherwise, at the grammar level, the only real change is the removal of the subgraph, port, and compass_pt constructs.
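With subgraphs gone, an `edge_stmt` is just a chain of node IDs. Each `edgeop node_id` step in `edge_rhs` contributes one edge from the previous node. The sketch below shows that expansion using a hypothetical `expand_edge_chain` function and only the standard library. It assumes unquoted IDs and that any trailing `attr_list` has already been split off. It uses `->` for directed graphs and `--` for undirected ones, as in the DOT specification, and treats the other operator as an error.

```rust
/// Expands an edge statement such as `a -> b -> c` into its individual edges,
/// here [("a", "b"), ("b", "c")].
/// Assumes unquoted IDs and that the optional attr_list has been removed.
fn expand_edge_chain(stmt: &str, directed: bool) -> Option<Vec<(String, String)>> {
    let (op, wrong_op) = if directed { ("->", "--") } else { ("--", "->") };
    if stmt.contains(wrong_op) {
        return None; // e.g. `--` inside a digraph
    }
    let ids: Vec<&str> = stmt.split(op).map(str::trim).collect();
    // Require at least one edgeop and a non-empty node_id on each side of it.
    if ids.len() < 2 || ids.iter().any(|id| id.is_empty()) {
        return None;
    }
    // Each adjacent pair of node IDs in the chain is one edge.
    Some(
        ids.windows(2)
            .map(|pair| (pair[0].to_string(), pair[1].to_string()))
            .collect(),
    )
}
```

Expanding chains at parse time keeps each edge as a simple pair. An attribute list written after the chain would then be copied onto every resulting edge.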
A more significant difference between the full DOT language and SimpleDOT is a much smaller attribute set, described by the following table. The Used By column uses the characters `E`, `N`, and `G` to denote whether the attribute applies to edges, nodes, or graphs, respectively. The Type column refers to attribute types in the full DOT language definition.
Name | Used By | Type | Default |
---|---|---|---|
bgcolor | G | color, colorList | `<none>` |
color | EN | color, colorList | black |
comment | ENG | string | `""` |
fontcolor | G | color | black |
fontname | G | string | Times-Roman |
fontsize | G | double | 14.0 |
height | N | double | 0.5 |
image | N | string | `""` |
imagepos | N | string | `""` |
imagescale | N | bool, string | false |
label | ENG | lblString | `"\N"` (nodes), `""` (otherwise) |
width | N | double | 0.75 |
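Because the attribute set is so small, validation can be a plain lookup. The sketch below transcribes the table above into a function and uses the Used By letters to reject cases such as `height` on a graph. `Target`, `AttrSpec`, `attr_spec`, `is_allowed` and `default_value` are illustrative names, not part of any existing API. The sketch also encodes the special case that node labels default to `\N`, which Graphviz replaces with the node's name.

```rust
/// Which kind of statement an attribute is being set on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    Edge,
    Node,
    Graph,
}

/// One row of the attribute table; `default` is None where the table says <none>.
struct AttrSpec {
    used_by: &'static str,
    default: Option<&'static str>,
}

/// Transcription of the SimpleDOT attribute table. Unknown names yield None.
fn attr_spec(name: &str) -> Option<AttrSpec> {
    let (used_by, default) = match name {
        "bgcolor" => ("G", None),
        "color" => ("EN", Some("black")),
        "comment" => ("ENG", Some("")),
        "fontcolor" => ("G", Some("black")),
        "fontname" => ("G", Some("Times-Roman")),
        "fontsize" => ("G", Some("14.0")),
        "height" => ("N", Some("0.5")),
        "image" => ("N", Some("")),
        "imagepos" => ("N", Some("")),
        "imagescale" => ("N", Some("false")),
        "label" => ("ENG", Some("")), // nodes override this, see default_value
        "width" => ("N", Some("0.75")),
        _ => return None,
    };
    Some(AttrSpec { used_by, default })
}

/// True if the Used By column allows `name` on `target`.
fn is_allowed(name: &str, target: Target) -> bool {
    let letter = match target {
        Target::Edge => 'E',
        Target::Node => 'N',
        Target::Graph => 'G',
    };
    attr_spec(name).map_or(false, |spec| spec.used_by.contains(letter))
}

/// Default value of `name` on `target`, or None if the attribute is unknown,
/// not allowed on `target`, or has no default.
fn default_value(name: &str, target: Target) -> Option<&'static str> {
    if !is_allowed(name, target) {
        return None;
    }
    if name == "label" && target == Target::Node {
        return Some("\\N"); // Graphviz substitutes the node's name for \N
    }
    attr_spec(name)?.default
}
```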
We're going to use the `nom` parser combinator library to construct the language parser.
Rough draft:
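```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GraphKind {
    Directed,   // `digraph`, edges written with `->`
    Undirected, // `graph`, edges written with `--`
}

#[derive(Debug, Clone, PartialEq)]
struct Graph {
    kind: GraphKind,
    strict: bool,         // `strict` keyword present
    name: Option<String>, // the graph ID is optional
    attributes: Vec<GraphAttribute>,
    components: Vec<Component>,
}

#[derive(Debug, Clone, PartialEq)]
enum Component {
    Edge(Edge),
    Node(Node),
    Subgraph(Graph),        // not in the SimpleDOT grammar above
    ClusterSubgraph(Graph), // not in the SimpleDOT grammar above
}

#[derive(Debug, Clone, PartialEq)]
struct GraphAttribute {
    name: AttributeName,
    value: String, // raw ID text; conversion to the attribute's type comes later
}

// Placeholder: one variant per attribute in the SimpleDOT attribute table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AttributeName {
    BgColor, Color, Comment, FontColor, FontName, FontSize,
    Height, Image, ImagePos, ImageScale, Label, Width,
}

// Placeholders for node and edge statements (edge chains split into pairs).
#[derive(Debug, Clone, PartialEq)]
struct Node { id: String, attributes: Vec<GraphAttribute> }
#[derive(Debug, Clone, PartialEq)]
struct Edge { from: String, to: String, attributes: Vec<GraphAttribute> }
```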
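The `strict`, `kind` and `name` fields of `Graph` all come from the header of the `graph` rule. Here is a minimal sketch of parsing that header by hand. `parse_header` and `split_word` are hypothetical helpers, and `GraphKind` is repeated so the snippet compiles on its own. The graph name is simplified to an unquoted word of letters, digits and `_`. DOT keywords such as `strict`, `graph` and `digraph` are case-independent, so they are compared with `eq_ignore_ascii_case`.

```rust
/// Repeated from the draft so this sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GraphKind {
    Directed,
    Undirected,
}

/// Skips leading whitespace, then splits off a (possibly empty) word of
/// ASCII letters, digits and '_'.
fn split_word(s: &str) -> (&str, &str) {
    let s = s.trim_start();
    let end = s
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(s.len());
    s.split_at(end)
}

/// Parses `[ strict ] (graph | digraph) [ ID ] '{'` and returns
/// (strict, kind, name, input after the opening brace).
fn parse_header(input: &str) -> Option<(bool, GraphKind, Option<String>, &str)> {
    let (first, mut rest) = split_word(input);
    let strict = first.eq_ignore_ascii_case("strict");
    let kind_word = if strict {
        let (w, r) = split_word(rest);
        rest = r;
        w
    } else {
        first
    };
    let kind = if kind_word.eq_ignore_ascii_case("digraph") {
        GraphKind::Directed
    } else if kind_word.eq_ignore_ascii_case("graph") {
        GraphKind::Undirected
    } else {
        return None; // neither `graph` nor `digraph`
    };
    let (name, after_name) = split_word(rest);
    let body = after_name.trim_start().strip_prefix('{')?;
    let name = if name.is_empty() { None } else { Some(name.to_string()) };
    Some((strict, kind, name, body))
}
```

Returning the input after `{` leaves `stmt_list` parsing to the next stage. In a nom-based parser, this step would naturally be a sequence of smaller keyword and ID parsers.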