jblondin/simpledot

SimpleDOT Language Parser

Goal

Write a parser for a subset (which we'll call SimpleDOT) of the GraphViz DOT Language. The intention is to parse SimpleDOT input into an in-memory structure for visualization purposes, opening the way for additional libraries to perform the actual visualization.

Status

Work in progress. Currently most work is going into the initial step: parsing the data into an intermediate representation that can then be parsed into the actual syntax tree.
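
The README does not yet specify what the intermediate representation looks like. As one possible shape, the sketch below assumes a flat token stream; the `Token` enum and `tokenize` function are hypothetical names, and the code is hand-written with the standard library only rather than with nom. Comments in the input and HTML strings are not handled.

```rust
/// Hypothetical intermediate representation: a flat list of tokens.
#[derive(Debug, PartialEq)]
enum Token {
    /// Identifier, numeral, or quoted string (quotes are kept).
    Id(String),
    /// "->" or "--".
    EdgeOp(String),
    /// One of { } [ ] = ; ,
    Punct(char),
}

/// True if an edge operator ("->" or "--") starts at position `i`.
fn edgeop_at(chars: &[char], i: usize) -> bool {
    chars.get(i).copied() == Some('-')
        && matches!(chars.get(i + 1).copied(), Some('>') | Some('-'))
}

fn tokenize(input: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if "{}[]=;,".contains(c) {
            tokens.push(Token::Punct(c));
            i += 1;
        } else if edgeop_at(&chars, i) {
            tokens.push(Token::EdgeOp(chars[i..i + 2].iter().collect()));
            i += 2;
        } else if c == '"' {
            // Quoted string: scan to the closing unescaped quote.
            let start = i;
            i += 1;
            while i < chars.len() && chars[i] != '"' {
                if chars[i] == '\\' {
                    i += 1; // skip the escaped character
                }
                i += 1;
            }
            if i >= chars.len() {
                return Err("unterminated quoted string".to_string());
            }
            i += 1; // consume the closing quote
            tokens.push(Token::Id(chars[start..i].iter().collect()));
        } else {
            // Bare identifier or numeral: run until whitespace, a
            // delimiter, or the start of an edge operator.
            let start = i;
            while i < chars.len()
                && !chars[i].is_whitespace()
                && !"{}[]=;,\"".contains(chars[i])
                && !(i > start && edgeop_at(&chars, i))
            {
                i += 1;
            }
            tokens.push(Token::Id(chars[start..i].iter().collect()));
        }
    }
    Ok(tokens)
}
```

A later pass could then build the syntax tree from this token list without dealing with whitespace or quoting again.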

DOT Language Grammar

The original DOT Language is defined as:

Name        Rule
graph       [ strict ] (graph | digraph) [ ID ] '{' stmt_list '}'
stmt_list   [ stmt [ ';' ] stmt_list ]
stmt        node_stmt | edge_stmt | attr_stmt | ID '=' ID | subgraph
attr_stmt   (graph | node | edge) attr_list
attr_list   '[' [ a_list ] ']' [ attr_list ]
a_list      ID '=' ID [ (';' | ',') ] [ a_list ]
edge_stmt   (node_id | subgraph) edge_rhs [ attr_list ]
edge_rhs    edgeop (node_id | subgraph) [ edge_rhs ]
node_stmt   node_id [ attr_list ]
node_id     ID [ port ]
port        ':' ID [ ':' compass_pt ] | ':' compass_pt
subgraph    [ subgraph [ ID ] ] '{' stmt_list '}'
compass_pt  (n | ne | e | se | s | sw | w | nw | c | _)

Where ID is one of the following:

  • any string of alphabetic ([a-zA-Z\200-\377]) characters, underscores ('_') or digits ([0-9]), not beginning with a digit;
  • a numeral [-]?(.[0-9]⁺ | [0-9]⁺(.[0-9]*)? );
  • any double-quoted string ("...") possibly containing escaped quotes (\");
  • an HTML string (<...>).

SimpleDOT Language Grammar

The SimpleDOT language is a small subset of the full DOT language, limited to what is needed to describe fairly simple tree structures. Any graph defined in SimpleDOT should be parseable and renderable by standard DOT parsers and renderers.

Name        Rule
graph       [ strict ] (graph | digraph) [ ID ] '{' stmt_list '}'
stmt_list   [ stmt [ ';' ] stmt_list ]
stmt        node_stmt | edge_stmt | attr_stmt | ID '=' ID
attr_stmt   (graph | node | edge) attr_list
attr_list   '[' [ a_list ] ']' [ attr_list ]
a_list      ID '=' ID [ (';' | ',') ] [ a_list ]
edge_stmt   node_id edge_rhs [ attr_list ]
edge_rhs    edgeop node_id [ edge_rhs ]
node_stmt   node_id [ attr_list ]
node_id     ID

Where ID is the same as in the full DOT language, except that HTML strings are excluded for simplicity.
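
To make the three remaining ID forms concrete, here is a minimal sketch of a classifier for a complete token. The names `IdKind` and `classify_id` are hypothetical and not part of this project; the code uses only the standard library rather than nom, and works on bytes so that the \200-\377 range can be treated as alphabetic.

```rust
/// The ID forms SimpleDOT accepts (HTML strings are excluded).
#[derive(Debug, PartialEq)]
enum IdKind {
    Identifier,
    Numeral,
    Quoted,
}

/// Classify a complete token as one of the SimpleDOT ID forms, or `None`.
fn classify_id(s: &str) -> Option<IdKind> {
    if is_identifier(s) {
        Some(IdKind::Identifier)
    } else if is_numeral(s) {
        Some(IdKind::Numeral)
    } else if is_quoted(s) {
        Some(IdKind::Quoted)
    } else {
        None
    }
}

/// Letters, underscores, or digits, not beginning with a digit.
/// Bytes 0x80-0xFF (\200-\377) count as alphabetic.
fn is_identifier(s: &str) -> bool {
    let bytes = s.as_bytes();
    match bytes.first() {
        Some(b) if !b.is_ascii_digit() => {}
        _ => return false,
    }
    bytes
        .iter()
        .all(|&b| b.is_ascii_alphanumeric() || b == b'_' || b >= 0x80)
}

/// [-]?(.[0-9]+ | [0-9]+(.[0-9]*)?)
fn is_numeral(s: &str) -> bool {
    fn all_digits(t: &str) -> bool {
        t.bytes().all(|b| b.is_ascii_digit())
    }
    let body = s.strip_prefix('-').unwrap_or(s);
    let (int_part, frac_part) = match body.split_once('.') {
        Some((i, f)) => (i, Some(f)),
        None => (body, None),
    };
    if !all_digits(int_part) || !frac_part.map_or(true, all_digits) {
        return false;
    }
    // Either the integer part is non-empty, or the form is ".digits".
    !int_part.is_empty() || frac_part.map_or(false, |f| !f.is_empty())
}

/// A double-quoted string; any inner quote must be escaped as \".
fn is_quoted(s: &str) -> bool {
    let inner = match s.strip_prefix('"').and_then(|t| t.strip_suffix('"')) {
        Some(inner) => inner,
        None => return false,
    };
    let mut escaped = false;
    for c in inner.chars() {
        match c {
            '\\' if !escaped => escaped = true,
            '"' if !escaped => return false,
            _ => escaped = false,
        }
    }
    // A trailing backslash would have escaped the closing quote.
    !escaped
}
```

Note that this sketch does not reject DOT keywords such as graph or node, which a full parser would need to treat separately.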

Otherwise, at a purely language grammar level, the only real change is the removal of the subgraph, port, and compass_pt constructs.
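
With subgraphs removed, an edge statement is just a chain of node IDs joined by edge operators. The sketch below follows the recursive edge_rhs rule directly. It is a simplified illustration under several assumptions: tokens are separated by whitespace, node IDs are unquoted, the trailing attr_list is not handled, and `->` and `--` are both accepted without checking them against the graph kind. The function names are hypothetical, and the code uses plain standard-library iterators rather than nom.

```rust
use std::iter::Peekable;
use std::str::SplitWhitespace;

/// Accept any token that is not an edge operator as a node_id.
fn expect_node_id(tok: Option<&str>) -> Result<&str, String> {
    match tok {
        Some("->") | Some("--") | None => Err("expected node_id".to_string()),
        Some(id) => Ok(id),
    }
}

/// edge_rhs: edgeop node_id [ edge_rhs ]
fn parse_edge_rhs<'a>(
    from: &'a str,
    tokens: &mut Peekable<SplitWhitespace<'a>>,
    edges: &mut Vec<(String, String)>,
) -> Result<(), String> {
    match tokens.next() {
        Some("->") | Some("--") => {}
        Some(other) => return Err(format!("expected edgeop, found `{}`", other)),
        None => return Err("expected edgeop".to_string()),
    }
    let to = expect_node_id(tokens.next())?;
    edges.push((from.to_string(), to.to_string()));
    // [ edge_rhs ]: recurse only if more tokens follow.
    if tokens.peek().is_some() {
        parse_edge_rhs(to, tokens, edges)
    } else {
        Ok(())
    }
}

/// edge_stmt: node_id edge_rhs
/// Returns the (from, to) pairs described by the chain.
fn parse_edge_stmt(input: &str) -> Result<Vec<(String, String)>, String> {
    let mut tokens = input.split_whitespace().peekable();
    let first = expect_node_id(tokens.next())?;
    let mut edges = Vec::new();
    parse_edge_rhs(first, &mut tokens, &mut edges)?;
    Ok(edges)
}
```

For example, `parse_edge_stmt("a -> b -> c")` yields the pairs (a, b) and (b, c), while a lone `a` is rejected because edge_rhs is mandatory in an edge statement.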

SimpleDOT Supported Attributes

A more significant change between the full DOT language and SimpleDOT is a much smaller attribute set, as described by the following table. The Used By column uses the characters E, N, and G to denote whether the attribute applies to edges, nodes, or graphs, respectively. The Type column refers to the attribute types in the full DOT language definition.

Name        Used By  Type             Default
bgcolor     G        color,colorList  <none>
color       EN       color,colorList  black
comment     ENG      string           ""
fontcolor   G        color            black
fontname    G        string           Times-Roman
fontsize    G        double           14.0
height      N        double           0.5
image       N        string           ""
imagepos    N        string           ""
imagescale  N        bool,string      false
label       ENG      lblString        "\N" (nodes), "" (otherwise)
width       N        double           0.75
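
One way a parser might enforce this table is a simple lookup that rejects unsupported attributes, or attributes attached to the wrong kind of element. This is only a sketch: `Target`, `used_by`, and `is_supported` are hypothetical names, and type checking of the attribute values is left out.

```rust
/// What an attribute is being attached to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Target {
    Edge,
    Node,
    Graph,
}

/// The "Used By" letters for a supported attribute, or `None` if the
/// attribute is not part of SimpleDOT.
fn used_by(name: &str) -> Option<&'static str> {
    let letters = match name {
        "bgcolor" => "G",
        "color" => "EN",
        "comment" => "ENG",
        "fontcolor" | "fontname" | "fontsize" => "G",
        "height" | "width" => "N",
        "image" | "imagepos" | "imagescale" => "N",
        "label" => "ENG",
        _ => return None,
    };
    Some(letters)
}

/// True if `name` is a SimpleDOT attribute that applies to `target`.
fn is_supported(name: &str, target: Target) -> bool {
    let letter = match target {
        Target::Edge => 'E',
        Target::Node => 'N',
        Target::Graph => 'G',
    };
    used_by(name).map_or(false, |letters| letters.contains(letter))
}
```

Under this sketch, `color` is accepted on edges and nodes but not on graphs, and a full-DOT attribute such as `shape` is rejected everywhere.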

Approach

We're going to use the nom parser combinator library to construct the language parser.

Target Data Structure

Rough draft:

enum GraphKind {
	Directed,
	Undirected,
}

struct Graph {
	kind: GraphKind,
	strict: bool,
	name: String,
	attributes: Vec<GraphAttribute>,
	component: Vec<Component>,
}

enum Component {
	Edge(Edge),
	Node(Node),
	Subgraph(Graph),
	ClusterSubgraph(Graph),
}

struct GraphAttribute {
	name: AttributeName,
	value: String,
}

// Placeholders so the draft compiles; their contents are not designed yet.
struct Edge;
struct Node;
struct AttributeName(String);
