ProgQuery extracts syntactic and semantic information from source code programs and stores it in a Neo4j graph database. That information can be used for many different purposes, such as static program analysis, advanced code search, coding guideline checking, software metrics computation, and extraction of semantic and syntactic information to create predictive models.
For each compilation unit (program, project, component or API), ProgQuery extracts the following seven graph structures:
- Abstract Syntax Tree
- Call Graph
- Type Graph
- Control Flow Graph
- Program Dependency Graph
- Class Dependency Graph
- Package Graph
The previous representations are overlaid, meaning that syntactic and semantic nodes of the different graphs are interconnected to allow the combination of different kinds of information in the queries/analyses. This makes it easier to express queries that combine syntactic and semantic information
ProgQuery uses Maven to resolve the binaries' dependencies. Unless you have Maven already installed, you need to install it and follow the installation steps detailed in this link.
The next step consists of cloning the repository of ProgQuery project and generating a .jar file. Once you have cloned and downloaded the repository, please execute the following command in the local repository you have just created:
mvn clean compile package
The previous command generates the jar file ProgQuery-3.0.0.jar in Maven's default output folder target. This jar file allows you to extract information from source code programs and store it in a Neo4j graph database.
What follows is the list of commands supported by the generated jar:
java -jar ProgQuery-3.0.0.jar -user="<user_id>" -program="<program_id>" -neo4j_mode="local" -neo4j_database_path="<database_path>" "<javac_options_1>" ... "<javac_options_n>"
-user
: (Mandatory param) It specifies User ID.-program
: (Mandatory param) It specifies Program ID.-neo4j_mode
: (Mandatory param) It specifies the Neo4j mode: local or server.-neo4j_database_path
: (Mandatory param, when Neo4j mode local is used) It specifies the path to the directory where the database will be stored."javac_options_1" ... "javac_options_n"
: (Mandatory param) It specifies the options used to run the Java compiler p.e."-d .\Example\target\classes -classpath .\Example\target\classes; -sourcepath .\Example\src\main\java; -g -nowarn -target 8 -source 8"
or just"Java_File.java"
After that execution, a graph containing the abovementioned seven structures is inserted in a Neo4j graph database.
java -jar ProgQuery-3.0.0.jar -help
This command displays all the available options:
-user=<user_id>
: User id. (Short form:-u=<user_id>
).-program=<program_id>
: Program identifier. (Short form:-p=<program_id>
).-neo4j_mode={local,server}
: NEO4J mode: local or server. (Default value is server, short form:-nm={local,server}
).-neo4j_user=<user_name>
: NEO4J User name. (Default value is neo4j, short form:-nu=<user_name>
).-neo4j_password=<user_password>
: User password. (Short form:-np=<user_pasword>
).-neo4j_host=<host>
: NEO4J Host address. (Short form:-nh=<host>
).-neo4j_port_number=<port_number>
: NEO4J Port number. (Default value is 7687, short form:-npn=<port_number>
).-neo4j_database=<database_name>
: NEO4J Database name. (Default value is the -user parameter value, short form:-ndb=<database_name>
).-neo4j_database_path=<database_path>
: NEO4J Database path, when Local mode is used. (Short form:-ndbp=<database_path>
).-max_operations_transaction=<number>
: Maximum number of operations per transaction. (Default value is 80000, short form:-mot=<number>
)."javac_options_1" ... "javac_options_n"
: A list of sets of options between " separated by white spaces used to run the Java compiler several times. For each set of options, ProgQuery creates and executes one Java Compiler Task.-verbose
: Shows log info (Default value isfalse
).
[1] Oscar Rodriguez-Prieto, Alan Mycroft, Francisco Ortin. An efficient and scalable platform for Java source code analysis using overlaid graph representations. IEEE Access, volume 8, pp. 72239-72260. December 2020.
[2] Oscar Rodriguez-Prieto. Big Code infrastructure for building tools to improve software development. Ph.D. Thesis at the Computer Science Department of the University of Oviedo. Ph.D. Supervisor, Dr. Francisco Ortin. June 2020.