This repository provides a lightweight implementation of Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks [1].
- Python=3.6
- PyTorch==1.4.0
- Deep Graph Library
python main.py \
--dataset <name_of_the_dataset> \
--input_dir <directory_of_the_input>;
The `input_dir` should contain three JSON files, namely:
- `train_GGNNinput.json`
- `valid_GGNNinput.json`
- `test_GGNNinput.json`
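Before launching training, it can help to confirm that the input directory actually holds the three files named above. The following is a minimal sketch, not part of this repository: the helper name `missing_input_files` and the constant `EXPECTED_FILES` are hypothetical and introduced only for illustration.

```python
import os

# File names taken from the list above.
EXPECTED_FILES = (
    "train_GGNNinput.json",
    "valid_GGNNinput.json",
    "test_GGNNinput.json",
)


def missing_input_files(input_dir):
    """Return the expected file names that are absent from input_dir.

    Hypothetical helper for illustration; main.py may perform its own checks.
    """
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(input_dir, name))
    ]
```

An empty list returned by `missing_input_files(path)` means all three files are present.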
Each JSON file should contain a list of JSON objects with the following structure:
{
'node_features': <A list of feature vectors, one per node in the graph>,
'graph': <A list of edges>,
'target': <0 or 1 representing the vulnerability>
}
- Let's assume the `n` nodes in the graph are indexed from `0` to `n-1`. The length of the `node_features` list should be `n`. Each feature vector should be 100 elements long. Thus the `node_features` list should be a 2D list of shape `(n, 100)`.
- The length of the `graph` list should be the number of edges. Each edge should be represented as a three-element tuple `[source, edge_type, destination]`, where `source` and `destination` are indices of the corresponding nodes in the `node_features` list. Edge types should be from `0` to `max_edge_types`.
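The format rules above can be checked mechanically before training. Below is a minimal sketch, not part of this repository: `validate_sample` and `make_toy_sample` are hypothetical helpers, the value `max_edge_types=4` is chosen only for the example, and the upper bound on edge types is treated as inclusive, which is an assumption about how the range above should be read.

```python
import json
import random

FEATURE_SIZE = 100  # feature vector length stated above


def validate_sample(sample, max_edge_types):
    """Raise ValueError if a sample breaks the format described above."""
    features = sample["node_features"]
    n = len(features)
    for vec in features:
        if len(vec) != FEATURE_SIZE:
            raise ValueError("each feature vector must have 100 elements")
    for edge in sample["graph"]:
        if len(edge) != 3:
            raise ValueError("each edge must be [source, edge_type, destination]")
        source, edge_type, destination = edge
        if not (0 <= source < n and 0 <= destination < n):
            raise ValueError("edge endpoints must index into node_features")
        # Assumption: the upper bound max_edge_types is inclusive.
        if not (0 <= edge_type <= max_edge_types):
            raise ValueError("edge type out of range")
    if sample["target"] not in (0, 1):
        raise ValueError("target must be 0 or 1")


def make_toy_sample(num_nodes, seed=0):
    """Build a toy graph: random features, a chain 0 -> 1 -> ... of type-0 edges."""
    rng = random.Random(seed)
    features = [
        [rng.random() for _ in range(FEATURE_SIZE)] for _ in range(num_nodes)
    ]
    edges = [[i, 0, i + 1] for i in range(num_nodes - 1)]
    return {"node_features": features, "graph": edges, "target": 0}


sample = make_toy_sample(3)
validate_sample(sample, max_edge_types=4)  # 4 is an illustrative value
text = json.dumps([sample])  # each input file holds a list of such objects
```

Writing `text` to a file such as `train_GGNNinput.json` would produce a one-sample input in the layout described above; real feature vectors would come from your own preprocessing rather than random numbers.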
- This implementation follows the Devign paper [1]. However, we could NOT reproduce the results reported in the original paper.
[1] Zhou, Yaqin, et al. "Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks." arXiv preprint arXiv:1909.03496 (2019).