Evaluating a community detection algorithm is a complex task due to the lack of a shared and universally accepted definition of community. In literature, one of the most common way to assess the performances of a community detection algorithm is to compare its output with given ground-truth communities by using computationally expensive metrics (i.e., Normalized Mutual Information). We proposed a novel approach aimed at evaluate the adherence of a community partition to the ground truth: our methodology provides more information than the state-of-the-art ones and is fast to compute on large-scale networks. By defining a classification problem on the real community label we compute an average F1-score that captures the level of approximation reached by network partitions obtained through community discovery algorithms w.r.t. ground-truth ones. Moreover, our approach allows for a visual inspection of the partition quality exploiting density scatter plots.
N.B.: NF1 can be used also to measure the resemblance of partitions obtained applying different algorithms to the same graph.
If you use our algorithm please cite the following works:
Rossetti, Giulio; Pappalardo, Luca; Rinzivillo, Salvatore "A novel approach to evaluate community detection algorithms on ground truth" 7th Workshop on Complex Networks, Springer-Verlag, 2016.
The NF1 score can be computed using a standalone interface as well as within a python script.
To install the nf1 package use pip
pip install nf1
The following command line has to be used from the shell:
NF1_evaluate community_file ground_truth_file [--plot output_filename] [--maxpts points_to_plot] [--title plot_title]
Where:
community_file: is the file containing the community identified by a Community Discovery algorithm;
ground_truth_file: is the file containing the ground-truth community;
output_filename: defines the plot filename (optional, if not specified the plot will not be generated);
points_to_plot: defines the number of points to plot (optional – if not specified all the points will be plotted);
plot_title: defines the plot title (optional).
The script expects as minimal inputs two text files containing community descriptions. Each line of the input files should identify a single community as a list of node ids (the accepted separator are: tab, space, comma).
Community file example:
(0,1,2,3)
(5,6,7,8)
(1,6,3)
Python code example
from nf1 import NF1
import networkx as nx
from networkx.algorithms import community
g = nx.karate_club_graph()
kclique = list(community.k_clique_communities(g, 4))
kcoms = [tuple(x) for x in kclique]
lp = list(community.label_propagation_communities(g))
lpcoms = [tuple(x) for x in lp]
# Computing the NF1 scores and statistics
nf = NF1(lpcoms, kcoms)
results = nf.summary()
print(results['scores'])
print(results['details'])
# Visualising the Precision-Recall density scatter-plot
nf.plot()
In order to avoid biased evaluations be sure to comply with the following rules:
- A community must be composed by at least 3 nodes.
- No nested communities: a community must not be a proper subset of another community.