C++ implementation of K-nearest neighbors using MPI. This project was implemented as part of a homework exercise for the [050] - Parallel & Distributed Systems course of the ECE Department, AUTh.
The project depends on the following:
- CMake
- Make
- g++
- OpenMPI
- OpenBLAS
- pkgconf (Optional)
To install them on various Linux distributions, follow the instructions below.
On distributions that use dnf (e.g. Fedora):
$ sudo dnf upgrade --refresh # updates installed packages and repository metadata
$ sudo dnf install cmake make gcc-c++ openmpi python3 \
openmpi-devel openblas openblas-devel pkgconf python3-pybind11
# replace ${arch} with your CPU architecture, e.g. x86_64 or aarch64
$ module load mpi/openmpi-${arch}
On distributions that use apt (e.g. Debian, Ubuntu):
# updates installed packages and repository metadata
$ sudo apt-get update && sudo apt-get upgrade
# installs dependencies
$ sudo apt-get install cmake make g++ libopenmpi-dev \
libopenblas-dev pkg-config python3 python3-pybind11
- Clone the repository
$ git clone [email protected]:pkarakal/knn.git
- Go to that directory
$ cd knn/
- Configure the build
a. Generate the Makefiles from CMakeLists.txt
$ cmake -S .
b. Additional cmake files import OpenBLAS and OpenMPI and switch the compiler to the OpenMPI g++ wrapper. There is also an option to enable ClangTidy support. To enable the build of the executable that leverages OpenBLAS, run
$ cmake -S . -DENABLE_OPENBLAS=ON
To enable the build of the executable that leverages OpenMPI run
$ cmake -S . -DENABLE_OPENMPI=ON -DCMAKE_CXX_COMPILER=$(command -v mpic++)
mpic++ is a wrapper for g++ that already passes the correct compiler and linker flags. Alternatively, if you don't want to change the compiler for all the executables and you already have mpi in your path, you can apply the patch present in the git tree, which adds a custom cmake target that builds the executables in the same way but is more verbose
$ git apply ./cmake.patch
$ cmake -S . -DENABLE_OPENMPI=ON
To enable Clang Tidy support use the following flag
$ cmake -S . -DENABLE_CLANGTIDY=ON
- Build and run the application
$ cmake --build . && ./knn_v${variant}
- To execute multiple instances of the program in your current run-time environment, you can use the mpirun command like so
$ mpirun -n <number of instances> ./knn_v${variant}
- There is also a converter program that converts files in dictionary of keys format (col:data) to csv, excluding the "Label" column. It calls Python under the hood for reading the files and converting them to csv. To compile it, run
$ pip3 install -r requirements.txt
$ cmake -S . -DENABLE_PYBIND11=ON
$ make converter
To run it, use the following format:
$ ./converter /path/to/file
# or for multiple files
$ ./converter [ /path/to/file1 /path/to/file2 /path/to/fileN ]
Make sure you have a Python >= 3.6 interpreter installed and in your PATH.
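To make the conversion more concrete, here is a minimal C++ sketch of what turning one dictionary of keys line into a CSV row could look like. It is not the project's converter (which delegates to Python). The helper name dok_line_to_csv, the assumption that the first token on each line is the label, the 1-based column indices and the "0" fill value for missing columns are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the repository.
// Converts one sparse line such as "1 1:0.5 3:2" into a dense CSV row with
// `columns` fields. Assumes the first token is the label (which is dropped)
// and that column indices are 1-based; the README confirms neither.
std::string dok_line_to_csv(const std::string& line, std::size_t columns) {
    std::istringstream in(line);
    std::string token;
    in >> token;  // skip the label column

    // Collect the col:data pairs that are present on this line.
    std::map<std::size_t, std::string> cells;
    while (in >> token) {
        const auto colon = token.find(':');
        if (colon == std::string::npos) continue;  // ignore malformed tokens
        const std::size_t col = std::stoul(token.substr(0, colon));
        assert(col >= 1 && col <= columns);
        cells[col] = token.substr(colon + 1);
    }

    // Emit every column in order, filling absent ones with "0" (assumption).
    std::string row;
    for (std::size_t c = 1; c <= columns; ++c) {
        if (c > 1) row += ',';
        const auto it = cells.find(c);
        row += (it != cells.end()) ? it->second : "0";
    }
    return row;
}
```

With these assumptions, dok_line_to_csv("1 1:0.5 3:2", 4) yields the row 0.5,0,2,0. Values are kept as strings so that their original formatting is preserved in the output.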
- Finally, there is a preprocessor for CSV files that removes their headers and selects only the numeric data, which it converts to float64. When an item is NaN, it generates a random number for it using a Python lambda. To compile it, run
$ pip3 install -r requirements.txt
$ cmake -S . -DENABLE_PYBIND11=ON
$ make preprocessor
To run it, use the following format:
$ ./preprocessor /path/to/file <rows_to_skip> "<delimiter>"
# or for multiple files
$ ./preprocessor /path/to/file1 <rows_to_skip> "<delimiter>" \
/path/to/file2 <rows_to_skip> "<delimiter>" \
/path/to/fileN <rows_to_skip> "<delimiter>"
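The preprocessing steps described above (skip the header rows, keep only numeric fields as 64-bit floats, replace NaN with a random number) can be sketched in plain C++ as follows. This is only an illustration and not the project's preprocessor, which uses Python. The function names numeric_fields and preprocess are hypothetical, and the [0, 1) range of the random replacement is an assumption, since the README does not state one.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <exception>
#include <istream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line on `delimiter`, keeps only the fields
// that parse completely as numbers, and replaces NaN with a random value in
// [0, 1) (the range is an assumption).
std::vector<double> numeric_fields(const std::string& line, char delimiter,
                                   std::mt19937& rng) {
    std::uniform_real_distribution<double> fill(0.0, 1.0);
    std::vector<double> out;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, delimiter)) {
        std::size_t used = 0;
        double value = 0.0;
        try {
            value = std::stod(field, &used);
        } catch (const std::exception&) {
            continue;  // not a number at all, e.g. "alice"
        }
        if (used != field.size()) continue;  // trailing junk, e.g. "12abc"
        if (std::isnan(value)) value = fill(rng);
        assert(!std::isnan(value));  // NaN never reaches the output
        out.push_back(value);
    }
    return out;
}

// Hypothetical driver mirroring the <rows_to_skip> and "<delimiter>"
// command-line arguments: drops the first rows, then filters each line.
std::vector<std::vector<double>> preprocess(std::istream& in,
                                            std::size_t rows_to_skip,
                                            char delimiter,
                                            std::mt19937& rng) {
    std::vector<std::vector<double>> rows;
    std::string line;
    for (std::size_t n = 0; std::getline(in, line); ++n) {
        if (n < rows_to_skip) continue;  // header rows
        rows.push_back(numeric_fields(line, delimiter, rng));
    }
    return rows;
}
```

Passing the random engine in by reference lets the caller seed it, which keeps runs reproducible when that matters.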