Automatically assigns semantic labels to large data sets from heterogeneous sources, based on features of the data, using several statistical and machine learning techniques.
- Elasticsearch
- PySpark
- scikit-learn
- pandas
- Build the Docker image:
cd container; docker build -t isi/semantic-labeling .
- Start Elasticsearch:
docker-compose up
- Call the API:
bin/semantic_labeling.sh <train_dataset> <test_dataset> <train_dataset2>
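
To give a rough idea of what feature-based semantic labeling means, here is a minimal standard-library sketch. It is not this project's implementation: the feature set, the nearest-neighbour matching, and the names `column_features`, `predict_label`, and `train_columns` are all assumptions made for illustration. It summarizes each column of values as a small feature vector and gives an unlabeled column the label of the closest labeled column.

```python
import math


def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def column_features(values):
    """Summarize a column of raw values as a small numeric feature vector.

    Hypothetical features, chosen only for illustration:
    fraction of numeric values, fraction of digit characters,
    fraction of alphabetic characters, and scaled mean value length.
    """
    values = [str(v) for v in values]
    n = len(values) or 1
    total_chars = sum(len(v) for v in values) or 1
    numeric = sum(1 for v in values if _is_number(v))
    digits = sum(ch.isdigit() for v in values for ch in v)
    alpha = sum(ch.isalpha() for v in values for ch in v)
    mean_len = sum(len(v) for v in values) / n
    return [
        numeric / n,
        digits / total_chars,
        alpha / total_chars,
        min(mean_len / 20.0, 1.0),
    ]


def predict_label(train_columns, test_values):
    """Return the label of the training column nearest in feature space."""
    target = column_features(test_values)
    best_label, best_dist = None, math.inf
    for label, values in train_columns.items():
        dist = math.dist(target, column_features(values))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy labeled columns (made-up data, not from any real data set).
train_columns = {
    "year": ["1999", "2004", "2015", "1987"],
    "person_name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "price": ["12.50", "3.99", "104.00"],
}

print(predict_label(train_columns, ["2001", "1975", "2020"]))
print(predict_label(train_columns, ["Marie Curie", "Isaac Newton"]))
```

A real pipeline would likely use richer features and a trained classifier (for example with scikit-learn, listed above) rather than a single nearest neighbour; the sketch only shows the overall shape of the approach.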