It is recommended to use an Anaconda environment with Python 3.5 and the common data science packages preinstalled: pandas, numpy, scikit-learn, etc.
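For example, such an environment can be created with (the environment name echr is arbitrary):
conda create -n echr python=3.5 pandas numpy scikit-learn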
Dependencies:
conda install tensorflow keras
For hyperparameter tuning, HpBandSter was used:
pip install hpbandster
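For orientation, a minimal HpBandSter (BOHB) optimization loop follows the sketch below; the hyperparameter space, budgets, and worker body are illustrative placeholders, while the project's actual workers appear to live in model_workers.py:

import ConfigSpace as CS
import hpbandster.core.nameserver as hpns
from hpbandster.core.worker import Worker
from hpbandster.optimizers import BOHB

class ExampleWorker(Worker):
    def compute(self, config, budget, **kwargs):
        # Placeholder objective: train a model with `config` for `budget`
        # epochs/estimators and return the validation loss to minimize.
        loss = (config['learning_rate'] - 0.01) ** 2
        return {'loss': loss, 'info': {'budget': budget}}

# Illustrative one-dimensional search space.
config_space = CS.ConfigurationSpace()
config_space.add_hyperparameter(
    CS.UniformFloatHyperparameter('learning_rate', lower=1e-5, upper=1e-1, log=True))

# HpBandSter coordinates workers through a local nameserver.
ns = hpns.NameServer(run_id='example', host='127.0.0.1', port=None)
ns.start()
worker = ExampleWorker(nameserver='127.0.0.1', run_id='example')
worker.run(background=True)

bohb = BOHB(configspace=config_space, run_id='example',
            nameserver='127.0.0.1', min_budget=1, max_budget=9)
result = bohb.run(n_iterations=4)
bohb.shutdown(shutdown_workers=True)
ns.shutdown()

best = result.get_id2config_mapping()[result.get_incumbent_id()]['config']
print('best configuration:', best)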
To load the datasets, ECHR-OD_loader was used; it reads the data location from an environment variable named ECHR_OD_PATH.
├── different_models
│ ├── results
│ │ ├── bow
│ │ │ ├── dnn
│ │ │ ├── extratrees
│ │ │ ├── random_forest
│ │ │ ├── svm
│ │ │ ├── xgboost
│ │ ├── desc
│ │ │ ├── ...
│ │ ├── desc_bow
│ │ │ ├── ...
│ ├── all_results
│ │ ├── 1_gram/1000_tokens/1_article...
│ │ ├── 2_gram/1000_tokens/1_article...
│ │ ├── ...
│ ├── best_results.csv
│ ├── customizable_optimizer.py
│ ├── model_workers.py
│ ├── onedrive_api.py
│ ├── run_optimization.py
│ ├── store_all_results.py
│ ├── store_best.py
├── DNN
├── ExtraTrees
├── ...
A full working environment is available as a Docker image:
sudo docker run -ti amasend/echr:version2
The Docker environment includes only the OneDrive version of the experiment; local storage support will be included in the next release.
- First, create a directory that will contain all the generated datasets.
mkdir ~/dataset_storage
- Download all of the datasets into that directory (they can be found at https://1drv.ms/f/s!AkC_hbDmloKIjHh80Vdqc9OUSnwX).
- Create an additional directory for temporary dataset storage, where specific datasets will be placed during computation.
mkdir ~/data
- Export an environment variable pointing to the data directory for the ECHR data loader.
export ECHR_OD_PATH=~/data
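If you start the scripts from inside a Python session instead of a shell, the same variable can be set programmatically (it must be set before the loader first reads it):

import os

# Equivalent to the `export` above; expanduser resolves the leading "~".
os.environ['ECHR_OD_PATH'] = os.path.expanduser('~/data')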
- Run the optimization process. You can specify which models to evaluate, and on which flavors, n-grams, and token counts (see the example invocation after the option list below).
python run_optimization.py --flavors --models --n_grams --tokens --onedrive --dataset_storage
- flavors - ['desc', 'bow', 'desc+bow']
- models - ['extratrees', 'random_forest', 'dnn', 'xgboost', 'svm']
- n_grams - [1, 2, 3, 4, 5, 6]
- tokens - [1000, 5000, 7000, 10000, 30000, 60000, 80000, 100000]
- onedrive - True/False (if True use OneDrive storage, if False use local storage)
- dataset_storage - ~/dataset_storage (path to dataset storage)
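For example, assuming argparse-style flags that accept multiple values (check python run_optimization.py --help for the exact syntax), a small local-storage run could look like:
python run_optimization.py --flavors bow --models random_forest svm --n_grams 1 2 --tokens 1000 5000 --onedrive False --dataset_storage ~/dataset_storage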
Please keep in mind that the computation time for all combinations is very long. When running this experiment with the default parameters, be aware that each n-gram/token combination is computed sequentially (one after another); only the models and flavors are multiprocessed.
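A minimal sketch of that scheduling pattern (illustrative only; the actual logic lives in run_optimization.py):

from itertools import product
from multiprocessing import Pool

MODELS = ['extratrees', 'random_forest', 'dnn', 'xgboost', 'svm']
FLAVORS = ['desc', 'bow', 'desc+bow']
N_GRAMS = [1, 2]       # subset for illustration
TOKENS = [1000, 5000]  # subset for illustration

def evaluate(job):
    model, flavor, n_gram, tokens = job
    print('optimizing', model, flavor, n_gram, tokens)  # placeholder for one optimizer run

if __name__ == '__main__':
    # Each n-gram/token combination is processed sequentially...
    for n_gram, tokens in product(N_GRAMS, TOKENS):
        # ...while all model/flavor pairs for that combination run in parallel.
        jobs = [(m, f, n_gram, tokens) for m, f in product(MODELS, FLAVORS)]
        with Pool() as pool:
            pool.map(evaluate, jobs)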
All components are located under the 'different_models' directory.
After generating all results (or partial results), you can plot them for easier interpretation.
python show_results.py --results_path "path_to_results_folder" --model "model_name" --n_grams "number_of_grams" --tokens "number_of_tokens" --flavor "flavor_name" --metric "metric_name_to_display" --option "which_plot_to_display"
Several plotting methods are implemented; depending on the options passed, you can plot the following:
- comparison of the best new results per model and n-gram (after closing one plot, the plot for the next n-gram will appear)
python show_results.py --results_path "all_results" --model "random_forest" --metric "test_mean_c_MCC" --option "new_best_results"
- old best results vs new results (old best among all models, new best only from one model), per n-gram
python show_results.py --results_path "all_results" --model "random_forest" --metric "test_mean_c_MCC" --option "old_vs_new" --n_grams 5
- normalized confusion matrices (only for new results)
python show_results.py --results_path "all_results" --model "random_forest" --metric "test_mean_c_MCC" --option "confusion_matrix" --n_grams 5 --tokens 5000
- computing times (run times) per flavor and per optimizer run
python show_results.py --results_path "all_results" --model "random_forest" --metric "test_mean_c_MCC" --option "computing_times" --n_grams 5 --tokens 5000
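If you prefer to inspect the aggregated numbers directly, the generated best_results.csv can be loaded with pandas (the exact column names depend on the stored metrics; test_mean_c_MCC below mirrors the metric used in the plots and is an assumption):

import pandas as pd

df = pd.read_csv('different_models/best_results.csv')
print(df.head())
# Column name is an assumption; adjust to whatever the CSV actually contains.
# print(df.sort_values('test_mean_c_MCC', ascending=False).head())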