Classify apps into categories according to their descriptions. The dataset contains 20,104 apps with descriptions. Tf-idf values are extracted from the descriptions and pre-processed.
External libraries for classification are not allowed.
Naive Bayes is implemented in Python 3 and reaches an average accuracy of 52.25% under ten-fold cross-validation.
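
As a rough illustration of how Naive Bayes can be written without a classification library, here is a minimal sketch using only the Python standard library. It assumes a multinomial model with Laplace smoothing applied to dense rows of tf-idf weights. This README does not say which Naive Bayes variant or data layout main.py actually uses, so the names `train_nb`, `predict_nb`, and `alpha` are hypothetical.

```python
import math
from collections import defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on non-negative feature weights.

    rows   -- list of equal-length lists of tf-idf weights (assumed layout)
    labels -- list of category names, one per row
    alpha  -- Laplace smoothing constant (assumed default, not from the README)
    """
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * n_features)

    # Accumulate per-class document counts and per-class feature mass.
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        sums = feature_sums[label]
        for j, value in enumerate(row):
            sums[j] += value

    model = {}
    total = len(rows)
    for label, count in class_counts.items():
        sums = feature_sums[label]
        denom = sum(sums) + alpha * n_features
        log_prior = math.log(count / total)
        log_likelihood = [math.log((s + alpha) / denom) for s in sums]
        model[label] = (log_prior, log_likelihood)
    return model


def predict_nb(model, row):
    """Return the class with the highest log posterior score for one row."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, log_likelihood) in model.items():
        score = log_prior + sum(v * ll for v, ll in zip(row, log_likelihood))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a feature that never occurs in a class from producing a zero probability.
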
Stage | Time |
---|---|
Training | 256s |
Judgement | 702s |

Item | Specification |
---|---|
Processor | 2.7GHz Intel Core i5 |
Memory | 8GB 1867 MHz DDR3 |
USYD 2017S1 COMP5318 Assignment 1
- Restore the training dataset into the input folder.
- Run main.py in the algorithm folder with the python3 interpreter.
- Check the output folder; the predicted labels.csv will be created after the program has finished.
- The ten-fold cross-validation code is in experiment.py. Running this script prints the average confusion matrix to the console.
- This program was written in PyCharm Community Edition, but the IDE is not required to run the submitted version.
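
The ten-fold cross-validation step mentioned above could be sketched roughly as follows. This is not the code in experiment.py: the function names, the shuffling seed, and the `train_fn`/`predict_fn` callback interface are all assumptions made for illustration, and the sketch assumes there are at least `k` samples.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, train_fn, predict_fn, k=10, seed=0):
    """Return (mean accuracy, class order, mean confusion matrix) over k folds.

    train_fn(rows, labels) -> model and predict_fn(model, row) -> label are
    hypothetical callbacks standing in for any classifier.
    """
    classes = sorted(set(labels))
    position = {c: i for i, c in enumerate(classes)}
    size = len(classes)
    total_matrix = [[0.0] * size for _ in range(size)]
    accuracies = []

    for fold in k_fold_indices(len(rows), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = train_fn([rows[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = 0
        for i in fold:
            predicted = predict_fn(model, rows[i])
            # Rows are true classes, columns are predicted classes.
            total_matrix[position[labels[i]]][position[predicted]] += 1
            correct += predicted == labels[i]
        accuracies.append(correct / len(fold))

    mean_matrix = [[cell / k for cell in row] for row in total_matrix]
    return sum(accuracies) / k, classes, mean_matrix
```

Each sample is held out exactly once, so the averaged confusion matrix sums to the mean fold size, and its diagonal shows which categories are most often predicted correctly.
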