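This project uses spaCy to train a named entity recognition (NER) model.
See docs/Training Documentation.md for details about training.
Before running anything, set up the environment:
- set up a virtual environment (venv)
- pip install -r requirements.txt
- python -m spacy info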
To train the model, run the following command from the project root:
- -f: path to the training file
- -d: model output directory
python app/train_ner.py -f lifebit-nlp-data/train.pkl -d app/model/
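As a rough illustration of how the two flags could be handled, here is a minimal argparse sketch. It is not the contents of app/train_ner.py: the function name `build_parser` and the destination names `train_file` and `model_dir` are assumptions made for this example only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the -f and -d flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Train a spaCy NER model (sketch).")
    # -f: path to the pickled training file
    parser.add_argument("-f", dest="train_file", required=True, help="training file")
    # -d: directory where the trained model is written
    parser.add_argument("-d", dest="model_dir", required=True, help="model output directory")
    return parser


# Parse the same arguments as the command above, passed explicitly as a list.
args = build_parser().parse_args(["-f", "lifebit-nlp-data/train.pkl", "-d", "app/model/"])
print(args.train_file, args.model_dir)
```

Marking both flags as required makes the script fail early with a usage message when either path is missing, which may or may not match the real script's behavior.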
The locally trained model is saved in the 'best_model' directory, and the model trained on Google Colab is saved in the 'google_colab_model' directory.
To evaluate a saved model, run the following command from the project root:
- -f: path to the validation data file
- -d: directory to load the saved model from
python app/evaulate_ner_model.py -f lifebit-nlp-data/test.pkl -d app/best_model/
or
python app/evaulate_ner_model.py -f lifebit-nlp-data/test.pkl -d app/google_colab_model/
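NER evaluation is commonly reported as precision, recall, and F1 over exact-match entity spans. The sketch below shows that calculation in plain Python. It is an assumption about what such an evaluation might compute, not the logic of app/evaulate_ner_model.py; the function name `span_prf` and the labels `DISEASE` and `DRUG` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def span_prf(gold: List[Set[Span]], pred: List[Set[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match entity spans."""
    # A predicted span counts as correct only if start, end and label all match.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One document with two gold entities; the model finds one of them
# and predicts one span that is not in the gold set.
gold = [{(0, 7, "DISEASE"), (20, 27, "DRUG")}]
pred = [{(0, 7, "DISEASE"), (30, 35, "DRUG")}]
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
```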
To train NER from scratch and then evaluate the result, run:
python app/train_ner.py -f lifebit-nlp-data/train.pkl -d app/model/
python app/evaulate_ner_model.py -f lifebit-nlp-data/test.pkl -d app/model/
The newly trained NER model is saved in 'app/model'.
.
├── app # source code for NER training, model evaluation, and preprocessing
├── docs # documentation about the model and training
├── lifebit-nlp-data # train and test data
├── test # unit tests for the preprocessing script
├── venv # Python virtual environment
├── .gitignore # gitignore
├── requirments.txt # list of all dependencies
├── Train_NER___lifebit.ipynb # notebook created for Google Colab
└── README.md # project documentation
.
├── ...
├── app
│ ├── __init__.py
│ ├── train_ner.py # script to run custom training on the biomedical data using spaCy
│ ├── evaulate_ner_model.py # script to test performance on unseen data
│ ├── tuple_to_spacy_converter.py # converts training data to spaCy-formatted training data
│ ├── best_model # best model (saved manually)
│ ├── google_colab_model # model trained on Google Colab
│ └── model # output directory for models saved by future runs
└── ...
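To give a sense of what a tuple-to-spaCy conversion can look like, here is a minimal sketch. The actual layout of the pickled data and the logic in tuple_to_spacy_converter.py are not documented here, so this assumes each sample is a (text, [(start, end, label), ...]) tuple and targets the spaCy v2-style (text, {"entities": [...]}) training format. The function name `to_spacy_format` and the example labels are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def to_spacy_format(
    samples: List[Tuple[str, List[Span]]]
) -> List[Tuple[str, Dict[str, List[Span]]]]:
    """Wrap each sample's spans in an {"entities": [...]} dict (assumed input layout)."""
    converted = []
    for text, spans in samples:
        for start, end, _label in spans:
            # Reject empty spans and spans that fall outside the text.
            if not 0 <= start < end <= len(text):
                raise ValueError(f"invalid span ({start}, {end}) for text of length {len(text)}")
        # Sort spans by start offset so the output is deterministic.
        converted.append((text, {"entities": sorted(spans)}))
    return converted


samples = [("Aspirin treats headache", [(15, 23, "DISEASE"), (0, 7, "DRUG")])]
print(to_spacy_format(samples))
```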
.
├── ...
├── test
│ ├── __init__.py
│ └── test_tuple_to_spacy_converter.py # Unit tests for data processing
└── ...
.
├── ...
├── lifebit-nlp-data
│ ├── test.pkl # test data file
│ └── train.pkl # training data file
└── ...
.
├── ...
├── docs
│ └── Training Documentation.md # brief discussion of training
└── ...