This project is a TensorFlow reimplementation of the paper On the Automatic Generation of Medical Imaging Reports by Jing et al., published in 2018.
Check out the paper here!
Medical images such as X-rays, CT scans, MRIs and other types of scans are used to diagnose many diseases. Specialized medical professionals read and interpret these images. Writing reports for these scans can be time-consuming, so we looked into generating the reports automatically.
A medical report has three main parts:
- Impressions, which provide the diagnosis
- Findings, which list all observations
- Tags, which list keywords that capture the critical information in the findings
The dataset used in the paper was the Indiana University Chest X-Ray Collection (IU X-Ray) (Demner-Fushman et al., 2015), which is a set of chest x-ray images paired with their corresponding diagnostic reports. The images were obtained from here and the reports were obtained from here.
Due to computational constraints, we used a sample of 1000 scans for training and 200 scans for testing; the details are in the /data directory.
The architecture proposed by the paper is shown below.
The three main proposals of the paper are:
- A multi-task framework that jointly predicts tags and generates the paragraphs of a report
- A co-attention mechanism that takes both visual and semantic features into account
- A hierarchical LSTM model
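To make the co-attention idea concrete, here is a minimal pure-Python sketch. It is an illustration under simplifying assumptions, not the paper's or this repository's implementation: it scores features with a plain dot product against the decoder state, whereas the paper uses learned projections, and all function names and toy values are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, hidden):
    """Soft attention: weight each feature vector by its dot product with `hidden`."""
    scores = [sum(f_i * h_i for f_i, h_i in zip(f, hidden)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return context, weights


def co_attention(visual, semantic, hidden):
    """Attend over visual and semantic features separately, then concatenate."""
    visual_ctx, visual_weights = attend(visual, hidden)
    semantic_ctx, semantic_weights = attend(semantic, hidden)
    return visual_ctx + semantic_ctx, visual_weights, semantic_weights


# Toy inputs: two image-region features, one tag embedding, one decoder state.
visual = [[1.0, 0.0], [0.0, 1.0]]
semantic = [[0.5, 0.5]]
hidden = [2.0, 0.0]
context, visual_weights, semantic_weights = co_attention(visual, semantic, hidden)
```

The joint context vector is what a sentence-level LSTM would consume at each step; in a trained model the scoring function and the final projection would be learned rather than fixed.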
- The zip files containing images and reports were mounted from Google Drive.
- A data cleaning step produced a dataset with two images per report: one frontal and one lateral view.
- Reports were extracted from the .xml files, and the frontal and lateral views were combined to build the dataset described above, which was then used to generate features.
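The report extraction step could look roughly like the sketch below. It assumes the IU X-Ray reports store their sections in `AbstractText` elements with a `Label` attribute and list their images as `parentImage` elements; the sample XML, the function names, and the "exactly two images" filter are illustrative assumptions, not code from this repository.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written report in the assumed layout (not real data).
SAMPLE_REPORT = """
<eCitation>
  <MedlineCitation>
    <Article>
      <Abstract>
        <AbstractText Label="FINDINGS">The heart is normal in size.</AbstractText>
        <AbstractText Label="IMPRESSION">No acute disease.</AbstractText>
      </Abstract>
    </Article>
  </MedlineCitation>
  <parentImage id="example_frontal"/>
  <parentImage id="example_lateral"/>
</eCitation>
"""


def parse_report(xml_text):
    """Return the findings, impression and image ids of one report."""
    root = ET.fromstring(xml_text)
    sections = {}
    for node in root.iter("AbstractText"):
        label = (node.get("Label") or "").lower()
        sections[label] = (node.text or "").strip()
    image_ids = [img.get("id") for img in root.iter("parentImage")]
    return {
        "findings": sections.get("findings", ""),
        "impression": sections.get("impression", ""),
        "image_ids": image_ids,
    }


def has_two_views(report):
    """Keep only reports that come with exactly two images (assumed filter)."""
    return len(report["image_ids"]) == 2


report = parse_report(SAMPLE_REPORT)
```

Telling the frontal view from the lateral one needs information beyond the XML shown here, so the sketch only checks the image count.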
- glove.840B.300d was used to obtain word vector representations and build the embedding matrix. It is available here.
- To run the model, download the GloVe file and add it to the MedGen folder.
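As a rough sketch of how an embedding matrix can be built from a GloVe text file, the example below parses a few made-up 3-dimensional vectors. The function names, the `word_index` mapping, and the convention that row 0 is reserved for padding are assumptions for illustration; the notebook may do this differently.

```python
import io


def load_glove(lines, dim):
    """Parse GloVe text lines into a {token: vector} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # Take the last `dim` fields as the vector so that tokens which
        # themselves contain spaces are still parsed correctly.
        token = " ".join(parts[:-dim])
        vectors[token] = [float(x) for x in parts[-dim:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; unknown words stay zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix


# Made-up 3-dimensional vectors standing in for the real 300-dimensional file.
sample_file = io.StringIO("heart 0.1 0.2 0.3\nlung 0.4 0.5 0.6\n. . . 0.7 0.8 0.9\n")
vectors = load_glove(sample_file, dim=3)
word_index = {"heart": 1, "lung": 2, "effusion": 3}  # hypothetical tokenizer output
matrix = build_embedding_matrix(word_index, vectors, dim=3)
```

With the real file, `load_glove` would be called on the opened text file with `dim=300`, and the resulting matrix would typically be converted to an array and used as the initial weights of an embedding layer.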
- Features were extracted using a DenseNet121 model loaded with CheXNet weights (available here). The paper used a VGG-19 network.
- The features are available in the ./features directory.
- The features were fed into a model with the following structure
- To train the model, run encoder_decoder.ipynb in the root directory.
- The model was trained for 10 epochs. Due to computational constraints, we could not train for more epochs, so the model did not converge.
- The final BLEU score was 0.643.
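For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of sentence-level BLEU with a single reference, uniform n-gram weights, and no smoothing. It is not necessarily how the score above was computed: the README does not state the BLEU variant or tool, and since NLTK is listed among the requirements, its implementation may have been used instead.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precision += math.log(overlap / total) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * math.exp(log_precision)


reference = "no acute cardiopulmonary abnormality".split()
perfect = bleu(reference, reference)
shorter = bleu(reference, "no acute abnormality".split(), max_n=1)
unrelated = bleu(reference, "heart size is normal".split())
```

Here `shorter` uses unigrams only (`max_n=1`), so its score is driven entirely by the brevity penalty; without smoothing, any n-gram order with zero overlap sets the whole score to zero.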
- TensorFlow 2.4.1
- Keras 2.4.3
- Numpy
- Pandas 1.1.5
- Sklearn 0.23.2
- PIL 8.0.1
- Nltk 3.5
- Matplotlib 3.3.2
- Opencv 4.5.1
- Tqdm 4.50.2
- OS
- Complete tag prediction using MLC
- Integrate semantic features in co-attention model
For any queries, please open an issue at the repository, or email any of the contributors.