This is my code for training a two-stage system (YOLOv7 object detector + OCR model) for the "Benetech - Making Graphs Accessible" Kaggle competition. I placed 20th in the competition.
My approach to the problem involves two main steps: object detection and Optical Character Recognition (OCR).
I trained a YOLOv7 model to detect the x-axis labels, y-axis labels, the chart bounding box, and the data points on the chart. The coordinates of the data points were not provided in the dataset. I was able to accurately compute the positions of the data points on the chart images by linear interpolation of the x-axis and y-axis tick coordinates with respect to the x-axis/y-axis label values. During inference, I inverted this process, calculating the data series by linear interpolation of the x-axis and y-axis label values with respect to the coordinates of the data points.
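Concretely, the mapping in both directions is just piecewise-linear interpolation between tick pixel coordinates and parsed label values. A minimal NumPy sketch, using made-up tick positions and values rather than anything from the actual pipeline:

```python
import numpy as np

# Hypothetical x-axis ticks: pixel positions (from detection) and the
# numeric values parsed from their labels (from OCR), both sorted.
tick_px = np.array([100.0, 200.0, 300.0, 400.0])  # tick x-coordinates in pixels
tick_val = np.array([0.0, 10.0, 20.0, 30.0])      # parsed label values

# Inference direction: detected data-point pixel coordinates -> data values.
point_px = np.array([150.0, 275.0])
point_val = np.interp(point_px, tick_px, tick_val)  # -> [5.0, 17.5]

# Training-label direction: known data values -> pixel coordinates,
# used to synthesize the data-point boxes missing from the dataset.
value = np.array([5.0, 17.5])
value_px = np.interp(value, tick_val, tick_px)      # -> [150.0, 275.0]
```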
This approach also works relatively well for scatter plots, compared to other approaches like Donut.
Some notes:
- This object detection model was also used as the chart-type classification model.
- There are overlapping bounding boxes for some x-axis labels, as in the image below. However, my OCR model was able to extract the correct text despite the input image containing text from neighbouring bboxes.
Using the EasyOCR library, I trained a ResNet (feature extractor) + bidirectional LSTM model with Connectionist Temporal Classification (CTC) loss. The additional dataset improved the accuracy of the OCR model by about 5%, from 84% to 89%.
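For illustration, here is a minimal PyTorch sketch of this kind of CTC recognizer, with a small CNN standing in for the ResNet backbone; the layer sizes and the 95-class character set are assumptions for illustration, not EasyOCR's exact training configuration:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Illustrative CNN feature extractor + bidirectional LSTM recognizer,
    trained with CTC loss. Layer sizes here are hypothetical."""
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        # Small convolutional feature extractor (stand-in for the ResNet backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height, keep width as time axis
        )
        self.rnn = nn.LSTM(256, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden * 2, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                   # x: (B, 1, H, W) grayscale label crops
        f = self.cnn(x)                     # (B, C, 1, W')
        f = f.squeeze(2).permute(0, 2, 1)   # (B, W', C): width steps become timesteps
        f, _ = self.rnn(f)
        return self.fc(f)                   # (B, W', num_classes) logits for CTC

# CTC loss expects (T, B, C) log-probabilities; blank index 0 by default.
model = CRNN(num_classes=95)
logits = model(torch.randn(4, 1, 32, 128))
log_probs = logits.permute(1, 0, 2).log_softmax(2)
targets = torch.randint(1, 95, (4, 10))     # dummy character-index targets
input_lengths = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lengths = torch.full((4,), 10, dtype=torch.long)
loss = nn.CTCLoss(blank=0, zero_infinity=True)(
    log_probs, targets, input_lengths, target_lengths)
```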
After receiving the bounding boxes from the model, I performed some post-processing based on simple heuristics: removing data points that lie outside the chart bbox, restricting x-labels (y-labels for horizontal bars) to lie below the chart bbox, and restricting y-labels (x-labels for horizontal bars) to the left side of the chart bbox. The x/y-axis tick coordinates are calculated from the x/y-label bboxes and the chart bbox: I use the point on the chart bbox nearest to the center of each x/y-label bbox as the corresponding tick coordinate. I chose this approach because, in an older version of the model, the precision and recall of the x/y labels were higher than those of the x/y-axis ticks.
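A rough sketch of these two heuristics, assuming (x1, y1, x2, y2) bboxes; all names and coordinates here are illustrative, not the actual pipeline code:

```python
def inside(box, chart):
    """Keep a detected data point only if its center lies inside the chart bbox."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return chart[0] <= cx <= chart[2] and chart[1] <= cy <= chart[3]

def tick_from_label(label_box, chart):
    """Nearest point on the chart-bbox boundary from the label-box center:
    clamp the center into the bbox, then snap it to the closest edge."""
    cx = (label_box[0] + label_box[2]) / 2
    cy = (label_box[1] + label_box[3]) / 2
    px = min(max(cx, chart[0]), chart[2])
    py = min(max(cy, chart[1]), chart[3])
    candidates = [
        (abs(px - chart[0]), (chart[0], py)),  # left edge
        (abs(px - chart[2]), (chart[2], py)),  # right edge
        (abs(py - chart[1]), (px, chart[1])),  # top edge
        (abs(py - chart[3]), (px, chart[3])),  # bottom edge
    ]
    return min(candidates)[1]

chart_bbox = (50, 40, 400, 300)                       # hypothetical plot-area bbox
detected_points = [(100, 100, 110, 110), (10, 10, 20, 20)]
x_label_box = (120, 310, 160, 330)                    # an x-axis label below the chart

points = [b for b in detected_points if inside(b, chart_bbox)]  # drops (10, 10, 20, 20)
x_tick = tick_from_label(x_label_box, chart_bbox)               # -> (140.0, 300)
```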
I participated in this competition only for the last 4 weeks, so, due to lack of time, I wasn't able to try out other approaches like Donut. I think there is a lot of room for improvement in this model. For example, about 25% of the predictions made by the model automatically get scored 0 because the number of predicted points does not match the ground truth. For chart types other than scatter plots, this mismatch is usually only 1 or 2 points.
First, download the following three datasets:
- Competition dataset
- ICPR 2022 CHART-Infographics UB PMC Training Dataset
- ICPR 2022 CHART-Infographics UB-Unitec PMC Testing Dataset
Run the following three notebooks to create the datasets for both the detection model and the OCR model:
OCR dataset:
create_OCR_dataset_COMPETION+ADDITIONAL_DATA.ipynb
Yolov7 dataset with additional data:
create_yolo_dataset_ADDITIONAL_DATA.ipynb
Yolov7 dataset with competition data:
create_yolo_dataset_COMPETITION_DATA.ipynb
Train the YOLOv7 model:
cd yolov7
bash train.sh
Train the OCR model:
cd OCR
bash train.sh
The checkpoints of the models that I trained are in the following directories:
yolov7:
yolov7/runs/train/yolov7-custom-sgd
OCR:
OCR/saved_models/en_filtered_old