
Early training bias due to random sampling with uneven # from sites or cameras #34

Open
abfleishman opened this issue Oct 1, 2018 · 1 comment
Labels: needs testing (workaround is provided, needs verification)

Comments

@abfleishman

Another thought I have been pondering with the active-learning pipeline: how do you avoid biasing your detector toward the types of images you labeled first? For instance, say I have 10 cameras and each camera has taken a different number of images, from 1,000 to 100,000, over a one-year deployment. If you label 100 randomly selected images to start with, the majority will come from the camera that took the most images, and the background in that camera may be distinct. If you train a model on those initial 100 images, it may be highly biased toward detecting things in images from that camera (because of some characteristic of those images). Images from the other cameras might not even produce detections and so might never get "served" to the person tagging.
Essentially I see this as the same idea as class imbalance, except the imbalance is in the raw data rather than in the labels. How does this normally get addressed in active learning, if at all?
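One common way to avoid the skew described above is to stratify the initial random sample by camera instead of sampling uniformly over all images. A minimal sketch of that idea (the function name and the `camera_of` helper are illustrative, not part of this repo):

```python
import random
from collections import defaultdict

def stratified_sample(image_paths, camera_of, total):
    """Draw a roughly equal number of images from each camera.

    image_paths: list of image identifiers
    camera_of:   function mapping an image to its camera ID
    total:       desired size of the initial labeling batch
    """
    by_camera = defaultdict(list)
    for img in image_paths:
        by_camera[camera_of(img)].append(img)

    # Split the budget evenly across cameras; cameras with fewer
    # images than their share simply contribute everything they have.
    per_camera = max(1, total // len(by_camera))
    sample = []
    for imgs in by_camera.values():
        k = min(per_camera, len(imgs))
        sample.extend(random.sample(imgs, k))
    return sample
```

With 10 cameras and a budget of 100, each camera contributes about 10 images regardless of whether it shot 1,000 or 100,000 frames, so the initial model sees every background.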

@olgaliak
Owner

How about down-sampling to N images for the camera that took 100K images (vs. the camera that took 1K)?
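A rough sketch of that down-sampling workaround, capping each camera's pool at N images before the random draw (the function and variable names are hypothetical, not from this codebase):

```python
import random

def cap_per_camera(images_by_camera, cap, seed=None):
    """Randomly keep at most `cap` images per camera so that a single
    prolific camera cannot dominate the sampling pool.

    images_by_camera: dict mapping camera ID -> list of images
    cap:              maximum images to retain per camera
    """
    rng = random.Random(seed)
    capped = {}
    for cam, imgs in images_by_camera.items():
        if len(imgs) > cap:
            capped[cam] = rng.sample(imgs, cap)
        else:
            capped[cam] = list(imgs)
    return capped
```

For the scenario in the issue, capping at, say, 1,000 per camera would put the 100K-image camera on equal footing with the 1K-image camera before any random selection happens.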

@olgaliak olgaliak added the needs testing Workaround is provided, needs verification label Oct 23, 2018