
Early training bias due to random sampling with uneven # from sites or cameras #34

Open
abfleishman opened this issue Oct 1, 2018 · 1 comment
Labels: needs testing (workaround is provided, needs verification)

Comments

@abfleishman

Another thought I have been pondering with the active-learning pipeline: how do you avoid biasing your detector toward the types of images you labeled first? For instance, say I have 10 cameras and each camera has taken a different number of images, from 1,000 to 100,000, over a one-year deployment. If you label 100 randomly selected images to start with, the majority will come from the camera that took the most images, and the background in that camera may be distinct. If you train a model on those initial 100 images, it may be highly biased toward detecting things in images from that camera (because of some characteristic of those images). Images from the other cameras might not even produce detections and so might never get "served" to the person tagging.
Essentially I see this as the same idea as class imbalance, except the imbalance is in the raw data rather than in the labels. How does this normally get addressed in active learning, if at all?
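One common way to avoid the skew described above is to stratify the initial random sample by camera instead of sampling uniformly over all images. A minimal sketch of that idea (the function name and the `camera_of` helper are illustrative, not part of this repo):

```python
import random
from collections import defaultdict

def stratified_sample(image_paths, camera_of, total):
    """Draw a roughly equal number of images from each camera.

    image_paths: list of image identifiers
    camera_of:   function mapping an image to its camera ID
    total:       desired size of the initial labeling batch
    """
    by_camera = defaultdict(list)
    for img in image_paths:
        by_camera[camera_of(img)].append(img)

    # Split the budget evenly across cameras; cameras with fewer
    # images than their share simply contribute everything they have.
    per_camera = max(1, total // len(by_camera))
    sample = []
    for imgs in by_camera.values():
        k = min(per_camera, len(imgs))
        sample.extend(random.sample(imgs, k))
    return sample
```

With 10 cameras and a budget of 100, each camera contributes about 10 images regardless of whether it shot 1,000 or 100,000 frames, so the initial model sees every background.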

@olgaliak
Owner

How about down-sampling to N images for the camera that took 100K images (vs. the camera that took 1K)?
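A rough sketch of that down-sampling workaround, capping each camera's pool at N images before the random draw (the function and variable names are hypothetical, not from this codebase):

```python
import random

def cap_per_camera(images_by_camera, cap, seed=None):
    """Randomly keep at most `cap` images per camera so that a single
    prolific camera cannot dominate the sampling pool.

    images_by_camera: dict mapping camera ID -> list of images
    cap:              maximum images to retain per camera
    """
    rng = random.Random(seed)
    capped = {}
    for cam, imgs in images_by_camera.items():
        if len(imgs) > cap:
            capped[cam] = rng.sample(imgs, cap)
        else:
            capped[cam] = list(imgs)
    return capped
```

For the scenario in the issue, capping at, say, 1,000 per camera would put the 100K-image camera on equal footing with the 1K-image camera before any random selection happens.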

@olgaliak olgaliak added the needs testing Workaround is provided, needs verification label Oct 23, 2018