We love the video frame auto-extraction feature and the OpenCV-assisted label tracking. It simplified the heavy lifting of data tagging.
However, we believe there's a limit of 10 labels in total per frame. Once we reach 10 labels, we can no longer drag an existing bounding box by its top-left or bottom-right corner to fine-tune its extent on the fly while paused. We believe this is a UI bug.
You are correct: there is a limit of 10 bounding boxes per frame. For all of the limits we currently have, see Section 7.2, part 8 of the manual, where I tried to list every limit I could put my finger on. Thanks for the bug report; not being able to fine-tune existing bounding boxes once you reach the limit was not intended.
One thing to note about using the table to adjust values is that, as of now, it allows you to input coordinates that are not on the image. For example, in a 1920x1080 video you can currently enter (4000, 4000) as a bounding-box coordinate. That is not an actual point on the image and will only cause you errors later. This is a known bug that has been reported; for now, just be careful not to enter points that don't exist.
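If you want a quick sanity check on labels you've already entered, something like this sketch works. The `clamp_box` helper is hypothetical (not part of FTC-ML); it just pulls each corner back inside the frame:

```python
def clamp_box(x1, y1, x2, y2, width, height):
    """Clamp bounding-box corners to valid pixel coordinates of a width x height frame."""
    x1 = max(0, min(x1, width - 1))
    y1 = max(0, min(y1, height - 1))
    x2 = max(0, min(x2, width - 1))
    y2 = max(0, min(y2, height - 1))
    return x1, y1, x2, y2

# A box entered as (4000, 4000) in a 1920x1080 frame gets pulled back in bounds:
print(clamp_box(100, 200, 4000, 4000, 1920, 1080))  # (100, 200, 1919, 1079)
```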
Something I noticed when looking at your training frame is that the end result might not work out. The way TensorFlow Object Detection (TFOD) works, your input images are scaled down; in the case of FTC-ML, to either 320x320 or 640x640. You will most likely end up using 320x320, because 640x640 gives an unbearably low fps. If you imagine shrinking this image down to 320x320, you can see how it will just turn into a blurry yellow mess. While I can't say for sure, I strongly suspect this means detection will be suboptimal at best.
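To put a number on that shrinkage, here is a rough back-of-the-envelope calculation (a sketch, assuming the frame is resized to a fixed 320x320 without preserving aspect ratio, which is typical for SSD-style models):

```python
def scaled_size(obj_w, obj_h, frame_w=1920, frame_h=1080, model=320):
    """Approximate pixel size of an object after the frame is resized to model x model.

    Each axis shrinks independently because the resize ignores aspect ratio.
    """
    return obj_w * model / frame_w, obj_h * model / frame_h

# A 120x90 px object in a 1920x1080 frame:
w, h = scaled_size(120, 90)
print(round(w), round(h))  # roughly 20 x 27 px at the model's input resolution
```

So an object that looks comfortably large in the raw video can end up only a couple dozen pixels across by the time the model sees it.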
Thank you Uday. Before using FTC-ML, we trained our own customized model in Google Cloud with an accuracy of around 83%, and we also noticed that input down-sampling affected the results.
We'll probably compare the results against another video sample, center-cropped to 320x320. We're just waiting for training to become available…
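For what it's worth, the center crop itself is just array slicing; a minimal sketch (assuming frames come in as NumPy arrays in (height, width, channels) order, as OpenCV returns them):

```python
import numpy as np

def center_crop(frame, size=320):
    """Take a size x size crop from the middle of an (H, W, C) frame."""
    h, w = frame.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return frame[top:top + size, left:left + size]

# A dummy 1080p frame, cropped down to the model's input size:
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
print(center_crop(frame).shape)  # (320, 320, 3)
```

Cropping keeps full-resolution pixels in the region of interest, at the cost of discarding everything outside the crop, so it only helps when the target stays near the frame center.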
btw: any hint on which TF model is used for training? In a previous version of the ML toolchain we noticed a MobileNet-SSD base model was provided. Our team evolved it into faster models, and we believe higher fps offers more potential than a static image used just for randomization.