Hello! I’m trying to train a model for two custom props, a red one and a blue one, and I’m struggling with getting the red prop trained. For some reason, the loss values dramatically increased as I continued training, and in the end the model wasn’t able to recognize the prop.
This is especially strange because we had success with the blue team prop. The loss values look good, and the evaluations labeled by TensorFlow were able to correctly identify the prop. I can’t think of any differences in how we recorded the training data. I’d post the loss graph for that, but since I’m a new user I can’t post multiple images yet.
Would anyone have any ideas as to why this might happen, especially with such similar datasets producing completely different training results? I’m aware that using videos where the team prop is closer might help TensorFlow detect it, but I figured that we could get away with just a few static positions and randomized rotations, since the robot only needs to detect it once from a static starting location. And this method worked for the blue prop, so I’m confused about why it might not work for the red prop.
I took a look at your team’s workspace within ftc-ml, and the issue I think you’re running into is that your object(s) are not dissimilar enough to the training engine. What you’ve got is:
- You have an object in color BLUE and in color RED that is about 100 pixels wide by 100 pixels tall in a 720p image. When that frame gets scaled down to 300x300 for the model, that object ends up roughly 23 pixels wide by 42 pixels tall. While it might initially make sense to capture the objects at the exact distance from the camera that they’ll be at in real life, what you want to provide the model is AS MUCH INFORMATION AS YOU CAN; the model will then “learn” the object, and it will be able to recognize it (as best it can) far away. For the model, that’s the first handicap.
- Your objects have terrible lighting. Even though that might be realistic for the conditions the robot will see in real life, the model itself cannot pick out any recognizable patterns on the object. So all it really has to go on is the outline of the shape.
- TensorFlow Object Detection does not care about colors. Like, at all. Unless the color is a pattern in itself (like a zebra to differentiate a horse from a zebra) TensorFlow doesn’t care. At all. A brown horse, a black horse, a white horse, are all just horses to TensorFlow. TensorFlow is looking for patterns, not overall object qualities.
- You have WAAAAY too many repeating images. You have individual “still” videos at 30 fps of literally a single object orientation - that means you’re training the model on ~70-ish copies of the exact same image, with no variation. That creates a dataset that is far too homogeneous, which is very bad for training. Essentially, you end up with the exact same training images as test images in most cases, so the model has an extremely hard time learning because the VAST MAJORITY of the images produce way too much “confirmation bias” - the model looks at an image and makes a decision (like “assume all objects of this type have a blah”). The next test image is the exact same image it just trained with, so of course it says “oh, hey, my decision must be right, because that one has a blah too.” And so on for the next several images. Then it sees a new image and says “wait, I’m confused now, what happened to my blah? Oh, I must need to retrain completely.” It takes too many passes through the dataset to “settle” on what parameters to use, and each pass it gets more and more confused.
- Because of all four points above, your red and blue objects are essentially THE EXACT SAME OBJECT to the model. So now you’re showing it two different “objects,” but they look exactly the same. The model gets confused because one image says “this is a RED” and the next says “this is a BLUE,” while the model is thinking “wait, that was the same object - what’s going on here? I cannot tell the difference.”
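To make the first point concrete, here’s a quick sketch of the downscaling arithmetic (assuming a 1280x720 source frame and a 300x300 model input; the function name is just for illustration):

```python
def scaled_object_size(obj_w, obj_h, src_w, src_h, model_size=300):
    """Approximate pixel size of an object after the whole frame is
    squashed (non-uniformly) down to model_size x model_size."""
    return round(obj_w * model_size / src_w), round(obj_h * model_size / src_h)

# A ~100x100 px prop in a 1280x720 frame:
print(scaled_object_size(100, 100, 1280, 720))  # -> (23, 42)
```

So the model is trying to learn your prop from a patch only a couple dozen pixels across - getting closer to the camera raises that number fast.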
So the model worked fine for one object because it didn’t have much to compare it against - single objects are generally easier to train anyway. But now you have two objects with conflicting data: extremely small objects with very few discernible patterns and nearly the exact same shape. The model cannot figure out how to distinguish the two.
My honest recommendations would be:
- Use a differently shaped object for RED and BLUE (or just give them BOTH a single label and say “this is my object, dude, it can be either red or blue, who cares?”).
- Make your videos pan around the object to show more than one frame of reference.
- Get way closer - keep the object filling at LEAST 50% of the frame.
- Use a free online tool to decimate your videos (drop the frame rate to 2 or 3 fps for still videos).
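On that last point, decimating to ~2 fps from a 30 fps source just means keeping every 15th frame. A minimal sketch of the frame selection (the function name is made up; any online tool, or something like ffmpeg’s `fps=2` video filter, does the same thing for you):

```python
def decimated_indices(total_frames, src_fps=30, target_fps=2):
    """Indices of the frames to keep when thinning a src_fps video
    down to roughly target_fps."""
    step = max(1, round(src_fps / target_fps))
    return list(range(0, total_frames, step))

# A 3-second still video at 30 fps (90 frames) keeps only 6 frames:
print(decimated_indices(90))  # -> [0, 15, 30, 45, 60, 75]
```

That cuts a 90-frame still video down to 6 near-identical frames, which is plenty for a single orientation.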
If you have any more questions, please don’t hesitate to ask!