Our Model's Recognition is Really Bad

Greetings @TRATOON,

I looked at your team workspace in order to provide some comments regarding your training:

  1. I’m glad that you decimated your video (to cut down on frames), but your video resolution (1920x1080) is very high compared to the final model resolution (300x300). You will likely be using 640x480 for the video resolution on the robot, unless you are modifying the default resolution (and if you are, WHY? It just takes longer because of the mountains of extra data you’re forcing the system to process). If you’re not modifying the default resolution, consider taking video at a smaller resolution - the effects of scaling will be less severe (see the downscaling sketch after this list). Taking video at 1920x1080 makes small objects nearly useless to the model; for example, in Frames 62 and 373 of video “2023120303_Crown_Props_V3” the camera is just way too far away to be of any use, and the model will have difficulty training on those objects.

  2. Your video labeling (“2023120303_Crown_Props_V3”) seems rushed - you have several frames with unlabeled objects that are not set to ignore. This causes big problems with training: if objects that should be recognized appear in a frame but aren’t labeled, the model is told its detections are mistakes, and the training will have a hard time settling on good parameters. For some examples, look at frames 286, 273, 129, 514, 576, and others.

  3. Motion blur is a huge problem in your videos. Consider taking video REALLY slowly, and then decimating (reducing the fps of) the video afterwards using a free tool (it looks like you’re skilled at that already) - there’s a decimation sketch after this list, too. When I take video, I try to get the cleanest, crispest images for my model, and let TensorFlow deal with motion blur from the camera later. You don’t have to “train” the model to recognize motion blur; if anything, it hurts the model.

  4. If you look at your training, you can see that the training still has a ways to go for your model. When I see artifacts like what you’re showing, it usually tells me one of two things: (A) the model still has more training to go, or (B) what your model is trained for is conflicting with what the model is seeing. Your training images need more “mat puzzle pieces” if you’re trying to ignore them, because the “puzzle piece connections” on the mat are clearly conflicting with the top of your “crown.” You do a good job of showing edges of the red and blue mats (which I assume are being used to differentiate the objects, to teach TensorFlow that “not all red and blue objects are the crown”), but you don’t do the same for the gray tiles. I can see your training data Loss graph (look at the TRAINING Loss graph, NOT the Evaluation Loss graph) looks like the image below. You want to keep training until the TRAINING loss ideally reaches below 0.2 (if you ever train locally, the loss-check sketch after this list automates that check) - though to be honest I train everything with 3,000 steps. Sure, it takes extra time, but the training metrics have proven time and time again that 3,000 steps is a “magic number” for FTC-ML training:

[Image: TRAINING Loss graph]

  5. Lighting differences will hurt you. The training video you have is lit incredibly well, but the sample you show in your original post is lit incredibly poorly. TensorFlow has no idea about color - nope, none, don’t even try. Color is just a red herring (haha). TensorFlow uses CONTRASTS to differentiate objects. What you’ve done in the training data is teach TensorFlow that lightly contrasted objects are to be ignored (the well-lit gray tiles) but that darker contrasted objects are, well, objects. Then in your samples you have a very poorly lit room. The gray tiles have a MUCH darker contrast, so TensorFlow seems to be confusing those tiles with the object. Your training data needs to represent the full breadth of lighting conditions that your model will be used in (the brightness-comparison sketch after this list is a quick way to measure that mismatch).

  6. Given (5), and since in CENTERSTAGE there is no physical need to differentiate Blue and Red props, I recommend labeling all props the same. Let the differing contrasts teach TensorFlow that the props can span a range of contrasts. You still need differing lighting conditions so that TensorFlow doesn’t use contrast as the ONLY differentiator, which it will do if it thinks it can get away with it (based on the training data).
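
A few rough sketches to go with the points above, using Python and OpenCV (pip install opencv-python). First, the downscaling idea from (1): if re-shooting isn’t an option, you can rescale existing 1920x1080 footage to 640x480 before uploading. The file names here are placeholders, and note that a straight 16:9-to-4:3 resize squashes the image a bit, so re-recording at 640x480 is still the better fix.

```python
# Rough sketch: downscale an existing training video before uploading it.
# File names are placeholders; a straight 16:9 -> 4:3 resize distorts the
# aspect ratio, so re-recording at 640x480 is preferable when possible.
import cv2

def downscale_video(src_path, dst_path, width=640, height=480):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if fps isn't reported
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # INTER_AREA is the usual interpolation choice when shrinking frames
        out.write(cv2.resize(frame, (width, height), interpolation=cv2.INTER_AREA))
    cap.release()
    out.release()

downscale_video("2023120303_Crown_Props_V3.mp4", "crown_props_640x480.mp4")
```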
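
Second, the decimation sketch from (3) - the same thing your free tool is already doing, just written out: shoot slowly at the camera’s native frame rate, then keep every Nth frame. The file names and the keep_every value are placeholders.

```python
# Rough sketch: decimate a video (reduce its fps) by keeping every Nth frame.
import cv2

def decimate_video(src_path, dst_path, keep_every=4):
    cap = cv2.VideoCapture(src_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                          src_fps / keep_every, (width, height))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % keep_every == 0:  # e.g. 30fps -> 7.5fps with keep_every=4
            out.write(frame)
        index += 1
    cap.release()
    out.release()

decimate_video("slow_walkthrough_1080p.mp4", "slow_walkthrough_decimated.mp4", keep_every=4)
```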
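
Third, the loss check from (4). This one only applies if you ever train locally with the TensorFlow Object Detection API instead of in the ftc-ml tool (which shows you the graphs directly): you can read the TRAINING loss out of the TensorBoard event files and compare it against the 0.2 rule of thumb. The log directory and scalar tag below are assumptions - check ea.Tags() to see what your run actually logged.

```python
# Rough sketch: pull the training loss from local TensorBoard event files and
# check it against the ~0.2 rule of thumb.  The log directory and scalar tag
# are assumptions for a typical TF Object Detection API run - inspect
# ea.Tags() to see which scalars were actually recorded.
from tensorboard.backend.event_processing import event_accumulator

ea = event_accumulator.EventAccumulator("training/train")  # placeholder log dir
ea.Reload()

events = ea.Scalars("Loss/total_loss")  # assumed tag name
last = events[-1]
print(f"step {last.step}: training loss {last.value:.3f}")
if last.value > 0.2:
    print("Still above 0.2 - keep training (or just run 3,000 steps like I do).")
```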
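
Finally, for the lighting point in (5): a quick way to put numbers on the mismatch between your training video and your robot-side footage is to compare the average brightness and contrast of sampled frames. This is only a sanity check - the real fix is shooting training video under every lighting condition the model will actually see. File names are placeholders.

```python
# Rough sketch: compare average brightness and contrast between two videos to
# quantify a lighting mismatch.  File names are placeholders.
import cv2
import numpy as np

def brightness_contrast(video_path, sample_every=30):
    """Mean brightness (0-255) and contrast (std dev) over sampled grayscale frames."""
    cap = cv2.VideoCapture(video_path)
    means, stds = [], []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            means.append(gray.mean())
            stds.append(gray.std())
        index += 1
    cap.release()
    return float(np.mean(means)), float(np.mean(stds))

train_b, train_c = brightness_contrast("crown_props_training.mp4")
robot_b, robot_c = brightness_contrast("robot_sample.mp4")
print(f"training video:   brightness {train_b:.0f}, contrast {train_c:.0f}")
print(f"robot-side video: brightness {robot_b:.0f}, contrast {robot_c:.0f}")
```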

If you have more questions, let me know. Good luck!

-Danny