Custom tflite model doesn't detect any objects at all

Hi,
I’m a coach and just recently discovered the ftc-ml toolchain website and the forum here. I’m grateful both exist!
My team created a custom sleeve for PowerPlay with colorful patterns, since the documentation indicated TensorFlow is good at detecting image patterns. The sleeve has a side with stripes, a side with a checkerboard, and a side with polka dots. The kids spent hours recording videos, drawing boxes around the patterns in the frames, and labeling them. We trained our model… and it doesn’t recognize any objects at all. When we look at the images from the model training, it looks like the training completely failed. For example, all of the image output looks like this.

I’m not sure what the kids did wrong. Are we supposed to draw boxes around the entire cone instead of a portion of the pattern? Did we train it for too many steps? Too few?
If any of you created a successful model for a PowerPlay sleeve, what tips do you have for us? Thank you.
Dawn
Coach, #14659 Pleiades Robotics

@ddiaz is really the expert here on generating models, but looking at the photo above, perhaps this applies:

When creating videos for TensorFlow training, be very careful about the backgrounds being used. Machine Learning involves passing data and answers to a model and letting the model determine the rules for detecting objects. If the background of an object is always consistent – let’s say the object is a duck on dark gray tiles – the model may include in its rules that the object must always be on a dark gray background, and it will not recognize the duck on light gray tiles. In order to create a more generic model, the object needs to be found on multiple different backgrounds.

“Negative Frames” – frames that have no labels in them and are just of the background – can be used to help the model intentionally recognize what is in the background and learn that those elements should be ignored; TensorFlow does this by adding the background patterns to an internal “background” label that is never shown to the user. It’s not typically necessary to include Negative Frames, however, because TensorFlow and modern Machine Learning algorithms already isolate the portions of each frame that fall outside the bounding boxes and add those portions of the image to the “background” label. The exception is when there is content in the background that is only seen when the object is not present, and you feel it’s advantageous to ignore that content.
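In practical terms, a negative frame is just an ordinary training frame whose list of labeled boxes is empty. Purely as an illustration (ftc-ml builds these training records for you behind the scenes, so you never write this yourself), here is roughly what a negative frame looks like as a TF Object Detection API training example – the image is there, but every box/label list is empty, so the whole frame gets sampled as “background”:

```python
# Illustration only: a "negative frame" as a TF Object Detection API training
# example. The image is present, but the box/label lists are empty, so the
# trainer treats the entire frame as background.
import tensorflow as tf

def negative_frame_example(encoded_jpeg: bytes, width: int, height: int) -> tf.train.Example:
    def _bytes(v): return tf.train.Feature(bytes_list=tf.train.BytesList(value=v))
    def _ints(v): return tf.train.Feature(int64_list=tf.train.Int64List(value=v))
    def _floats(v): return tf.train.Feature(float_list=tf.train.FloatList(value=v))
    return tf.train.Example(features=tf.train.Features(feature={
        "image/encoded": _bytes([encoded_jpeg]),
        "image/format": _bytes([b"jpeg"]),
        "image/width": _ints([width]),
        "image/height": _ints([height]),
        # Empty lists: no objects are labeled anywhere in this frame.
        "image/object/bbox/xmin": _floats([]),
        "image/object/bbox/xmax": _floats([]),
        "image/object/bbox/ymin": _floats([]),
        "image/object/bbox/ymax": _floats([]),
        "image/object/class/text": _bytes([]),
        "image/object/class/label": _ints([]),
    }))
```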

Did you only partially label the pattern within the cone itself? Could TensorFlow be looking at the unlabeled portion of the pattern, treating it as something that’s not part of the object being detected, and getting confused because the same pattern appears in both the label and the background?

I wondered about that. The kids took the idea that TensorFlow is good at “recognizing patterns” and ran with it. What’s the proper way to put a patterned image on the cone? If they want a checkerboard pattern… should we not fill the entire side with a checkerboard? Just make a circle with a checkerboard in it or something? On a non-white background?

We tried to draw the boxes so they didn’t capture any of the cone or the background. Maybe we should draw a box around the whole cone?

Hey Dawn.

I looked at your videos and your models, and your model is definitely not training properly. If you look at your training metrics, the Loss/classification_loss graph tells the whole story. Here’s what yours looks like:

[image: your Loss/classification_loss graph, climbing steeply after about 1,000 steps]

As you can see, for the first 1,000 steps the model seems okay (we’ll get to that in a second), but as training continues the model just becomes more and more confused about what’s going on - your loss increases catastrophically, showing that the training really goes south. No, that doesn’t mean you should only train for 1,000 steps; it means we should try to correct the underlying problem (more on that later).

For comparison, here’s what a “good” Loss/classification_loss graph looks like:

[image: a healthy Loss/classification_loss graph, steadily decreasing toward zero]

Notice how the value approaches zero, which is what we want (we want these values as low as we can possibly get them, and they absolutely must be under 1 for the model to be even remotely reasonable). In your case training may have started out well, but once the loss climbs into the tens of thousands the model can no longer recover, no matter how much longer you train.
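If you’re curious, you don’t have to eyeball the graph - you can pull the numbers out of the TensorBoard logs directly. A minimal sketch, assuming you’ve downloaded your run’s events.out.tfevents.* files into a local train_logs/ directory:

```python
# Minimal sketch: read the Loss/classification_loss scalars from a TensorBoard
# event directory and check whether the loss ever settled under 1.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("train_logs/")  # directory with events.out.tfevents.* files
acc.Reload()  # parse the event files
events = acc.Scalars("Loss/classification_loss")
last = events[-1]
print(f"step {last.step}: classification loss = {last.value:.3f}")
if last.value >= 1.0:
    print("Loss never got under 1 - fix the labels and retrain.")
```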

The issue here is definitely in how you’re labeling your videos. Let me be clear and say I absolutely LOVE your videos - you’re doing a GREAT job changing the pose (distance, size, angle, orientation) in them. However, what you’re essentially doing is confusing the heck out of the model by not providing it with a consistent object. By labeling only a portion of your pattern, you never let the model see the entire pattern, so it cannot consistently determine what the full object is. TensorFlow expects to see the full “object” that you’re trying to identify (at least a few times), plus some “background” material surrounding it. Currently you have this:

[image: your current label, covering only part of the pattern]

But if your intent is to recognize the whole blue polka-dot patch, your label should REALLY be this:

[image: the suggested label, covering the entire patch]

I also noticed that you’ve got some “negative” images, but negatives for objects and negatives for images are very different from one another. I highly recommend that you read the post linked below - it does a good job of explaining a lot of the nuances of dealing with images versus dealing with objects, and how negatives work in this case. Remember that you DO still have a background, but the background is the WHITE PART OF THE SLEEVE plus the “scene” behind it. So when creating negatives, you CANNOT simply remove the object - that’s like a painter removing their canvas. This discussion post explains it well:

https://ftc-community.firstinspires.org/t/proper-process-for-object-detection/277/2

Also, please be sure to check out the TensorFlow for PowerPlay document on ftc-docs.
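One more tip: once you download your retrained model, you can sanity-check it on a computer before putting it on the robot. Here’s a minimal sketch using the TensorFlow Lite interpreter (“model.tflite” is a placeholder for whatever you named your ftc-ml download):

```python
# Minimal sketch: load the exported .tflite model and push one blank frame
# through it to confirm it loads and produces detection outputs.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # your ftc-ml download
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
print("expected input:", inp["shape"], inp["dtype"])  # typically a small square uint8 image

dummy = np.zeros(inp["shape"], dtype=inp["dtype"])  # a blank frame is enough for a smoke test
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()

for out in interpreter.get_output_details():
    print(out["name"], interpreter.get_tensor(out["index"]).shape)
```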

If you have additional questions after reading all this, please don’t hesitate to ask!

-Danny

Thank you so much for the detailed and thorough response - and for looking at our videos and giving feedback! That exceeded our expectations, and we’re so grateful. The team got together today and reviewed your feedback. The kids re-drew the detection boxes in our existing videos, we added a few videos with blue cones, and we re-created our negative-frame video with a plain, unlabeled sleeve. The training data already looks a lot better. We’ll test the model tomorrow and see if it works well in real life. Thank you so much again!