Train custom model - outcome not consistent

We tried training models for multiple objects in fmltc. Two models trained successfully (under the “Images” tab, each of the 10 images shows the object boxed twice; I assume one box is the manual label and the other is the test result for validation purposes). The graphs look normal: the learning rate climbs toward the top and the loss descends toward the bottom.

However, I have trouble training the third object, even though the background and lighting were very similar to the first two. I made another couple of videos and still ended up with a failed model. The graphs go in the opposite directions compared to the first two models. The images contain only one box, and I assume the TensorFlow training failed because it cannot recognize the object in the test images. Here are some setup details; please advise what the reason might be:

1. There are two videos shot from different directions of the same object. Both were taken while the robot moved forward; in one the object is on the right side and in the other it is on the left side. The webcam was mounted on the robot, which moved slowly: forward and backward, turning to different angles, at various distances (close ~5-8 inches, far ~8-12 inches). I checked the frames after finishing tracking, and they are clear, without blurriness.

2. A whiteboard is placed outside the field, so the background is fairly simple in both videos.

3. The field is located in a basement; all ceiling lights were on, and the lighting was not adjusted between the two videos.

4. I combined the two datasets for each object and started training. The two successful models had ~500 and ~800 frames each; the failed model contains close to 800 frames. I assume the number of frames does not matter in this case, since the failed object was visually simpler than the other two.

5. The webcam frame rate was set to 15 frames/second. Each video was 30-35 seconds long.

6. The training model type is SSD MobileNet V2 320x320.

7. The training steps ranged from 110 to 150. Each adjustment was made so that the number of epochs would end up around 100.

8. The object in the images was tilted most of the time due to the turns. This was not an issue in the first two models.
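As a side note on point 7, the usual TensorFlow relationship between training steps and epochs can be sketched as below. This is a generic sketch, not fmltc's actual implementation, and the batch size of 32 is an assumption; fmltc may use a different default or report epochs differently in its UI.

```python
def epochs_from_steps(steps: int, batch_size: int, num_frames: int) -> float:
    """One epoch is one full pass over the dataset.
    Each training step consumes one batch of frames, so:
    epochs = steps * batch_size / num_frames."""
    return steps * batch_size / num_frames

# Hypothetical example: 800 labeled frames, assumed batch size of 32.
print(epochs_from_steps(2500, 32, 800))  # 100.0
print(epochs_from_steps(150, 32, 800))   # 6.0
```

If the numbers you see in fmltc don't match this formula, that would itself be useful information about how it counts epochs.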

So far this is all the information I can think of. I don’t know if there are other reasons that could produce this inconsistent training outcome. Thank you.

Since you said fmltc, I presume this is your own instance of fmltc? There are lots of reasons why what you described can happen, mostly due to inconsistent labeling. Can you provide me with login details for your instance, or with your team number if you’re using ftc-ml? Otherwise, I’ll probably spend two hours shooting from the hip and totally miss the target. If I can see your videos and labeling, I might be able to help you.


It is our own instance of fmltc. I used tracking for labeling, and I double-checked the labels afterwards to minimize blurry images. Our team number is 10138. The good models are xxx_all_5 and xxx_all_4; the failed one is xxx_all_3. Thank you so much!

@ddiaz needs to be able to inspect your videos and labeling which means he needs to be able to login to your instance. You can PM him those details, but otherwise, if you are using a privately hosted instance, there’s a limited amount of analysis that can be done as your training datasets are invisible to the people who host and maintain ftc-ml.

@laubarbara3 I’m afraid @crabigator is absolutely correct: when you created your own instance of fmltc, you made it infinitely more difficult for anyone to help you. The only way I can help you is if you private message me a login for your instance of fmltc that has access to your team space. Then I can log in to your instance, see what you see, and perhaps identify some areas for improvement.

To Private Message (PM) me, click on my username and it will open my user profile. In the upper-right will be a blue “Message” button, click that to open a private message window to send me a message that only you and I can see. I would need a username and password as well as the URL for your instance. I promise I will do nothing but “look.”


Define “failed”.

How close is the “type of image” of the 3rd one from the other two? Are you willing to share an image of each of the three types of labels on which you are attempting to train the model?

They’ve already identified that the graphs are showing bad training results, so based on this it seems clear that the model training is being confused. This is almost always due to inconsistency in labeling. This is a really good thread showing how the automatic tracking can completely butcher your labeling if you don’t watch it like a hawk.

Yeah, that’s the key here, I think. The hardest thing for those who are just starting is taking the time to label their videos correctly; it’s a time-intensive process even when you’re using auto-tracking (no auto-tracking algorithm is perfect for all situations). When I developed the default TensorFlow PowerPlay model with the 3 signal images, I initially spent about 1.5 hours on each of 12 videos, labeling 30 seconds of video at 15 frames per second (450 frames per video, ~5,400 frames total). Then, when I ran into issues, I had to redo videos, and I decided to completely change how I was taking the videos in order to make labeling them simpler. I made the motion a lot slower and smoother, took 2 minutes of video per video, changed the framerate afterwards (from 30 fps to 5 fps, keeping only every 6th frame and throwing the rest away, since they were mostly duplicates anyway), and so on. The end result was video that the default tracker could track far more easily, with great results, and it took me significantly less time to label. If you try to cut corners in labeling, it will bite you later. But I can’t really tell what’s going on in their specific case without seeing their labeled videos. And unfortunately a sample isn’t good enough; I need to see them all.
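The keep-every-6th-frame trick above can be sketched as follows. This is a minimal illustration of the frame-selection logic in plain Python (no video libraries); in practice the actual frame extraction would be done with a video tool such as ffmpeg, and the function name here is just for illustration.

```python
def frames_to_keep(total_frames: int, src_fps: int, dst_fps: int) -> list[int]:
    """Indices of frames to keep when downsampling a video from src_fps
    to dst_fps, e.g. keep every 6th frame for 30 fps -> 5 fps."""
    if src_fps % dst_fps != 0:
        raise ValueError("dst_fps must evenly divide src_fps for this simple scheme")
    stride = src_fps // dst_fps
    return list(range(0, total_frames, stride))

# 2 minutes at 30 fps = 3600 frames; keeping every 6th leaves 600 frames at 5 fps.
kept = frames_to_keep(2 * 60 * 30, 30, 5)
print(len(kept))   # 600
print(kept[:3])    # [0, 6, 12]
```

The point of the stride is that adjacent frames of slow, smooth motion are near-duplicates, so throwing most of them away loses little information while cutting labeling time dramatically.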


In this statement I wasn’t referring to the labeling process by boxes, but the actual image that they are wanting to train the model to recognize for the 3 parking locations (or other applications of object detection). For instance, for the default POWERPLAY signal the images are the Bolts, Bulbs and Panels.

Oh, I see. Since they kept referring to “objects” rather than “images”, your “(or other applications of object detection)” seems to be the most applicable. From those who have reached out to me for help or who have posted elsewhere online, people have been doing many more kinds of object detection in PowerPlay than I ever expected, including detecting the springs, ground junctions, and even the tops of the poles. I’m excited to see how well teams are able to use Machine Learning (and TensorFlow) this season!



Thank you for your replies. I realized that to access my instance I need to sign in with my personal Google username and password. I should have set up the instance under our team’s account earlier…

This is our team’s first year using TensorFlow dataset training. We used OpenCV in past years and are using it this year as well. The purpose of building this TensorFlow project is mostly to give team members a new learning experience. We need to learn more about TF by experimenting, e.g. seeing what results from combining multiple datasets with different labels into one .tflite file, and how quickly and accurately the robot identifies multiple objects with different labels on the Robot Controller… OpenCV can do similar things with customized calculations…

Thank you for sharing the information about auto-tracking and the fps setting for video. We will double-check the labeling. It is just interesting that the first two models had good results with the same auto-tracking process as the third (we actually spent more time retaking the third video).

Maybe we will try taking 2 minutes of video per video at 5 fps (~600 frames per video), with maybe 2-3 videos (~1,800 frames total) per object. I guess we truly had beginner’s luck with the first two models…
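A quick sanity check of that capture plan; this just restates the arithmetic above, with the per-object video count as the variable to experiment with:

```python
# Planned capture: 2-minute videos at 5 fps, 3 videos per object.
seconds_per_video = 2 * 60
fps = 5
videos_per_object = 3

frames_per_video = seconds_per_video * fps
frames_per_object = frames_per_video * videos_per_object
print(frames_per_video, frames_per_object)  # 600 1800
```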

Thank you for helping our team.


@laubarbara3 , looking forward to seeing what you discover.

Bearbotics is exploring the same question: how far can object detection be taken in the game? We were able to get a custom signal model working, but when we attempted to define some additional objects to detect in the game, the model started having problems with the original 3 custom signal objects. Basically, the loss was converging to zero, but when running the model, the detections that worked before were wrong, and it wasn’t detecting the new objects.

I’m assuming I will need to reassess the new objects and their labeling.

Coach Breton

Thank you for your replies. The third object is very similar to the first two. The PowerPlay.tflite works really well. We decided earlier to detect our team elements with OpenCV. At this moment, we are experimenting with TF for other possible object detection and comparing the results against OpenCV.

If you haven’t solved it yet, and the third object is very similar to the first two, I would recommend changing the image, or at least its color, to see if the training loss converges toward zero.

Coach Breton

Thank you for your suggestion. I will try a different color and background.