Chart Tells the Training Story


Lowering the video resolution to HD enabled me to cross the finish line during training. Liked the fact that the chart indicated 3,000 steps resulted in over-training. Will continue tuning training and tracking parameters while the students replicate the process, and then let the “rubber meet the road.”

@ddiaz and team deserve an extra pat on the back for making the process seamless once the user Reads The Manual. Well planned! Thanks.

Kind regards.


Please be aware that over-training is especially easy to do with extremely small datasets; we are not optimized for extremely small dataset training (called “few-shot training”). We anticipated that most teams would be using datasets with 2-3 different labels and around 1,000-1,500 frames total, and the “warm-up” algorithm is currently tuned for training around 3,000 steps (the tell-tale learning_rate graph will show the linear ramp-up and exponential drop-off of the learning rate during “warm-up”). We tried tweaking the warm-up algorithm right before going live with GPU training to better optimize for smaller datasets, but I noticed some oddities in model training and decided not to push those tweaks without more testing.
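The post doesn’t give the exact warm-up schedule, but the shape it describes (a linear ramp up followed by an exponential drop-off) can be sketched like this; all of the constants below are illustrative, not the tool’s actual values:

```python
def warmup_lr(step, total_steps=3000, peak_lr=0.008,
              warmup_steps=300, decay_rate=0.05):
    """Generic warm-up schedule: linear ramp to peak_lr over
    warmup_steps, then exponential decay toward peak_lr * decay_rate.
    All parameter values are made up for illustration."""
    if step < warmup_steps:
        # linear ramp up from 0 to peak_lr
        return peak_lr * step / warmup_steps
    # exponential drop-off after warm-up
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * decay_rate ** progress
```

Plotting this against the step count reproduces the tell-tale shape in the learning_rate graph: a straight rise to the peak, then a steadily flattening decay.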

The take-away here is that the proof of over-training is mostly more “in the pudding” than “in the metrics”. You want the model to converge (where the graphs level out and show no further improvement with more training), but you’ll find that an over-trained model does a very poor job of generalizing and only responds to input data that is similar to the training data. The difficulty is that this could also just mean your training data wasn’t varied enough, and the model picked up on unintended features like lighting, backgrounds, or a particular color in a particular spot.

So if you’re training with an extremely small dataset, you want to be careful about over-training, but you also want to make sure you have enough variation in your data that the model won’t make an observation about the data you didn’t intend and key off of whatever it found. The articles I linked to in section 7 about tanks are an interesting read (the “did this really happen” is questionable, but the “is this possible” is fascinating because it is).
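As a rough illustration of “convergence” in the sense used above, a plateau check over periodic loss readings might look like the following; the window size and tolerance are hypothetical, not values from the tool:

```python
def has_converged(losses, window=5, tol=1e-3):
    """Heuristic convergence check: the curve has 'leveled out' when
    the mean improvement over the last `window` evaluations falls
    below `tol`. Thresholds are illustrative, not from the tool."""
    if len(losses) <= window:
        return False  # not enough readings to judge a plateau
    recent = losses[-(window + 1):]
    # per-evaluation improvement (positive = loss still dropping)
    improvements = [recent[i] - recent[i + 1] for i in range(window)]
    return sum(improvements) / window < tol
```

A steeply dropping series keeps returning False; once the tail flattens out, the check flips to True, which is roughly the point where more steps buy generalization risk rather than accuracy.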



Excellent points from you, @ddiaz, here. Thanks.

On the subject of “tanks,” I’d simply like to state that the narratives are good examples of the Chinese whispers game. Also, I remember the IFF incident during the Falklands War (I was then employed in automated contour mapping, now obsolete) and have always been eager to stay informed on detection/classification techniques.

Kind regards.


As another data point, our team’s first model ran into the UI issue reported in a separate post that ultimately resulted in the bounding boxes being drawn too high, cutting off the bottom of the shipping element (the one label we trained on). The dataset included 360 total images, and we used 1,000 steps and 21 minutes of training time.

When we tried out the model the team was amazed at how well it found the shipping element in almost every scenario, even with the flawed bounding boxes.

Lots of ideas were generated from this first experiment so the team will be creating another set of input videos with improved variation and without the bounding box issue. They can’t wait to see if that works even better.

But really, they were somewhat shocked at how well the first flawed model worked. Thanks to all of you for this tool. It’s a great experience for the team and will really help them this season. You may have created at least one budding data scientist; she is graduating from high school and will start in computer science in college next year and this experience is opening her eyes to the breadth of the field.

Confidential Robotics - Team 13243, Eagan, Minnesota


Hello @Brett,

I can confirm your observation on Bounding Boxes with similar results (and I did experiment with some of the Trackers).

What prompted me to respond was your comment about your student(s). What are your thoughts on the fact that the next generation of Data Scientists can skip the math but still solve interesting problems?

Kind regards.

My opinion, on the surface, is that yes, a programmer (and in some cases a non-programmer) with the right tools and basic knowledge can create useful machine learning models. I’ve seen it done at my work by folks using Amazon Sagemaker to create useful, albeit not the most finely tuned, models.

However, I adhere to the philosophy that knowing what is going on under the covers allows a practitioner (in almost any field) to solve problems more efficiently, accurately, and with more reliability. In-depth knowledge also allows a deeper level of sophistication and cross-pollination between domains, which can often produce creative and novel solutions.

The student on my team who is most interested in this also happens to enjoy both programming and math. If she decides to further study data science long term, I’m sure she will go down the road of learning the statistics and math behind the state-of-the-art because that’s her personality.

Having said that, I don’t necessarily think you need a degree in math or statistics to understand the fundamental underpinnings of machine learning. Books like “Mathematics for Machine Learning” (downloadable here: ) are excellent resources to narrow the field of study for those motivated enough to dig through them. Lecture series like the Bloomberg “Foundations of Machine Learning” taught by David Rosenberg are excellent as well (but challenging - see here: Foundations of Machine Learning).

My advice to young programmers who want to make a career out of data science is to get through the math and statistics at least once using resources like the two above or through formal education.

But again, I do believe the types of tools and pre-built models coming out today will enable a lot of useful problem solving without deep theoretical underpinnings; if you have a personality like the student on my team, though, learning the theory behind the practice can help immensely.

That’s my 2 cents. :slightly_smiling_face:


Thanks for taking the time to expound a little bit, @Brett.

I’m 73 years old (i.e., I studied and practiced a lot but have forgotten most of the details, remembering only the names and some gotchas) and am working as a Mentor for the first time. Your comments brought a smile to my face. There is not one sentence in your note that I could disagree with. I sought your opinion because another person had posed the same question in another forum. To my surprise, he (by now an experienced ML practitioner) explained to me that he started with Keras alone and didn’t see the value of studying (not simply knowing) the underlying math.

I feel a little reassured by your thoughts because that is indeed my approach for the students on the team.

Kind regards.

Thanks so much for being a mentor @baqwas!

We just finished running through creating a model, and very much appreciate being part of the beta group. Based on what was just stated about over-training, we have a few questions to determine how we should move forward. We used the default calculation for the training steps (3,000) with the dataset we had created. Looking through the graphs, it appears that at around 2,300 steps the model becomes unstable and does not recover through the remaining steps up to 3,000, unlike the dip the graphs showed at around 1,300 steps. Question: should we rerun the training but limit it to 2,100 (or perhaps even 1,000) steps to get a good model to use? Or should we be looking to add additional video that gives a more varied set of images/conditions?
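To make the question concrete, here is a toy sketch of one option we’re weighing: if the pipeline keeps periodic evaluation results, pick the step with the best validation loss rather than retraining from scratch (the numbers below are made up to mirror the shape of our graphs, and whether intermediate checkpoints are exportable is an assumption to verify):

```python
def best_checkpoint(eval_losses):
    """Given {step: validation_loss} from periodic evaluations,
    return the step with the lowest loss. Assumes the training
    pipeline retains intermediate checkpoints (hypothetical)."""
    return min(eval_losses, key=eval_losses.get)

# Made-up readings: a dip around 1,300 that recovers,
# then instability after about 2,300 steps.
history = {1000: 0.42, 1300: 0.55, 1600: 0.31, 2000: 0.27,
           2300: 0.26, 2600: 0.48, 3000: 0.61}
```

On this fake history the pick would be the 2,300-step checkpoint, which is essentially the “stop before the instability” option without a second training run.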

Appreciate everyone’s effort in providing this tool for teams to use. We found the instructions easy to follow. One area we are not quite clear about is how tight the bounding boxes should be on the object. Should we strive to adjust them to go right to the outermost edge of the object, or is it OK for them to be slightly larger than the object?

FTC14706, Armada Pi-Gears

Hello @MarkG,

I may have forgotten to mention earlier, but our experience was that 2,000 steps was optimal for a single basic Duck dataset where we panned from the front only for the video.

Something happened with CSRT tracking: after the bounding box “collided” with the upper edge of the frame, it began to shrink and excluded most of the Duck’s body except the neck, head, and beak. I feel that this is normal. I’ve informed the team members to stay alert when they perform their hands-on exercises.
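For the hands-on exercises, a simple sanity check over the tracker’s output could flag this kind of shrinkage automatically. A minimal sketch, assuming boxes come back as (x, y, w, h) tuples per frame (the threshold is an arbitrary choice, not a tool setting):

```python
def flag_bbox_shrinkage(boxes, min_ratio=0.5):
    """Flag frame indices where a tracked box (x, y, w, h) has shrunk
    below min_ratio of the area of the first (hand-drawn) box -- a
    crude check for the CSRT-style drift described above."""
    _, _, w0, h0 = boxes[0]
    base_area = w0 * h0
    flagged = []
    for i, (x, y, w, h) in enumerate(boxes):
        if w * h < min_ratio * base_area:
            flagged.append(i)  # box collapsed relative to the original
    return flagged
```

Any flagged frames would be candidates for re-drawing the box by hand before using those frames for training.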

All in all, except for the initial browser challenges, it was a very satisfying experience, as demonstrated to the team members, who are now eager to get their feet wet. Thanks.

Kind regards.