Looking for some guidance on the number of training steps. All 20 of my datasets (different game elements, both individual and mixed) fail at the 100-step mark during training. Since the datasets are varied and the User Guide explains the equation, is there any further guidance on data preparation?
Very few frames were dropped (most videos were ~5-second segments; three were closer to 10 seconds, recorded before I read the manual), and I independently checked the tracking by eyeballing the box coordinates and feeding them to a standalone program using the same set of OpenCV trackers. Everything looked satisfactory.
Why would all the training jobs fail at the 100-step mark?
We looked at the logs, and it turns out your training job ran out of memory. I noticed that your videos are all 3840 x 2160, and that's likely the issue: our batch size of 32 combined with such large frames can overflow the GPU's memory during processing. Our maximum frame size was determined on TPUs, which used a larger batch size but also have MASSIVELY more memory.
Is it possible for you to downsample the videos to a smaller resolution? Smaller frames would likely not overflow the memory buffer. We can look into options for scaling frames FOR you within the tool, but that's not something we support at the moment.
Would you also be willing to upload one of your videos to Google Drive (or any file-sharing service) and email me a share link? We'd like to use one of your videos for our internal testing, if that's okay.
BIG, big difference in file sizes now! E.g., from 32 MB down to 2.6 MB at 1080p.
Best part is that I already have an OpenCV Python script for image conversion (for an entire folder of files) that I can cannibalize for video conversion. Thanks again for helping me get across the finish line.
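For anyone else hitting the same out-of-memory failure, here's a minimal sketch of that kind of batch downsampler using OpenCV. The `videos` folder, output naming, and 1080p target are my own choices for illustration, not anything from the tool's documentation; it assumes `opencv-python` is installed and the inputs use a codec OpenCV can read.

```python
# Batch-downsample videos to 1080p with OpenCV. Assumes opencv-python is
# installed; folder layout and filenames below are hypothetical.
from pathlib import Path


def target_size(width: int, height: int, max_height: int = 1080) -> tuple[int, int]:
    """Scale (width, height) down to max_height, preserving aspect ratio."""
    if height <= max_height:
        return width, height
    scale = max_height / height
    # Round width down to an even number, which most video codecs require.
    return int(width * scale) // 2 * 2, max_height


def downsample_video(src: Path, dst: Path, max_height: int = 1080) -> None:
    import cv2  # deferred so target_size() works without OpenCV installed

    cap = cv2.VideoCapture(str(src))
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    new_w, new_h = target_size(w, h, max_height)
    out = cv2.VideoWriter(str(dst), cv2.VideoWriter_fourcc(*"mp4v"),
                          fps, (new_w, new_h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # INTER_AREA is the usual choice for shrinking frames.
        out.write(cv2.resize(frame, (new_w, new_h),
                             interpolation=cv2.INTER_AREA))
    cap.release()
    out.release()


if __name__ == "__main__":
    for src in Path("videos").glob("*.mp4"):  # hypothetical input folder
        downsample_video(src, src.with_name(src.stem + "_1080p.mp4"))
```

With 4K (3840 x 2160) inputs, `target_size` yields 1920 x 1080, which matches the file-size drop mentioned above.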