What factors actually have the biggest impact when we train a model in day-to-day work? Model depth, model width, the number of heads, head depth, input size, output size, and so on.
Deep-learning-based inspection models are used very widely in industry. This year, due to internal reorganization, I was exposed to a variety of projects and models, and I noticed a pattern: a model can be very small, yet still improve results as long as the amount of data is large enough. At the same time, enlarging the input size tends to help more than any amount of model restructuring, provided of course that the backbone and head stay roughly the same.
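A quick back-of-envelope calculation makes it clear why input size and model width are competing for the same compute budget. This is my own generic sketch, not tied to any particular model: for a plain KxK convolution, doubling the input side costs roughly the same 4x FLOPs as doubling both the input and output channel widths.

```python
# Rough FLOPs estimate for a single KxK convolution (generic sketch; ignores
# stride/padding details): FLOPs scale with output H*W and with c_in * c_out,
# so resolution and channel width draw on the same compute budget.
def conv_flops(h, w, c_in, c_out, k=3):
    return 2 * h * w * c_in * c_out * k * k  # multiply and add counted separately

base = conv_flops(160, 160, 64, 64)
print(f"2x input side: {conv_flops(320, 320, 64, 64) / base:.0f}x FLOPs")    # ~4x
print(f"2x channels  : {conv_flops(160, 160, 128, 128) / base:.0f}x FLOPs")  # ~4x
```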
Our previous model was bloated and slow, and it also ate GPU memory, which in turn limited the batch size. So I cut the head down to a very shallow one and trimmed the backbone substantially. Inference time dropped a lot, and the training loss barely changed. Then, when deploying to hardware again, I doubled the input size relative to the previous small setting, and since the model's complexity had changed I also added a lot more data. The results still came out 5-6 points higher than with the previous very large input size (which was 1.5 times larger than the current one).
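For concreteness, here is a minimal, hypothetical sketch of that kind of experiment: a toy conv backbone plus head where the widths and head depth can be dialed down, and the slimmed model at 2x the input size is timed against the original. The architecture and numbers are assumptions for illustration, not the actual project model.

```python
# Hypothetical sketch: compare latency of a full vs. slimmed conv model
# at two input sizes. Not the author's real architecture.
import time
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride=2):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def make_model(widths, head_depth, num_classes=2):
    """Backbone = stack of strided conv blocks; head = a few stride-1 convs."""
    layers, in_ch = [], 3
    for w in widths:                       # backbone: one downsampling block per width
        layers.append(conv_block(in_ch, w))
        in_ch = w
    head = [conv_block(in_ch, in_ch, stride=1) for _ in range(head_depth)]
    head.append(nn.Conv2d(in_ch, num_classes, 1))   # final prediction layer
    return nn.Sequential(*layers, *head)

@torch.no_grad()
def latency_ms(model, size, runs=20):
    model.eval()
    x = torch.randn(1, 3, size, size)
    model(x)                               # warm-up
    t0 = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - t0) / runs * 1e3

big   = make_model(widths=[64, 128, 256, 512], head_depth=4)  # "bloated" model
small = make_model(widths=[32, 64, 128],       head_depth=1)  # cut backbone + head

print(f"big   @ 320px: {latency_ms(big, 320):6.1f} ms")
print(f"small @ 640px: {latency_ms(small, 640):6.1f} ms")     # 2x input, slimmer net
```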
This tells me two things:
1. Even after cutting the backbone and head and adding more data, the model still beats the version with the bigger input size. That means the model was far from saturated, i.e., it was still too complex relative to the data.
2. With a smaller model but a bigger input size, the results are much better than with a smaller input size. In other words, relative to spending budget on model complexity, increasing the input size does more for the actual results.
So in engineering work, don't chase model complexity by shrinking the input size so you can afford a bigger model. If the input is tiny, by the time it reaches the output it has been downsampled to almost nothing, so where is the accuracy supposed to come from? Don't run these amateurish experiments. In short: in engineering, when compute is limited, increase the data and the input size as much as possible. In research, compute is effectively unlimited and the inputs are not that small to begin with, so increase the data and use a powerful model.
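The "downsampled to nothing" point is easy to make concrete. Assuming a typical backbone with an overall stride of 32 (an assumed value, e.g. a ResNet-style network), the spatial resolution left at the final feature map is:

```python
# Feature-map size left after a backbone with total stride 32 (assumed value).
for input_size in (64, 128, 320, 640):
    feat = input_size // 32
    print(f"input {input_size:4d}px -> {feat:2d}x{feat} feature map "
          f"({feat * feat} spatial cells left to predict from)")
```

At a 64px input there are only 2x2 cells left to predict from, which is why squeezing the input to afford more layers usually backfires.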