While writing the previous post on the two ‘cultures’ of statistical modeling, prediction and inference, I realised that I was glossing over an extremely important area of predictive modeling and one that, judging by frequent StackExchange posts, is often misunderstood. As you will have surmised from the title, I’m talking about cross-validation.
If done correctly, cross-validation (CV) gives you a thorough assessment of a predictive model. It provides approximately unbiased, publishable results; a means of selecting the final model instance to use for your application; and an accurate estimate of the model’s performance on future data.
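
To make the idea concrete before going any further, here is a minimal sketch of k-fold CV in plain Python. It is only an illustration under some assumptions of mine: the helper names `k_fold_indices` and `cross_validate` are hypothetical, and the "model" is a deliberately trivial one that predicts the mean of the training targets. The point to notice is that each fold is scored by a model fitted without that fold.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(ys, k=5, seed=0):
    """Return the per-fold mean squared error of a toy mean-predictor."""
    scores = []
    for test_idx in k_fold_indices(len(ys), k, seed):
        held_out = set(test_idx)
        # "Fit" on the training folds only: the held-out fold is never seen.
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        prediction = mean(train_y)
        # Score on the held-out fold.
        scores.append(mean((ys[i] - prediction) ** 2 for i in test_idx))
    return scores


if __name__ == "__main__":
    targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    fold_scores = cross_validate(targets, k=5)
    print("per-fold MSE:", fold_scores)
    print("CV estimate:", mean(fold_scores))
```

The average of the per-fold scores is the CV estimate of performance on future data. In practice you would swap the toy mean-predictor for your real model and keep every fitting step inside the loop.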