Introduction

This is the fourth in a series of posts looking back at the various statistical and machine learning models that have been used to predict football match outcomes as part of Predictaball. Here’s a quick summary of the first 3 parts:
Part 1 used a Bayesian hierarchical regression to model a team’s latent skill, where skill was constant over time.
Part 2 used an Elo rating system to handle time, but the functions and parameters were hardcoded, and a match prediction model was bolted on top to replace Elo’s basic prediction.
Part 3 used Evolutionary Algorithms (EA) to simultaneously optimize the rating system and match prediction model without requiring any hardcoded parameters.

The EA model has been working reliably for the last 5 and a half seasons and hasn’t been tweaked since.
“Two Cultures”

One aspect of statistical modeling that those with a bit of experience can take for granted, but that may not be immediately obvious to newcomers, is the difference between modeling for explanation and modeling for prediction. When you’re a newbie to modeling you may think that this only affects how you interpret your results and the conclusions you’re aiming to draw, but it has a far bigger impact than that: it influences the way you form the models, the types of learning algorithms you use, and even how you evaluate their performance.