machine learning

Implementing an Elo rating system for European football

My football prediction system has previously relied on a Bayesian approach to quantify a team’s skill level, modelling it as a random intercept in a hierarchical model of match outcomes. While this model performed very well (62% accuracy last season), I was never fully satisfied, since this measure of skill is an average across the last ten seasons for which I have data, rather than being updated to reflect the time-varying nature of form.
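For reference, the core of a basic Elo scheme is a two-line update. The sketch below uses the conventional 400-point logistic scale and an illustrative K-factor of 20; these are standard defaults, not necessarily the parameters used by Predictaball.

```python
# Minimal Elo update for a single match (a sketch, not Predictaball's exact scheme).
def expected_score(rating_a, rating_b):
    """Expected score for team A against team B on the usual 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, score_a, k=20):
    """Update both ratings; score_a is 1 for an A win, 0.5 for a draw, 0 for a loss."""
    exp_a = expected_score(rating_a, rating_b)
    rating_a += k * (score_a - exp_a)
    rating_b += k * ((1 - score_a) - (1 - exp_a))
    return rating_a, rating_b

# Example: two evenly rated teams, and the first one wins
print(update_elo(1500, 1500, 1))  # -> (1510.0, 1490.0)
```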

Predicting football results in 2016-2017 with machine learning - Automated betting system

The last post showed that using a fully Bayesian multi-level model of match outcomes helped Predictaball achieve 58% overall prediction accuracy across the four European leagues, up 8% from last season. This post describes the betting system I used to try to profit by identifying value bets in the offered odds. Before delving into the profit analysis, I’ll first quickly summarise the staking model, since I haven’t mentioned it anywhere before.
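To make the idea of a value bet concrete: a bet offers value when the modelled probability times the decimal odds exceeds one. The sketch below pairs that filter with a fractional Kelly stake; this is one common staking approach shown for illustration, not necessarily the staking model described in the post.

```python
# Value-bet filter plus fractional Kelly staking (illustrative only).
def kelly_stake(p_model, decimal_odds, bankroll, fraction=0.25):
    """Return a stake for a bet whose modelled win probability is p_model.

    The bet has positive expected value when p_model * decimal_odds > 1.
    The Kelly fraction is (p*d - 1) / (d - 1); scaling it down by `fraction`
    tempers the bankroll swings.
    """
    edge = p_model * decimal_odds - 1
    if edge <= 0:
        return 0.0  # no value in the offered odds, skip the bet
    kelly = edge / (decimal_odds - 1)
    return bankroll * fraction * kelly

# Example: model gives a 50% chance of a home win, bookies offer 2.30
print(kelly_stake(0.5, 2.30, bankroll=100))  # ~2.88
```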

Predicting football results in 2016-2017 with machine learning - Bayesian hierarchical modelling

And so we come to the end of another season of football and, more importantly, Predictaball! This season has seen several large updates; I was meaning to detail these at the start of the season, but life got in the way. The predictive model is now fully Bayesian, I’ve added a betting system that identifies value bets, and I’ve expanded coverage to the three other main European leagues: La Liga, Serie A, and the Bundesliga. Rather than detailing these new aspects as well as summarising the season’s performance in one massive post, I’ll split this into two parts.

Predicting AFL results with hierarchical Bayesian models using JAGS

I’ve recently expanded my hierarchical Bayesian football (aka soccer) prediction framework to predict the results of Australian Rules Football (AFL) matches. I have no personal interest in AFL; instead I got involved through an email sent to a statistics mailing list advertising a competition held by Monash University in Melbourne. Sensing an opportunity to quickly adapt my soccer prediction method to AFL results and to compare my technique with others, I decided to get involved.

Predictaball end of season review for 2015-2016

This post summarises Predictaball’s performance in the 2015-2016 season. I’ll look at overall performance, accuracy per week, how it fared in terms of making a profit, and finally the annual comparison with Lawro. Compared to last year, when it achieved 48% overall, Predictaball has fared less well this season with 43%. This isn’t hugely surprising, since this season has been full of surprises to say the least, with Leicester beating the traditional top four to the title and Spurs doing their best to break the monopoly (despite failing in typical Spurs fashion).

Generating Iron Maiden lyrics with Markov chains

I’ve been wanting to play with Markov chains for a while, and now that I’m starting to get into Bayesian analysis I’m going to need to use them more often. One fun use of them is to generate text which can (at a stretch) pass as written by a drunk person. For nice examples of them in action, have a look at Garkov (generated Garfield strips) or even an entire subreddit generated with them.
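As a flavour of how simple this can be, here is a toy word-level Markov chain generator; the corpus line is a placeholder, and any real lyrics scraping or cleaning is left out.

```python
# Toy word-level Markov chain text generator (a sketch).
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=20):
    """Random walk over the chain, starting from `start`."""
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: no word was ever seen after this one
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

# Placeholder corpus; swap in the scraped lyrics here
corpus = "run to the hills run for your lives run to the hills"
chain = build_chain(corpus)
print(generate(chain, "run"))
```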

Appreciating the distinction between explanatory and predictive modelling

One aspect of statistical modelling which can be taken for granted by those with a bit of experience, but may not be immediately obvious to newcomers, is the difference between modelling for explanation and modelling for prediction. When you’re a newbie to modelling you may think that this only affects how you interpret your results and what conclusions you’re aiming to make, but it has a far bigger impact than that, from influencing the way you formulate the models, to the types of learning algorithms you use, and even how you evaluate their performance.

Incorporating odds into Predictaball

I’ve tinkered with Predictaball a bit recently in an effort to increase its accuracy, with the overall goal of beating Paul Merson and Lawro so that I can claim ‘human competitiveness’. I’ve mentioned in previous posts that I envisage two potential ways to achieve this: including more player data, and incorporating bookies’ odds. Adding more player data (such as a variable for each player indicating whether they are in the squad or not) would allow the model to account for situations when a player who is strongly associated with the team winning is injured - for an example, see City’s abysmal record when Kompany isn’t playing.
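As an illustration of the squad-indicator idea, the sketch below builds one binary column per player from a toy match table; the player names, column names, and data layout are purely hypothetical.

```python
# Building per-player "in squad" indicator features (illustrative data only).
import pandas as pd

matches = pd.DataFrame({
    "match_id": [1, 2],
    "squad": [["Kompany", "Aguero"], ["Aguero"]],  # e.g. Kompany missing in match 2
})

# One binary column per player seen anywhere in the data
players = sorted({p for squad in matches["squad"] for p in squad})
for player in players:
    matches[f"in_squad_{player}"] = matches["squad"].apply(lambda s: int(player in s))

print(matches[[f"in_squad_{p}" for p in players]])
```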

Predictaball end of season review

It’s been a while since I’ve posted anything, as I’ve spent my summer in a thesis-related haze, which I’m starting to come out of now, so expect more frequent updates - particularly as I work my way through the backlog of ideas I’ve been meaning to write about. I’ll start with assessing Predictaball’s performance last season. Just to summarise, this was a classification task attempting to predict the outcome (W/L/D) of every Premier League match from the end of September onwards.

Implementation of Multi-Area Under Curve (MAUC) in Python

Receiver Operating Characteristic (ROC) analysis is becoming increasingly common in machine learning, as it offers valuable insight into how your model is performing that isn’t captured by log-loss alone, facilitating diagnosis of any issues. I won’t go into much detail about what ROC actually is here, as this post is mostly intended to help people looking for a MAUC Python implementation. If, however, you are looking for an overview of ROC, then I’d recommend Fawcett’s tutorial.
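For context, MAUC here refers to Hand and Till’s multi-class extension of the AUC, which averages pairwise two-class AUCs. The sketch below is not the implementation from the post; it is a compact version built on scikit-learn’s binary roc_auc_score, assuming `probs` is an (n_samples, n_classes) array of predicted probabilities whose columns follow the sorted order of the class labels.

```python
# Compact sketch of Hand & Till's multi-class AUC (the "M" measure).
from itertools import combinations
import numpy as np
from sklearn.metrics import roc_auc_score

def mauc(y_true, probs):
    """Average pairwise AUC over all class pairs.

    Assumes the columns of `probs` correspond to np.unique(y_true).
    """
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    classes = np.unique(y_true)
    pair_aucs = []
    for idx_i, idx_j in combinations(range(len(classes)), 2):
        ci, cj = classes[idx_i], classes[idx_j]
        mask = np.isin(y_true, [ci, cj])  # restrict to the two-class subproblem
        # A(i,j): average of the two conditional AUCs, each scored with the
        # corresponding class's probability column.
        a_ij = roc_auc_score(y_true[mask] == ci, probs[mask, idx_i])
        a_ji = roc_auc_score(y_true[mask] == cj, probs[mask, idx_j])
        pair_aucs.append((a_ij + a_ji) / 2)
    return float(np.mean(pair_aucs))
```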