Bayesian statistics

Predictaball retrospective part 4 - Particle Filters

Introduction
This is the fourth in a series of posts looking back at the various statistical and machine learning models that have been used to predict football match outcomes as part of Predictaball. Here’s a quick summary of the first three parts:

- Part 1 used a Bayesian hierarchical regression to model a team’s latent skill, where skill was constant over time
- Part 2 used an Elo rating system to handle time, but its functions and parameters were hardcoded, and a match prediction model was bolted on top to replace Elo’s basic prediction (an update of this kind is sketched after this list)
- Part 3 used Evolutionary Algorithms (EAs) to simultaneously optimize the rating system and match prediction model without requiring any hardcoded parameters

The EA model has worked reliably for the last five and a half seasons and hasn’t been tweaked since.
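For context, here is a minimal sketch of the kind of Elo update described in part 2; the K-factor and 400-point scale are generic illustrative defaults, not Predictaball’s fitted parameters:

```python
def elo_expected(rating_a, rating_b, scale=400):
    """Expected score of team A against team B under the standard Elo logistic curve."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / scale))

def elo_update(rating_a, rating_b, score_a, k=20):
    """Return updated ratings after a match.

    score_a is 1 for a win for team A, 0.5 for a draw, 0 for a loss.
    k and the 400-point scale are illustrative defaults, not
    Predictaball's actual parameters.
    """
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Example: one of two evenly rated teams wins
print(elo_update(1500, 1500, 1))  # (1510.0, 1490.0)
```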

Predictaball retrospective part 1 - Hierarchical Bayesian regression

Introduction
This is the first in a retrospective series of posts looking at the evolution of Predictaball - my football match prediction engine - and reflecting on how it mirrors my own development as a data scientist. I’ve been fortunate to work in a wide variety of domains, exposing me to a range of statistical paradigms and perspectives that are reflected in the models used in Predictaball. At the end of the series I’ll also do a full comparison of all the algorithms, as this is something I’ve never done before but have been interested in for a long time.

Dirichlet Process Mixture Models Part III: Chinese Restaurant Process vs Stick-breaking

In the previous entry of what has evidently become a series on modelling binary mixtures with Dirichlet Processes (part 1 discussed using pymc3 and part 2 detailed writing custom Gibbs samplers), I ended by stating that I’d like to look into writing a Gibbs sampler using the stick-breaking formulation of the Dirichlet Process, in contrast to the Chinese Restaurant Process (CRP) version I’d just implemented. Actually coding this up was rather straightforward and took less time than I expected, but I found the differences and similarities between these two ways of expressing the same mathematical model interesting enough for a post of its own.
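As a taster, here is a minimal numpy sketch of the stick-breaking construction the post contrasts with the CRP: weights are generated by repeatedly breaking off Beta-distributed fractions of the remaining stick. The truncation level K is an illustrative assumption:

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng=None):
    """Draw a truncated stick-breaking approximation to DP weights.

    Each beta_k ~ Beta(1, alpha) is the fraction broken off the
    remaining stick, so w_k = beta_k * prod_{j<k} (1 - beta_j).
    K is an illustrative truncation level, not part of the exact DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    betas = rng.beta(1, alpha, size=K)
    remaining = np.concatenate([[1.0], np.cumprod(1 - betas)[:-1]])
    return betas * remaining

w = stick_breaking_weights(alpha=2.0, K=20)
print(w.sum())  # close to, but slightly below, 1 because of the truncation
```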

Writing Gibbs Samplers for clustering binary data using Dirichlet Processes

Back at the start of the year (which really doesn’t seem that long ago) I was looking at using Dirichlet Processes to cluster binary data using PyMC3. I was unable to get the PyMC3 mixture model API working with the general-purpose Gibbs sampler, but after some tweaking of a custom likelihood function I got something reasonable-looking working using Variational Inference (VI). While this was still useful for exploratory analysis, I’d prefer to use MCMC sampling so that I have more confidence in the groupings (since VI only approximates the posterior), in case I want to use these groups to generate further research questions.
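To give a flavour of what such a sampler involves, here is a compact sketch of a collapsed (CRP-style) Gibbs step for a single observation, assuming conjugate Beta(a, b) priors on each cluster’s per-dimension Bernoulli parameters; the function name, hyperparameter values, and label bookkeeping are illustrative rather than the post’s exact implementation:

```python
import numpy as np

def resample_assignment(i, X, z, alpha=1.0, a=1.0, b=1.0, rng=None):
    """Collapsed Gibbs step: resample the cluster label of observation i.

    X is an (N, D) binary matrix and z the current cluster labels.
    With conjugate Beta(a, b) priors, each cluster's Bernoulli
    parameters integrate out, leaving a simple posterior predictive.
    alpha, a, and b are illustrative hyperparameter choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = X[i]
    mask = np.arange(len(z)) != i            # leave observation i out
    labels = np.unique(z[mask])
    log_p = []
    for k in labels:
        members = X[mask][z[mask] == k]
        n_k = len(members)
        s_kd = members.sum(axis=0)           # per-dimension counts of ones
        p_d = (a + s_kd) / (a + b + n_k)     # posterior predictive P(x_d = 1)
        log_lik = np.sum(x * np.log(p_d) + (1 - x) * np.log(1 - p_d))
        log_p.append(np.log(n_k) + log_lik)  # CRP: existing cluster weight is n_k
    # New cluster: prior predictive with empty sufficient statistics
    p_new = a / (a + b)
    log_lik_new = np.sum(x * np.log(p_new) + (1 - x) * np.log(1 - p_new))
    log_p.append(np.log(alpha) + log_lik_new)  # CRP: new cluster weight is alpha
    log_p = np.array(log_p)
    log_p -= log_p.max()                     # stabilise before exponentiating
    probs = np.exp(log_p) / np.exp(log_p).sum()
    choice = rng.choice(len(probs), p=probs)
    z[i] = labels[choice] if choice < len(labels) else z.max() + 1
    return z
```

A full sampler would sweep this step over all observations each iteration (optionally resampling alpha as well); the collapsing is what keeps the per-cluster parameters out of the state space.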

Modelling Bernoulli Mixture Models with Dirichlet Processes in PyMC

I’ve been spending a lot of time over the last week getting Theano working on Windows and playing with Dirichlet Processes for clustering binary data using PyMC3. While there is a great tutorial for mixtures of univariate distributions, there isn’t a lot out there for multivariate mixtures, and Bernoulli mixtures in particular. This notebook shows an example of fitting such a model using PyMC3 and highlights the importance of careful parameterisation, as well as demonstrating that variational inference can prove advantageous over standard sampling methods like NUTS for such problems.
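For a rough idea of the setup, here is a minimal PyMC3 sketch of a truncated stick-breaking Bernoulli mixture fitted with ADVI on toy data; the priors, truncation level K, and marginalised Potential likelihood are illustrative assumptions, not necessarily the notebook’s exact parameterisation:

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6)).astype("float64")  # toy binary data
N, D = X.shape
K = 10  # truncation level for the stick-breaking approximation (illustrative)

with pm.Model() as model:
    alpha = pm.Gamma("alpha", 1.0, 1.0)
    beta = pm.Beta("beta", 1.0, alpha, shape=K)
    # Truncated stick-breaking weights (they sum to slightly under 1)
    w = pm.Deterministic(
        "w", beta * tt.concatenate([[1.0], tt.extra_ops.cumprod(1.0 - beta)[:-1]])
    )
    theta = pm.Beta("theta", 1.0, 1.0, shape=(K, D))  # per-cluster Bernoulli probs
    # Marginalise the discrete cluster labels:
    # log p(x_n) = logsumexp_k [ log w_k + sum_d log Bern(x_nd | theta_kd) ]
    comp_logp = tt.sum(
        X[:, None, :] * tt.log(theta)[None, :, :]
        + (1.0 - X[:, None, :]) * tt.log(1.0 - theta)[None, :, :],
        axis=-1,
    )  # shape (N, K)
    pm.Potential("loglik", tt.sum(pm.math.logsumexp(tt.log(w) + comp_logp, axis=1)))

    approx = pm.fit(20000, method="advi")  # variational inference
```

Marginalising out the discrete labels is the key design choice here: it is what lets gradient-based methods such as ADVI (or NUTS) handle the model at all, since they cannot sample discrete cluster assignments directly.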