Stuart Lacy

Data Scientist

Wolfson Atmospheric Chemistry Laboratories, University of York

Biography

I’m a data scientist interested in applying machine learning to some of the biggest challenges facing society. In my current role I work on high-resolution air quality measurements recorded from low-cost sensors, using a combination of deep learning and Bayesian statistics. Previously I applied machine learning to build complex survival models of haematological malignancies and analysed movement disorder data from neurodegenerative conditions. I also have an interest in software development, in particular developing tools for others to use, such as R statistical packages and interactive dashboards. My main tools are R, Stan (for probabilistic machine learning), and PyTorch.

Interests

Machine Learning
Bayesian Statistics
Air Quality
Health Informatics
Artificial Intelligence

Education

PhD in Electronic Engineering, 2016
University of York
MEng in Electronic Engineering, 2012
University of York

Recent blog articles

StravaR

Dec 5, 2023 6 min read

I’ve finally managed to combine my two main hobbies of data science and running into one project: a Shiny web app that allows you to explore your Strava fitness data and provides a local database of all your activities that you can analyse with R. My initial motivation was that I wanted quick access to certain visualisations and metrics that Strava either doesn’t provide or makes awkward to obtain, so I wrote a basic app for my own use.

An introduction to Recurrent Neural Networks for scientific applications with a case study on air quality modelling

Oct 27, 2023 19 min read

Deep learning has attracted considerable attention for its near-human performance on a variety of complex problems such as image recognition, playing games, and, more recently, conversational AI through large language models. Each of these applications requires enormous volumes of data and computational resources beyond the reach of all but the richest companies. This resource-hungry nature, coupled with the hype that accompanies any deep-learning application, makes it challenging to gain a realistic assessment of its real-world potential for less demanding use-cases, such as scientific time-series modelling.

Speeding up R workshop

Jun 28, 2023 3 min read

Just a quick update, more to test that the website infrastructure is still running than anything else, as it’s been 4 years since my last post. Last month I ran a workshop (slides here) on speeding up data analysis in R at the University’s Research Coding Club, which might be useful for anyone who stumbles across this page in the future. The Research Coding Club is an informal collective of people from across the entire University who write software to aid their research.

Dirichlet Process Mixture Models Part III: Chinese Restaurant Process vs Stick-breaking

Sep 26, 2019 9 min read

In the previous entry of what has evidently become a series on modelling binary mixtures with Dirichlet Processes (part 1 discussed using PyMC3 and part 2 detailed writing custom Gibbs samplers), I ended by stating that I’d like to look into writing a Gibbs sampler using the stick-breaking formulation of the Dirichlet Process, in contrast to the Chinese Restaurant Process (CRP) version I’d just implemented. Actually coding this up was rather straightforward and took less time than I expected, but I found the differences and similarities between these two ways of expressing the same mathematical model interesting enough for a post of its own.
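
To make the contrast concrete, here is a minimal sketch in plain Python (not the code from the post) of drawing cluster labels from a Dirichlet Process prior in both ways. The CRP integrates out the mixture weights and seats points one after another, while the stick-breaking version instantiates the weights explicitly. The function names and the finite truncation level are illustrative assumptions.

```python
import random


def crp_assignments(n, alpha, rng):
    """Sequentially seat n customers using the Chinese Restaurant Process."""
    labels, counts = [], []
    for i in range(n):
        # Existing table k has weight counts[k]; a new table has weight alpha.
        # Normalised, these are counts[k] / (i + alpha) and alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new table (cluster)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights: w_k = beta_k * prod_{j<k} (1 - beta_j)."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        beta = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    return weights


def stick_breaking_assignments(n, alpha, truncation, rng):
    """Given explicit weights, each label is drawn independently of the others."""
    w = stick_breaking_weights(alpha, truncation, rng)
    return rng.choices(range(truncation), weights=w, k=n), w


if __name__ == "__main__":
    rng = random.Random(42)
    print("CRP labels:", crp_assignments(20, alpha=1.0, rng=rng))
    sb_labels, w = stick_breaking_assignments(20, alpha=1.0, truncation=25, rng=rng)
    print("Stick-breaking labels:", sb_labels)
```

Once the weights are explicit, the labels are conditionally independent given `w`, which is what allows a stick-breaking (blocked) Gibbs sampler to update all assignments together, whereas a CRP sampler updates each assignment conditional on all the others.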

Writing Gibbs Samplers for clustering binary data using Dirichlet Processes

Sep 9, 2019 11 min read

Back at the start of the year (which really doesn’t seem that long ago) I was looking at using Dirichlet Processes to cluster binary data using PyMC3. I was unable to get the PyMC3 mixture model API working with the general-purpose Gibbs sampler, but after some tweaking of a custom likelihood function I obtained reasonable-looking results using Variational Inference (VI). While this was still useful for exploratory analysis, I’d prefer to use MCMC sampling, since VI only approximates the posterior, so that I can have more confidence in the groupings should I want to use them to generate further research questions.
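
For readers unfamiliar with the approach, the sketch below shows one standard way to write such a sampler: a collapsed Gibbs sampler in the CRP formulation (in the style of Neal's "Algorithm 3"), where each cluster's per-feature Bernoulli probabilities have Beta(a, b) priors that are integrated out analytically. It is plain Python for illustration only; the function name, hyperparameter defaults and single-cluster initialisation are assumptions, not the code from the post.

```python
import math
import random


def dp_bernoulli_gibbs(X, alpha=1.0, a=1.0, b=1.0, n_iter=100, seed=0):
    """Collapsed CRP Gibbs sampler for a DP mixture of independent Bernoullis.

    X is a list of equal-length 0/1 lists. Returns the cluster label of each
    row after the final sweep (label values are arbitrary integer ids).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    labels = [0] * n  # start with every row in a single cluster
    sizes = {0: n}    # cluster id -> number of members
    ones = {0: [sum(row[j] for row in X) for j in range(d)]}  # per-feature counts of 1s
    next_id = 1

    def log_pred(x, n_k, s_k):
        # Beta-Bernoulli posterior predictive of row x for a cluster with
        # n_k members and per-feature counts of ones s_k
        total = 0.0
        for j in range(d):
            p1 = (a + s_k[j]) / (a + b + n_k)
            total += math.log(p1 if x[j] else 1.0 - p1)
        return total

    for _ in range(n_iter):
        for i in range(n):
            x, k_old = X[i], labels[i]
            # Remove row i from its current cluster
            sizes[k_old] -= 1
            for j in range(d):
                ones[k_old][j] -= x[j]
            if sizes[k_old] == 0:
                del sizes[k_old], ones[k_old]
            # Unnormalised log-probabilities: existing clusters, then a new one
            ks = list(sizes)
            logp = [math.log(sizes[k]) + log_pred(x, sizes[k], ones[k]) for k in ks]
            logp.append(math.log(alpha) + log_pred(x, 0, [0] * d))
            m = max(logp)
            weights = [math.exp(lp - m) for lp in logp]
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(ks):  # open a new cluster
                k_new, next_id = next_id, next_id + 1
                sizes[k_new], ones[k_new] = 0, [0] * d
            else:
                k_new = ks[choice]
            # Add row i to its (possibly new) cluster
            labels[i] = k_new
            sizes[k_new] += 1
            for j in range(d):
                ones[k_new][j] += x[j]
    return labels


if __name__ == "__main__":
    data = [[1, 1, 1, 0, 0, 0] for _ in range(10)] + [[0, 0, 0, 1, 1, 1] for _ in range(10)]
    print(dp_bernoulli_gibbs(data, n_iter=50))
```

Integrating out the Bernoulli parameters means the sampler only tracks cluster sizes and counts of ones, at the cost of having to update assignments one row at a time.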

Recent publications


Using multi-state modelling to facilitate informed personalised treatment planning in Follicular Lymphoma

This talk discussed an application of multi-state modelling to predict the treatment pathways of a disease with heterogeneous management options, often involving multiple lines of active treatment.

Stuart E Lacy

Survival Analysis for Junior Researchers 2018, 2018.

Slides

Using echo state networks for classification: A case study in Parkinson's disease diagnosis

Despite having notable advantages over established machine learning methods for time series analysis, reservoir computing methods, such as echo state networks (ESNs), have yet to be widely used for practical data mining applications. In this paper, we address this deficit with a case study that demonstrates how ESNs can be trained to predict disease labels when stimulated with movement data. Since there has been relatively little prior research into using ESNs for classification, we also consider a number of different approaches for realising input–output mappings. Our results show that ESNs can carry out effective classification and are competitive with existing approaches that have significantly longer training times, in addition to performing similarly to models employing conventional feature extraction strategies that require expert domain knowledge. This suggests that ESNs may prove beneficial in situations where predictive models must be trained rapidly and without the benefit of domain knowledge, for example on high-dimensional data produced by wearable medical technologies. This application area is emphasized with a case study of Parkinson’s disease patients who have been recorded by wearable sensors while performing basic movement tasks.

Stuart E Lacy, Stephen L Smith, Michael A Lones

Artificial Intelligence in Medicine, 2018.

Preprint Published article
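
As a rough illustration of the general idea (not the paper's implementation), the sketch below builds a tiny echo state network in plain Python: the input and reservoir weights are random and never trained, and only a linear readout is fitted. The leaky-tanh update, the time-averaged reservoir state used as features, the ridge-regression readout, the toy data and all parameter values are assumptions; the paper itself compares several ways of mapping reservoir states to outputs.

```python
import math
import random


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            factor = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        x[i] = (M[i][m] - sum(M[i][k] * x[k] for k in range(i + 1, m))) / M[i][i]
    return x


class EchoStateClassifier:
    """Minimal two-class ESN: fixed random reservoir plus a ridge-regression readout."""

    def __init__(self, n_reservoir=30, leak=0.5, ridge=1e-2, seed=0):
        rng = random.Random(seed)
        self.n, self.leak, self.ridge = n_reservoir, leak, ridge
        self.w_in = [rng.uniform(-1, 1) for _ in range(n_reservoir)]
        w = [[rng.uniform(-1, 1) for _ in range(n_reservoir)] for _ in range(n_reservoir)]
        # Scale so the largest absolute row sum is 0.9: the state update is then a
        # contraction in the max-norm, a sufficient condition for the echo state property.
        norm = max(sum(abs(v) for v in row) for row in w)
        self.w = [[0.9 * v / norm for v in row] for row in w]
        self.w_out = None

    def features(self, seq):
        """Drive the untrained reservoir with a 1-D sequence; return mean state plus a bias."""
        h = [0.0] * self.n
        total = [0.0] * self.n
        for u in seq:
            pre = [self.w_in[i] * u + sum(self.w[i][j] * h[j] for j in range(self.n))
                   for i in range(self.n)]
            h = [(1 - self.leak) * h[i] + self.leak * math.tanh(pre[i]) for i in range(self.n)]
            total = [t + v for t, v in zip(total, h)]
        return [t / len(seq) for t in total] + [1.0]

    def fit(self, seqs, labels):
        """Train only the readout: solve (F^T F + ridge * I) w = F^T y with y in {-1, +1}."""
        F = [self.features(s) for s in seqs]
        y = [1.0 if lab else -1.0 for lab in labels]
        m = len(F[0])
        A = [[sum(f[i] * f[j] for f in F) + (self.ridge if i == j else 0.0) for j in range(m)]
             for i in range(m)]
        rhs = [sum(f[i] * t for f, t in zip(F, y)) for i in range(m)]
        self.w_out = _solve(A, rhs)
        return self

    def predict(self, seq):
        f = self.features(seq)
        return int(sum(a * b for a, b in zip(self.w_out, f)) > 0)


if __name__ == "__main__":
    rng = random.Random(1)

    def make(sign):
        # Toy two-class data: noisy constant signals around +1 or -1
        return [sign + rng.gauss(0, 0.1) for _ in range(20)]

    seqs = [make(1.0) for _ in range(10)] + [make(-1.0) for _ in range(10)]
    clf = EchoStateClassifier().fit(seqs, [1] * 10 + [0] * 10)
    print(clf.predict(make(1.0)), clf.predict(make(-1.0)))
```

Because only the readout is solved for, training reduces to a single small linear system, which is where the short training times of reservoir methods come from.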

Exploring Multi-State Models using Interactive Shiny Web Applications

This work presented an interactive web application for building multi-state models of disease pathways. The app is flexible, allowing for both parametric and semi-parametric models, with transition-specific distributions. The presentation won the award for Best Presentation.

Stuart E Lacy, Stephanie J Lax

Survival Analysis for Junior Researchers 2017, 2017.
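
To illustrate what transition-specific distributions mean in practice, the following sketch simulates individual pathways through a hypothetical three-state model (`healthy`, `ill`, `dead`) in which each allowed transition has its own Weibull distribution. The states, parameters and the clock-reset (semi-Markov) competing-risks assumption are invented for illustration and are not taken from the app.

```python
import random

# Hypothetical model: each allowed transition has its own Weibull (scale, shape).
TRANSITIONS = {
    "healthy": {"ill": (5.0, 1.2), "dead": (20.0, 1.0)},
    "ill": {"dead": (3.0, 1.5)},
    "dead": {},  # absorbing state
}


def simulate_pathway(start, transitions, rng, max_time=100.0):
    """Simulate one pathway as a list of (entry time, state) pairs.

    From the current state, draw a latent time for every possible transition and
    follow whichever occurs first; the clock resets on entry to each new state.
    """
    t, state, path = 0.0, start, [(0.0, start)]
    while transitions[state]:
        times = {dest: rng.weibullvariate(scale, shape)
                 for dest, (scale, shape) in transitions[state].items()}
        dest = min(times, key=times.get)
        t += times[dest]
        if t > max_time:  # administrative censoring at the end of follow-up
            break
        state = dest
        path.append((t, state))
    return path


if __name__ == "__main__":
    rng = random.Random(3)
    for _ in range(3):
        print(simulate_pathway("healthy", TRANSITIONS, rng))
```

Simulating many such pathways and tabulating state occupancy over time is one way to turn fitted transition-specific models into predictions of disease pathways.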

Evaluating Semi-Parametric Predictive Models

A survey of possible ways to evaluate survival models that are intended for prognostic, rather than inferential, aims. The work was demonstrated on a clinically motivated data set of Follicular Lymphoma. This presentation won the Best in Session Award.

Stuart E Lacy, Stephanie J Lax

Survival Analysis for Junior Researchers 2016, 2016.
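
The abstract does not list the specific measures surveyed; one widely used option for prognostic survival models is Harrell's concordance index, sketched below in plain Python as an example of the kind of metric involved rather than a summary of the talk. It measures how often the model assigns higher risk to the subject who experiences the event first; pairs with tied times are simply skipped here, which is a simplification.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an event (events[i] truthy)
    and times[i] < times[j]. It is concordant when subject i also has the higher
    predicted risk; tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


if __name__ == "__main__":
    print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 2, 3, 1]))
```

A value of 1.0 means the model ranks every comparable pair correctly, while constant risk scores give exactly 0.5.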

Evolving Ensembles: What Can We Learn from Biological Mutualisms?

Ensembles are groups of classifiers which cooperate in order to reach a decision. Conventionally, the members of an ensemble are trained sequentially, and typically independently, and are not brought together until the final stages of ensemble generation. In this paper, we discuss the potential benefits of training classifiers together, so that they learn to interact at an early stage of their development. As a potential mechanism for achieving this, we consider the biological concept of mutualism, whereby cooperation emerges over the course of biological evolution. We also discuss potential mechanisms for implementing this approach within an evolutionary algorithm context.

Michael Lones, Stuart E Lacy, Stephen L. Smith

Information Processing in Cells and Tissues, 2015.

Projects

A Multi-State Modelling web app

Predictaball

rprev

Contact

stuart.lacy@gmail.com