clustering

Clustering running data using Dynamic Time Warping

Motivation

I recently needed to implement some time-series clustering at work on a high-dimensional dataset. Having not tackled this specific problem before, I wanted to practice on a smaller, univariate dataset where I already had a strong intuition about the underlying structure. I turned to my exported Strava running data as a testbed to get familiar with clustering using Dynamic Time Warping (DTW).

Heartrate data

library(tidyverse)
library(duckdb)
library(plotly)
library(dtw)
library(parallel)
library(tidytext)
library(ggwordcloud)
library(patchwork)
library(dendextend)
library(ggridges)
library(broom)
library(cluster)

I started by pulling data from my existing database, restricting the scope to runs that contain both heart rate and GPS location data.

Fixing bug with predicting clusters in flexmix

A second post in 2 days on mixture modelling? No awards for guessing what type of analysis I’ve been preoccupied with recently! Today’s post offers an ugly hack to fix a bug in the R flexmix package for likelihood-based mixture modelling, along with a cautionary tale about environments. In short, I’ve run into problems when trying to predict cluster membership for out-of-sample data with this package, and judging from a couple of posts I found online, I’m not the only one.