StravaR

Dec 5, 2023 6 min read

I’ve finally managed to combine my two main hobbies of data science and running into one project: a Shiny web-app that allows you to explore your Strava fitness data, as well as providing a local database of all your activities that you can analyse with R.

My initial motivation was that I wanted quick access to certain visualisations and metrics that either Strava doesn’t provide, or are awkward to get so I wrote a basic app for my own use. However, I then realised that there almost certainly would be others who would value from this work, either from the app itself or the infrastructure to create and maintain a local database of your Strava data, and I decided to open it up. I stuck with the local route of everyone installing (and customising!) their own Shiny app, rather than me hosting a central version in the Cloud as I strongly feel that everyone should have complete ownership of their own data.

TL;DR: Install from GitHub

App functionality

The app itself looks as below, with the main page providing a quick GitHub-style calendar showing activities in the last year, along with an interactive table of all activities, which I find far quicker to search for a specific activity than the Strava website.

NB: the ‘Activity Type’ checkbox on the left allows you choose which sports are shown for this page and all pages. By default this is set to Runs only but you can change in the settings (cog icon in the top right).

The main metric that guides my training programs is my mileage, so I want to be able to see at a glance both how my weekly mileage is doing, and also how my year-to-date cumulative load compares with previous years.

routes

I like being aware of my heart-rate stress score based training load too, which is the same metric used in the Elevate extension.

routes

The final page displays an interactive map of all activities. When I’m not in a specific training cycle, my main motivation (especially during the winter!) comes from exploring new routes. One feature I’d like to add is the calculation of what percent of runs in a specified area you have run on, watch this space…

routes

Uploading data

On first use, you will be prompted to upload an exported archive from Strava containing all your data. This can be downloaded from the (deliberately?) confusingly named page Delete Your Account (NB: this link WILL NOT delete your account!). If you don’t want to click that scary link, in a web browser click your Profile Pic in the top right, Settings, My Account, scroll down to Download or Delete Your Account and click Get Started. Your archive will be emailed to you shortly

Linking to Strava

For subsequent use you can either continue the tedious manual process of requesting archives, waiting for the email, and uploading it, or you can directly link your account to Strava through the API. You’ll need to create an App through Strava’s developer scheme and substitute the Client ID and Secret in the line STRAVA_APP <- oauth_app(...) in server.R as the key and secret arguments respectively. Then you are good to go by clicking the Connect With Strava button in the top right!

Why can’t I just use your key and secret? Because the oauth 2.0 authentication method used by the Strava API isn’t ideal for Desktop Apps (as in those that get installed to each users’ computer, rather than running a single instance in the Cloud), as there’s no way of installing the app without making the key and secret accessible to the user. These values are considered sensitive because anyone could use them to make you think you were authorising your Strava account to connect to my app, but in reality they get access to your data. I didn’t want to write a Cloud based app as I’d rather everyone be able to own and manage their own data, rather than hand it off to yet another third party!

Database

The app creates a duckdb database to store your data, which is an embedded database designed for analytics (i.e. it is column-based compared to SQLite’s row-based), this saved in the working directory as data.db. It supports querying through standard SQL commands, see the documentation, although it is much easier to interface to it using the fantastic dbplyr package which allows you to use standard tidyverse functions instead.

For example, to load all your activities you would do the following:

library(tidyverse)
library(duckdb)

# Connect to the database
con <- dbConnect(duckdb(), "data.db")

tbl(con, "activities") |>
    collect()  # This command reads the data from the DB into memory

# A tibble: 1,171 × 7
   activity_id activity_type name                                            start_time          distance duration elevation
         <dbl> <chr>         <chr>                                           <dttm>                 <dbl>    <dbl>     <dbl>
 1  9329577757 Run           "Stretching legs"                               2023-06-25 06:43:26     7.5      2534      18.3
 2  9337323631 Run           "Cow cow cow"                                   2023-06-26 11:22:44     8.01     2667      42.9
 3  9342310857 Run           "Bish"                                          2023-06-27 06:58:44     8        2605      20.2
 4  9355605801 Run           "Acaster Malbis river"                          2023-06-29 09:12:45    16        5516      17.0
 5  9361568591 Run           "Recovery"                                      2023-06-30 10:19:55     5.01     1830      29.7
 6  9370223236 Run           "Chill evening run"                             2023-07-01 18:21:45     8.01     2615      15.5
 7  9378957042 Run           "Askham Richard - Bilbrough - Cop - Bog - Bish" 2023-07-03 06:06:38    23        8126     102. 
 8  9385907307 Run           "Cows"                                          2023-07-04 11:12:57     6.3      2094      20.8
 9  9392630511 Run           "Golf course \"\"10\"\""                        2023-07-05 11:08:25    11.5      3790      29.3
10  9399049616 Run           "Threshold run 5x1km"                           2023-07-06 11:41:49    10.0      3119      16.8
# ℹ 1,161 more rows
# ℹ Use `print(n = ...)` to see more rows

You can use standard tidyverse functions on the database connection before you load it into R with collect, i.e. if I want to find all my long runs from Summer 2020 I could do the following:

tbl(con, "activities") |>
    filter(
      start_time >= '2020-05-01', 
      start_time < '2020-09-01',
      distance > 20,
      activity_type == 'Run'
    ) |>
    arrange(desc(distance)) |>
    collect()

# A tibble: 12 × 7
   activity_id activity_type name                      start_time          distance duration elevation
         <dbl> <chr>         <chr>                     <dttm>                 <dbl>    <dbl>     <dbl>
 1  3887193271 Run           Lockdown marathon 3:51:42 2020-08-09 07:16:46     42.3    13945     152. 
 2  3786050932 Run           Too hot!                  2020-07-19 11:08:24     34      12905      79.0
 3  3578083277 Run           Lunch Run                 2020-06-07 10:50:03     32.2    12970     101. 
 4  3648703947 Run           Morning Run               2020-06-21 09:41:21     32.2    12080      92.2
 5  3717455954 Run           Lunch Run                 2020-07-05 10:51:51     32.2    11588      83.9
 6  3751088445 Run           Lunch Run                 2020-07-12 10:38:14     27.5     9485      78.2
 7  3542179455 Run           Lunch Run                 2020-05-31 11:27:01     27.5     9934      67.5
 8  3683255248 Run           Lunch Run                 2020-06-28 10:15:23     24.2     7936      71.9
 9  3467388733 Run           Afternoon Run             2020-05-17 13:48:47     24.1     8420      47.5
10  3613589532 Run           Half marathon pb 1:42:31  2020-06-14 11:39:48     21.1     6164      51.8
11  3820814279 Run           Half marathon pb 1:39:53  2020-07-26 11:11:18     21.1     6074      51.4
12  3430728862 Run           Afternoon Run             2020-05-10 13:59:18     21       7568      61.9

NB: When you are finished with the database, close the connection and shut it down!

dbDisconnect(con, shutdown=TRUE)

Installation

Simply clone the GitHub repository (from within RStudio, File -> New Project -> Version Control -> Git, then enter https://github.com/stulacy/StravaR.git as the Repository URL), then install any missing packages that are loaded in server.R (RStudio should highlight these). You can then run the app in a web browser by opening app.R, clicking the arrow next to Run App and selecting Run External, and then clicking Run App. On first use it will create the database, prompt you to enter details about yourself, and prompt for a Strava archive file.

Extending the app

Feel free to modify the source code to your heart’s content to meet your specific needs. If you come up with any interesting visualisations or models please let me know - I’d be interested to see what people get up to!

R duckdb running

StravaR

App functionality

Uploading data

Linking to Strava

Database

Installation

Extending the app

Related