Dplyr incompatibility with xtable

I’ve been working on another paper today and decided to update my previous xtable function (as described here) to use dplyr, as I want to get fully to grips with Hadley Wickham’s wonderful ecosystem of packages, including dplyr (and its predecessor plyr), ggplot2 and tidyr (and its predecessor reshape2). I mentioned this before Christmas but have only got round to it now, and it involved a few hours of struggling with tidyr to make it do what I want! However, while updating my function to use dplyr’s summarise function instead of aggregate, I came across an odd bug that had me stuck for an hour or so.

Let’s start off by making an example dataframe with the accuracies of three different classification algorithms on three standard UCI datasets, with each algorithm being run on each dataset five times to account for the stochastic nature of these models.

library(dplyr)
library(xtable)
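# dataset (length 3) and run (length 15) are recycled by data.frame() to 45 rows,
# giving 5 runs for every dataset/algorithm combination.
df.cls <- data.frame(dataset=rep(c("Iris", "Heart", "Liver")),
                     algorithm=rep(c("ANN", "GP", "CGP"), each=15),
                     run=rep(seq(5), each=3),
                     accuracy=runif(45))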

Then using dplyr we can easily form a new dataframe with the mean accuracy across the five runs for each algorithm on each dataset.

results <- df.cls %>%
  group_by(dataset, algorithm) %>%
  summarise(mean=mean(accuracy))
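
For comparison, the same summary can be produced with base R's aggregate, which is what the function used before the switch to dplyr. This is only a minimal sketch: the set.seed call, the explicit rep arguments and the name results.base are additions for illustration and are not part of the original function.

```r
# Rebuild the example data so this block runs on its own.
set.seed(1)  # assumption: seed added only to make the sketch reproducible
df.cls <- data.frame(dataset = rep(c("Iris", "Heart", "Liver"), times = 15),
                     algorithm = rep(c("ANN", "GP", "CGP"), each = 15),
                     run = rep(rep(seq(5), each = 3), times = 3),
                     accuracy = runif(45))

# Mean accuracy over the 5 runs for every dataset/algorithm pair.
results.base <- aggregate(accuracy ~ dataset + algorithm,
                          data = df.cls, FUN = mean)

# aggregate() returns a plain data.frame with one row per pair (3 x 3 = 9).
class(results.base)
nrow(results.base)
```

Because the result here is an ordinary data.frame, it makes a handy reference point when checking what the dplyr version returns.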

The problem comes when we try to form an xtable from this dataframe, to be used in LaTeX documents.

print(xtable(results))

This will produce the following error: Error in .subset2(x, i, exact = exact) : subscript out of bounds. I spent at least an hour trying to debug this, thinking the problem lay in my dplyr processing, as I was chaining together more than five functions that I was not familiar with. I eventually came across a bug report on GitHub which described the same problem. The reporter indicated that the problem was solved by updating to dplyr 0.3.0.9, although I prefer to stick to CRAN releases for stability (currently on 0.3.0.2).

To solve it, simply cast the object to a dataframe in the xtable call, like so:

print(xtable(as.data.frame(results)))
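
One way to see what the cast changes is to compare the class of the object before and after. The sketch below assumes dplyr is installed; the name plain and the set.seed call are illustrative additions, and the exact classes printed for results depend on the dplyr version in use.

```r
library(dplyr)

# Rebuild the example data so this block runs on its own.
set.seed(1)  # assumption: seed added only to make the sketch reproducible
df.cls <- data.frame(dataset = rep(c("Iris", "Heart", "Liver"), times = 15),
                     algorithm = rep(c("ANN", "GP", "CGP"), each = 15),
                     run = rep(rep(seq(5), each = 3), times = 3),
                     accuracy = runif(45))

results <- df.cls %>%
  group_by(dataset, algorithm) %>%
  summarise(mean = mean(accuracy))

# The summarised object carries dplyr-specific classes on top of data.frame.
class(results)

# as.data.frame() drops those extra classes and keeps the data.
plain <- as.data.frame(results)
class(plain)
```

The object called plain is what gets handed to xtable in the workaround above.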

Using data.frame (without the as.) will also work but has some odd side effects, such as spaces in column names being turned into full stops. This happens because data.frame defaults to check.names = TRUE, which passes the column names through make.names to turn them into syntactically valid R names. I hope anyone who encounters the same issue comes across this blog straight away and saves themselves the hassle I went through!
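
The difference in column-name handling can be seen with base R alone. This is a small sketch using a made-up two-column dataframe; the name df and its column names are purely illustrative.

```r
# A small dataframe whose column names contain spaces (hypothetical example).
df <- data.frame(x = 1:2, y = c(0.5, 0.7))
names(df) <- c("data set", "mean accuracy")

# as.data.frame() leaves the names untouched.
names(as.data.frame(df))
# → "data set" "mean accuracy"

# data.frame() defaults to check.names = TRUE, so the names are
# passed through make.names() and the spaces become full stops.
names(data.frame(df))
# → "data.set" "mean.accuracy"

# Switching the check off keeps the original names.
names(data.frame(df, check.names = FALSE))
# → "data set" "mean accuracy"
```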
