epitab - Contingency Tables in R

Nov 13, 2017

I’ve just released a new package to CRAN, and while it doesn’t perform any complex calculations or fill a statistical niche, it may be one of the most useful everyday libraries I’ll write. In short, epitab provides a framework for building descriptive tables by extending contingency tables with additional functionality.

I initially developed it for my work in epidemiology, where I kept needing to generate tables of descriptive statistics programmatically to facilitate reproducible research, but could not find any existing software that met my requirements. I tried Epi::stat.table, but it cannot display multiple independent variables in one table; adding a third variable builds a 3-way table instead. It also cannot calculate statistics that don’t come from a cross-tabulated combination of covariate and outcome. My final requirement was a way of tidying the table for publication in various formats.

This post provides a brief introduction to epitab and details how to use its basic functionality. For further guidance see the vignette or the reference manual.

Installation

The current version (0.2.1 as of the time of writing) is hosted on CRAN and can easily be installed in the usual manner.

install.packages("epitab")

Development is managed on GitHub, so the latest development version can be installed with devtools.

devtools::install_github("stulacy/epitab")

Basic usage

The mtcars data set will be used to demonstrate the types of tables that can be built with epitab. Note that discrete variables are coerced into factors to simplify subsequent analysis; in epidemiology it is even common to discretise continuous variables.

library(epitab)
library(dplyr)
library(knitr)
facs <- c('cyl', 'am', 'gear', 'carb', 'vs')
mtcars[facs] <- lapply(mtcars[facs], factor)
head(mtcars) %>%
    kable()

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225	105	2.76	3.460	20.22	1	0	3	1
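A quick way to confirm that the coercion did what was intended is to check which columns are now factors. This is a minimal base R sketch; it works on a copy called dat (a name introduced here for illustration) so that the built-in data set is left untouched.

```r
# Work on a copy of the built-in data set (the name `dat` is illustrative)
dat <- datasets::mtcars
facs <- c("cyl", "am", "gear", "carb", "vs")
dat[facs] <- lapply(dat[facs], factor)

# TRUE for the coerced columns, FALSE for those left as numeric
is_factor <- vapply(dat, is.factor, logical(1))
print(is_factor)

# Factor levels are the sorted distinct values, stored as character labels
print(levels(dat$cyl))
```

The continuous columns (mpg, hp and so on) stay numeric, which is what the summary functions used later in this post rely on.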

A standard contingency table built with epitab is displayed below. The covariates (displayed on the table rows) are defined in the independents argument; outcomes specifies the outcome variables (columns). Both are provided as named lists, with the names giving the row/column labels. The crosstab_funcs argument defines the statistics to calculate for each covariate/outcome combination; in this example the freq function (provided in epitab) calculates the frequency.

contingency_table(independents=list("Cylinders"="cyl", "Num gears"="gear"),
                  outcomes=list("Manual transmission"="am"),
                  data=mtcars,
                  crosstab_funcs = list(freq()))

    ##               |          |        |Manual transmission     |            |
    ##               |          |All     |0                       |1           |
    ## -------------------------------------------------------------------------
    ##               |          |        |                        |            |
    ##               |Total     |32      |19 (59)                 |13 (41)     |
    ##               |          |        |                        |            |
    ## Cylinders     |4         |11      |3 (27)                  |8 (73)      |
    ##               |6         |7       |4 (57)                  |3 (43)      |
    ##               |8         |14      |12 (86)                 |2 (14)      |
    ##               |          |        |                        |            |
    ## Num gears     |3         |15      |15 (100)                |0 (0)       |
    ##               |4         |12      |4 (33)                  |8 (67)      |
    ##               |5         |5       |0 (0)                   |5 (100)     |

The above table is suitable for use in an interactive R console; however, if the table is to be shared with others then a clean, easily exportable version is required. The function neat_table produces a knitr::kable object that can be exported to either HTML or LaTeX. This allows descriptive tables to be generated alongside the analysis in RMarkdown, thereby facilitating reproducible research.

contingency_table(independents=list("Cylinders"="cyl", "Num gears"="gear"),
                  outcomes=list("Manual transmission"="am"),
                  data=mtcars,
                  crosstab_funcs = list(freq())) %>%
    neat_table() %>%
    kableExtra::kable_styling(bootstrap_options = c('striped', 'hover'),
                              full_width=FALSE)

		Manual transmission
	All	0	1
Total	32	19 (59)	13 (41)
Cylinders
4	11	3 (27)	8 (73)
6	7	4 (57)	3 (43)
8	14	12 (86)	2 (14)
Num gears
3	15	15 (100)	0 (0)
4	12	4 (33)	8 (67)
5	5	0 (0)	5 (100)

Note that multiple outcomes can be passed into contingency_table.

contingency_table(independents=list("Carburetors"="carb", "Num gears"="gear"),
                  outcomes=list("Manual transmission"="am", "Cylinders"="cyl"),
                  data=mtcars,
                  crosstab_funcs = list(freq())) %>%
    neat_table() %>%
    kableExtra::kable_styling(bootstrap_options = c('striped', 'hover'),
                              full_width=FALSE)

		Manual transmission		Cylinders
	All	0	1	4	6	8
Total	32	19 (59)	13 (41)	11 (34)	7 (22)	14 (44)
Carburetors
1	7	3 (43)	4 (57)	5 (71)	2 (29)	0 (0)
2	10	6 (60)	4 (40)	6 (60)	0 (0)	4 (40)
3	3	3 (100)	0 (0)	0 (0)	0 (0)	3 (100)
4	10	7 (70)	3 (30)	0 (0)	4 (40)	6 (60)
6	1	0 (0)	1 (100)	0 (0)	1 (100)	0 (0)
8	1	0 (0)	1 (100)	0 (0)	0 (0)	1 (100)
Num gears
3	15	15 (100)	0 (0)	1 (7)	2 (13)	12 (80)
4	12	4 (33)	8 (67)	8 (67)	4 (33)	0 (0)
5	5	0 (0)	5 (100)	2 (40)	1 (20)	2 (40)

Other summary statistics

Alongside the cross-tabulated frequencies, summary statistics that depend on only the covariates or only the outcomes can be displayed. Column-wise functions are those that act on every outcome column, independently of the covariates. This behaviour is useful for identifying a relationship between a continuous independent variable and the outcome(s). Two column-wise functions are supplied with epitab: summary_mean and summary_median, which calculate the mean and median values of a specified covariate for each outcome level. The table below shows that in this data set, manual cars have less power but greater fuel economy than automatics.

contingency_table(independents=list("Cylinders"="cyl", "Num gears"="gear"),
                  outcomes=list("Manual transmission"="am"),
                  data=mtcars,
                  crosstab_funcs = list(freq()),
                  col_funcs=list("Mean MPG"=summary_mean("mpg"),
                                 "Mean horsepower"=summary_mean("hp"))) %>%
    neat_table() %>%
    kableExtra::kable_styling(bootstrap_options = c('striped', 'hover'),
                              full_width=FALSE)

		Manual transmission
	All	0	1
Total	32	19 (59)	13 (41)
Cylinders
4	11	3 (27)	8 (73)
6	7	4 (57)	3 (43)
8	14	12 (86)	2 (14)
Num gears
3	15	15 (100)	0 (0)
4	12	4 (33)	8 (67)
5	5	0 (0)	5 (100)
Mean MPG		17.15	24.39
Mean horsepower		160.26	126.85
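The two summary rows at the bottom of the table above are simply group means, so they can be cross-checked in base R. This is a rough equivalent for illustration rather than a description of what summary_mean does internally.

```r
dat <- datasets::mtcars

# Mean of a continuous covariate within each level of the outcome (am)
mean_mpg <- tapply(dat$mpg, dat$am, mean)
mean_hp  <- tapply(dat$hp,  dat$am, mean)

# Stack the two results in the same layout as the summary rows of the table
print(round(rbind("Mean MPG" = mean_mpg, "Mean horsepower" = mean_hp), 2))
```

Swapping mean for median in the tapply calls gives the corresponding medians.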

Row-wise functions are the opposite, and act on each covariate level independently of the outcomes. This is useful for displaying measures such as regression coefficients. The example below shows the odds ratios of a car having a manual transmission for each number of cylinders and engine shape, with a car being less likely to have manual transmission the greater the number of cylinders. This is calculated with the epitab::odds_ratio function, which obtains the odds ratios for each covariate level from a logistic regression on a specified outcome. epitab also provides functionality for displaying hazard ratios for time-to-event outcomes. Note that in the example below the odds ratios are obtained from univariate models of each covariate in turn, although functionality is provided for adjusting for other factors. See the help page for odds_ratio for further details.

contingency_table(independents=list("Cylinders"="cyl", "Engine shape"="vs"),
                  outcomes=list("Manual transmission"="am"),
                  data=mtcars,
                  crosstab_funcs = list(freq()),
                  col_funcs=list("Mean MPG"=summary_mean("mpg"),
                                 "Mean horsepower"=summary_mean("hp")),
                  row_funcs=list("OR"=odds_ratio("am"))) %>%
    neat_table() %>%
    kableExtra::kable_styling(bootstrap_options = c('striped', 'hover'),
                              full_width=FALSE)

		Manual transmission
	All	0	1	OR
Total	32	19 (59)	13 (41)
Cylinders
4	11	3 (27)	8 (73)	1
6	7	4 (57)	3 (43)	0.28 (0.03 - 1.99)
8	14	12 (86)	2 (14)	0.06 (0.01 - 0.39)
Engine shape
0	18	12 (67)	6 (33)	1
1	14	7 (50)	7 (50)	2.00 (0.48 - 8.76)
Mean MPG		17.15	24.39
Mean horsepower		160.26	126.85
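To see where the point estimates in the OR column come from, the univariate model for Cylinders can be fitted directly with glm. This is a sketch under the assumption that the model is an ordinary logistic regression with the first factor level as the reference; confidence intervals are left out because I have not described here how odds_ratio computes them.

```r
dat <- datasets::mtcars
dat$cyl <- factor(dat$cyl)  # first level ("4") becomes the reference category

# Univariate logistic regression of manual transmission on cylinder count
fit <- glm(am ~ cyl, data = dat, family = binomial)

# Exponentiated coefficients are odds ratios relative to 4-cylinder cars
or <- exp(coef(fit))[c("cyl6", "cyl8")]
print(round(or, 2))
```

Because the model contains a single categorical covariate, these estimates match the odds ratios computed by hand from the counts, for example (3/4) / (8/3) for 6-cylinder versus 4-cylinder cars.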

Custom functions

This brief introduction has shown the use of inbuilt functions, such as freq, odds_ratio, and summary_mean, for displaying the three types of summary statistics. However, the main strength of the package as I see it is that any user-defined function can be used in each of these three roles, provided it meets the required parameterisation. Please see the vignette for details.

Further work

It’s a rather simple package but one that I’ve already incorporated quite frequently into my exploratory analysis workflow. I’m always keen to improve it and would appreciate any comments and feedback (contact me here). Particular features I’d like to add in future releases include the ability to calculate statistics related to each covariate and outcome but not necessarily calculated for each level, such as displaying the output of a chi-square test between the covariate and outcome. I’d also welcome feedback on different options for formatting the table; the layout that is currently used is one that makes sense to me but others may have different preferences.
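In the meantime, a test of association between a covariate and an outcome has to be run separately from the table. A minimal base R sketch for Cylinders against transmission type is shown below; it is not an epitab feature.

```r
dat <- datasets::mtcars

# Chi-square test of independence between cylinders and transmission type.
# Some expected counts are below 5 here, so R would warn that the
# approximation may be inaccurate; suppressWarnings() hides that warning.
# fisher.test() on the same table is a common alternative for small counts.
res <- suppressWarnings(chisq.test(table(dat$cyl, dat$am)))

print(res$statistic)  # test statistic
print(res$parameter)  # degrees of freedom
print(res$p.value)
```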