Introduction

epitable produces tables in the style of epidemiological papers. The output in this vignette is

freq_by

The freq_by function creates a frequency and percentage table of the kind commonly used in epidemiological papers. It is a wrapper around the htmlTable package.

head(example_data)
## # A tibble: 6 x 10
##   carat       cut color clarity depth table price     x     y     z
##   <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
## 2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
## 3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
## 4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
## 5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48

The by_group variable must be of class factor to ensure the correct column order. The argument min_cell_count sets the minimum number of observations in a cell and is intended to preserve anonymity when the data are sensitive. When the cell count is <= min_cell_count, the count is not shown for that cell, as for clarity IF in the Fair column of the table below.
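As a minimal sketch of the two points above, the base R snippet below first shows how explicit factor levels fix the column order of a cross-tabulation, and then how small counts could be blanked out. The helper mask_small_counts and the toy data are hypothetical illustrations, not part of epitable, and freq_by may implement the masking differently.

```r
# Hypothetical helper (not part of epitable): blank out counts that are
# at or below a threshold, following the "<= min_cell_count" rule above.
mask_small_counts <- function(counts, min_cell_count = 10) {
  out <- as.character(counts)
  out[counts <= min_cell_count] <- NA_character_
  out
}

# Toy data: the grouping variable is a factor with an explicit level order.
toy <- data.frame(
  cut   = factor(c("Good", "Fair", "Good", "Ideal"),
                 levels = c("Fair", "Good", "Ideal")),
  color = c("D", "E", "D", "E")
)

# Columns of the cross-tabulation follow the factor levels,
# not the order in which the values appear in the data.
counts <- table(toy$color, toy$cut)
colnames(counts)

# Counts of 3 and 10 are masked, 25 is kept.
mask_small_counts(c(3, 10, 25), min_cell_count = 10)
```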

It is possible to change the font using the font_css argument, but the font needs to be web-friendly. Below the font is set to “font-family: Times”:

freq_by(dataset = example_data, var_vector = c("color", "clarity"), by_group = "cut", font_css = "font-family: Times" )
             Total         Fair         Good    Very Good      Premium        Ideal
             n (%)        n (%)        n (%)        n (%)        n (%)        n (%)
Color
  D      6 775 13%      163 10%      662 13%    1 513 13%    1 603 12%    2 834 13%
  E      9 797 18%      224 14%      933 19%    2 400 20%    2 337 17%    3 903 18%
  F      9 542 18%      312 19%      909 19%    2 164 18%    2 331 17%    3 826 18%
  G     11 292 21%      314 20%      871 18%    2 299 19%    2 924 21%    4 884 23%
  H      8 304 15%      303 19%      702 14%    1 824 15%    2 360 17%    3 115 14%
  I      5 422 10%      175 11%      522 11%    1 204 10%    1 428 10%    2 093 10%
  J      2 808  5%      119  7%      307  6%      678  6%      808  6%      896  4%
Clarity
  I1       741  1%      210 13%       96  2%       84  1%      205  1%      146  1%
  SI2    9 194 17%      466 29%    1 081 22%    2 100 17%    2 949 21%    2 598 12%
  SI1   13 065 24%      408 25%    1 560 32%    3 240 27%    3 575 26%    4 282 20%
  VS2   12 258 23%      261 16%      978 20%    2 591 21%    3 357 24%    5 071 24%
  VS1    8 171 15%      170 11%      648 13%    1 775 15%    1 989 14%    3 589 17%
  VVS2   5 066  9%       69  4%      286  6%    1 235 10%      870  6%    2 606 12%
  VVS1   3 655  7%       17  1%      186  4%      789  7%      616  4%    2 047  9%
  IF     1 790  3%      <10   -       71  1%      268  2%      230  2%    1 212  6%
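The percentages in the table above are column percentages: within each by_group column, the levels of a variable sum to roughly 100%. The sketch below reproduces that kind of calculation in base R on the built-in mtcars data. It only illustrates the arithmetic and makes no claim about how freq_by computes its cells internally.

```r
# Cross-tabulate a row variable against a grouping variable (built-in data).
counts <- table(gear = mtcars$gear, cyl = mtcars$cyl)

# Column percentages: each grouping column sums to 100 before rounding.
col_pct <- round(100 * prop.table(counts, margin = 2))

# A "Total" column computed over all observations.
total_n   <- rowSums(counts)
total_pct <- round(100 * total_n / sum(total_n))

cbind(Total_n = total_n, Total_pct = total_pct)
col_pct
```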

model_to_html

The model_to_html function prints simple models as HTML tables, including the reference level of each categorical covariate. It does not support variables of class ordered.
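Because ordered factors are not supported, they need to be converted to unordered factors before the model is fitted. A minimal base R sketch of that conversion follows. The helper unorder_factors and the toy data are hypothetical and not part of epitable.

```r
# Hypothetical helper: turn every ordered factor in a data frame into an
# unordered factor, keeping the levels and their order.
unorder_factors <- function(df) {
  is_ord <- vapply(df, is.ordered, logical(1))
  if (any(is_ord)) {
    df[is_ord] <- lapply(df[is_ord], factor, ordered = FALSE)
  }
  df
}

# Toy data with one ordered factor and one numeric column.
toy <- data.frame(
  dose = factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE),
  y    = c(1, 0, 1)
)

toy <- unorder_factors(toy)
is.ordered(toy$dose)   # FALSE
levels(toy$dose)       # level order is preserved
```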

Printing a coxph-model

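df         <- survival::lung
df$age_bin <- Hmisc::cut2(df$age, g = 5)
df$ph_bin  <- Hmisc::cut2(df$ph.karno, g = 5)
df$sex     <- factor(df$sex)
# Note: in survival::lung, status is coded 1 = censored and 2 = dead, so
# status == 1 treats censoring as the event; use status == 2 to model death.
# The output below is shown for the call as written.
model1     <- survival::coxph(survival::Surv(time = time, event = status == 1) ~ age_bin + factor(sex) + ph_bin + wt.loss, data = df)
model_to_html(model1, exponentiate = TRUE)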
                HR CI
Age_bin
  [39,56)        1 Ref
  [56,61)     0.71 [ 0.33, 1.51]
  [61,67)     1.12 [ 0.55, 2.30]
  [67,72)     0.68 [ 0.30, 1.55]
  [72,82]     0.74 [ 0.28, 1.97]
Factor(Sex)
  1              1 Ref
  2           1.86 [ 1.08, 3.18]
Ph_bin
  [ 50, 80)      1 Ref
  80          3.05 [ 1.12, 8.29]
  90          2.63 [ 1.02, 6.73]
  100         2.88 [ 0.94, 8.84]
wt.loss       1.00 [ 0.98, 1.02]

Printing a logistic regression

 diamonds <- ggplot2::diamonds
 diamonds$color <- factor(diamonds$color, ordered = FALSE)
 diamonds$clarity <- factor(diamonds$clarity, ordered = FALSE)
 glm_logistic <- glm( cut=="Ideal" ~  color + clarity + x , data = diamonds, family = "binomial")
 model_to_html(glm_logistic, exponentiate = TRUE)
          OR CI
Color
  D        1 Ref
  E     0.86 [ 0.81, 0.92]
  F     0.88 [ 0.82, 0.93]
  G     0.95 [ 0.89, 1.01]
  H     0.88 [ 0.82, 0.94]
  I     0.96 [ 0.89, 1.04]
  J     0.78 [ 0.71, 0.86]
Clarity
  I1       1 Ref
  SI2   1.49 [ 1.24, 1.80]
  SI1   1.67 [ 1.39, 2.02]
  VS2   2.32 [ 1.94, 2.81]
  VS1   2.54 [ 2.11, 3.08]
  VVS2  3.20 [ 2.65, 3.89]
  VVS1  3.68 [ 3.03, 4.48]
  IF    6.07 [ 4.93, 7.50]
x       0.82 [ 0.81, 0.84]
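With exponentiate = TRUE, the coefficients of a logistic regression are presumably reported as odds ratios with exponentiated confidence limits. The sketch below shows the corresponding base R computation on the built-in mtcars data. It assumes Wald-type intervals from confint.default; whether model_to_html uses this interval type is an assumption, not something the vignette states.

```r
# Fit a small logistic regression on built-in data.
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

# Exponentiate the coefficients and their Wald confidence limits
# to move from the log-odds scale to the odds-ratio scale.
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))

round(or_table, 2)
```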

Printing a general linear regression

 glm_linear <- glm( Sepal.Width ~  Petal.Width + Sepal.Length + Petal.Length +  Species, data = iris)
 model_to_html(glm_linear) 
                  β CI
Species
  setosa        Ref Ref
  versicolor  -1.16 [-1.54,-0.78]
  virginica   -1.40 [-1.94,-0.86]
Petal.Width    0.63 [ 0.38, 0.87]
Sepal.Length   0.38 [ 0.25, 0.51]
Petal.Length  -0.19 [-0.35,-0.02]

Printing two lists of models

The example below takes two lists of models: one list of univariate models and one list of multivariate models. The multivariate models include incrementally more variables, so that the final multivariate model includes all the variables used in the univariate models.

# univariate models in a list:
library(magrittr)  # provides %>%
univar_list <- c("Age", "Embarked", "Sex", "Fare", "Pclass") %>%
  paste0("Survived ~ ", .) %>%
  purrr::map(~ glm(as.formula(.x), data = titanic, family = "binomial"))

# multivariate models in a list:
glm_logistic_1 <- glm( Survived ~  Age + Embarked, data = titanic, family = "binomial")
glm_logistic_2 <- glm( Survived ~  Age + Embarked + Sex + Fare, data = titanic, family = "binomial")
glm_logistic_3 <- glm( Survived ~  Age + Embarked + Sex + Fare + Pclass, data = titanic, family = "binomial")
multi_model_list <- list(glm_logistic_1, glm_logistic_2,  glm_logistic_3 )

model_to_html(univariate_models_list = univar_list,
              multivariate_models_list = multi_model_list,
              exponentiate = TRUE ) 
               Univariate models    Model 1              Model 2              Model 3
                 OR CI                OR CI                OR CI                OR CI
Age            0.99 [ 0.98, 1.00]   0.99 [ 0.98, 1.00]   0.99 [ 0.98, 1.00]   0.96 [ 0.95, 0.98]
Embarked
  Cherbourg       1 Ref                1 Ref                1 Ref                1 Ref
  Queenstown   0.51 [ 0.29, 0.89]   0.25 [ 0.09, 0.58]   0.25 [ 0.08, 0.69]   0.44 [ 0.14, 1.32]
  Southampton  0.41 [ 0.29, 0.58]   0.36 [ 0.24, 0.53]   0.49 [ 0.30, 0.80]   0.61 [ 0.36, 1.03]
Sex
  female          1 Ref                                     1 Ref                1 Ref
  male         0.08 [ 0.06, 0.11]                        0.09 [ 0.06, 0.14]   0.08 [ 0.05, 0.12]
Fare           1.02 [ 1.01, 1.02]                        1.01 [ 1.01, 1.02]   1.00 [ 1.00, 1.00]
Pclass
  1st.            1 Ref                                                          1 Ref
  2nd.         0.53 [ 0.35, 0.79]                                             0.32 [ 0.17, 0.59]
  3rd.         0.19 [ 0.13, 0.26]                                             0.09 [ 0.05, 0.17]
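The incrementally larger formulas of the multivariate models can also be generated instead of typed by hand. The base R sketch below builds the cumulative right-hand sides with Reduce and then picks the three steps used in this example. It only constructs the formulas; fitting them still requires the titanic data used above, which is not defined in this sketch.

```r
vars <- c("Age", "Embarked", "Sex", "Fare", "Pclass")

# Cumulative right-hand sides: "Age", "Age + Embarked", ...
rhs <- Reduce(function(a, b) paste(a, b, sep = " + "), vars, accumulate = TRUE)

# Keep the steps with 2, 4 and 5 covariates, matching Model 1 to Model 3.
formulas <- lapply(paste("Survived ~", rhs[c(2, 4, 5)]), as.formula)

formulas
# Each formula could then be passed to glm(), for example:
# multi_model_list <- lapply(formulas, glm, data = titanic, family = "binomial")
```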

To do in a possible future:

  • Add totals to each of the by_groups
  • Add an option to specify better names for the rgroup, not just the variable names
  • Include other summary statistics