| Title: | Miscellaneous Statistical Functions Used in 'guide-R' |
|---|---|
| Description: | Companion package for the manual 'guide-R : Guide pour l’analyse de données d’enquêtes avec R' available at <https://larmarange.github.io/guide-R/>. 'guideR' implements miscellaneous functions introduced in 'guide-R' to facilitate statistical analysis and manipulation of survey data. |
| Authors: | Joseph Larmarange [aut, cre] (ORCID: <https://orcid.org/0000-0001-7097-700X>) |
| Maintainer: | Joseph Larmarange |
| License: | GPL (>= 3) |
| Version: | 0.9.0.9000 |
| Built: | 2026-05-09 07:39:16 UTC |
| Source: | https://github.com/larmarange/guider |
Add potential relevant interactions to a model using stats::step(). The
function extracts the formula of the model, identifies all potential
interactions and passes them as the upper component of the scope argument
to stats::step(). The current model formula is passed as the lower
component of scope.
add_interactions_by_step(model, ...)

## Default S3 method:
add_interactions_by_step(model, ...)
model |
A model object. |
... |
Additional parameters passed to |
The stepwise-selected model.
mod <- glm(as.factor(Survived) ~ ., data = titanic, family = binomial())
mod |> add_interactions_by_step()
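The scope construction described above can be sketched in base R. This is a hypothetical illustration rather than guideR's actual code: it assumes that "all potential interactions" means all two-way interactions between the terms already in the model (generated with the `. ~ .^2` formula shortcut), and it uses the built-in mtcars data so that it runs without the titanic dataset.

```r
# Hypothetical sketch of a step() search over interactions (not guideR code).
# Assumption: the upper scope contains all two-way interactions only.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

lower <- formula(fit)             # current model: its terms are never dropped
upper <- update(lower, . ~ .^2)   # adds wt:hp, wt:qsec and hp:qsec

selected <- stats::step(
  fit,
  scope = list(lower = lower, upper = upper),
  direction = "both",
  trace = 0                       # silence the step-by-step log
)
formula(selected)
```

Because the lower scope is the current formula, step() may add interaction terms but cannot remove the original main effects.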
Given a multiple-answer question coded as several binary variables (one per item), create a new variable (a list column or a character vector) combining all positive answers. If defined, variable labels are used (see examples).
combine_answers(data, answers, into, value = NULL, sep = NULL)
data |
A data frame, data frame extension (e.g. a tibble), or a survey design object. |
answers |
< |
into |
Names of the new variables to create, as a character vector. |
value |
Value indicating a positive answer. By default, the maximum observed value is used and a message is displayed. |
sep |
An optional character string to separate the results and return a
character. If |
If NA is observed for at least one item, return NA.
d <- dplyr::tibble(
  q1a = sample(c("y", "n"), size = 200, replace = TRUE),
  q1b = sample(c("y", "n", "n", NA), size = 200, replace = TRUE),
  q1c = sample(c("y", "y", "n"), size = 200, replace = TRUE),
  q1d = sample("n", size = 200, replace = TRUE)
)
d |> combine_answers(q1a:q1d, into = "combined")
d |> combine_answers(q1a:q1d, into = "combined", sep = ", ", value = "y")
d |> combine_answers(q1a:q1d, into = "combined", sep = " | ", value = "n")

# works with survey objects
d |> srvyr::as_survey() |> combine_answers(q1a:q1d, into = "combined")
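The core logic of combine_answers() can be illustrated with base R. The helper combine_positive() below is hypothetical and not part of guideR: it ignores variable labels and survey objects, returns NA for any row with a missing item (as documented above), and, as an assumption, returns an empty string when a row has no positive answer and sep is given.

```r
# Hypothetical helper (NOT part of guideR): combine positive answers per row.
combine_positive <- function(data, items, value, sep = NULL) {
  m <- as.matrix(data[items])             # one column per item
  res <- lapply(seq_len(nrow(m)), function(i) {
    row <- m[i, ]
    if (anyNA(row)) return(NA_character_)  # any missing item -> NA
    items[row == value]                    # names of the positive items
  })
  if (is.null(sep)) return(res)            # list of character vectors
  vapply(res, function(x) {
    if (length(x) == 1 && is.na(x)) NA_character_ else paste(x, collapse = sep)
  }, character(1))
}

d <- data.frame(
  q1a = c("y", "n", "y"),
  q1b = c("y", NA, "n"),
  q1c = c("n", "y", "y")
)
combine_positive(d, c("q1a", "q1b", "q1c"), value = "y", sep = ", ")
```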
Convenience function to quickly cut a numeric vector into quartiles, i.e. by
applying cut(x, breaks = fivenum(x)). The variable label is preserved by
cut_quartiles().
cut_quartiles(x, include.lowest = TRUE, ...)
x |
a numeric vector which is to be converted to a factor by cutting. |
include.lowest |
logical, indicating if an ‘x[i]’ equal to
the lowest (or highest, for |
... |
further arguments passed to |
mtcars$mpg |> cut_quartiles() |> summary()
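For readers curious about the underlying computation, the following base R lines reproduce the core of cut_quartiles() as described above, without the variable-label handling. Note that fivenum() returns Tukey's hinges, which can differ slightly from quantile() results, and that cut() stops with an error if two breaks coincide; how guideR handles that case is not covered here.

```r
# Base R equivalent of the core of cut_quartiles() (label handling omitted).
x <- mtcars$mpg
breaks <- fivenum(x)   # min, lower hinge, median, upper hinge, max
q <- cut(x, breaks = breaks, include.lowest = TRUE)  # keep the minimum
summary(q)
```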
gtsummary
A series of helpers for grouped tables generated by gtsummary::tbl_stack()
or gtsummary::tbl_regression() in the case of multinomial models,
multi-component models or other grouped results.
grouped_tbl_pivot_wider() displays results in a wide format,
with one set of columns per group. multinom_add_global_p_pivot_wider() is
a specific case for multinomial models, when displaying global p-values in a
wide format: it calls gtsummary::add_global_p(), followed by
grouped_tbl_pivot_wider(), and then keeps only the last column with p-values
(see examples). Finally, as grouped regression tables don't have exactly
the same structure as ungrouped tables, functions such as
gtsummary::bold_labels() do not always work properly. If the grouped table
is kept in a long format, style_grouped_tbl() can be used to improve the
output by styling variable labels, levels and/or group names.
NOTE: to style group names, style_grouped_tbl() converts the
table into a gt object with gtsummary::as_gt(). This function should
therefore be used last. If the table is intended to be exported to another
format, do not use style_grouped_tbl().
grouped_tbl_pivot_wider(x)

multinom_add_global_p_pivot_wider(
  x,
  ...,
  p_value_header = "**Likelihood-ratio test**"
)

style_grouped_tbl(
  x,
  bold_groups = TRUE,
  uppercase_groups = TRUE,
  bold_labels = FALSE,
  italicize_labels = TRUE,
  indent_labels = 4L,
  bold_levels = FALSE,
  italicize_levels = FALSE,
  indent_levels = 8L
)
x |
A grouped table generated with |
... |
Additional arguments passed to |
p_value_header |
Header for the p-value column. |
bold_groups |
Bold group names? |
uppercase_groups |
Convert group names to upper case? |
bold_labels |
Bold variable labels? |
italicize_labels |
Italicize variable labels? |
indent_labels |
Number of spaces to indent variable labels. |
bold_levels |
Bold levels? |
italicize_levels |
Italicize levels? |
indent_levels |
Number of spaces to indent levels. |
A gtsummary or a gt table.
mod <- nnet::multinom(
  grade ~ stage + marker + age,
  data = gtsummary::trial,
  trace = FALSE
)
tbl <- mod |> gtsummary::tbl_regression(exponentiate = TRUE)
tbl
tbl |> grouped_tbl_pivot_wider()
tbl |> multinom_add_global_p_pivot_wider() |> gtsummary::bold_labels()
tbl |> style_grouped_tbl()

t1 <- gtsummary::trial |> gtsummary::tbl_summary(include = grade, by = trt)
t2 <- gtsummary::trial |> gtsummary::tbl_summary(include = stage, by = trt)
gtsummary::tbl_stack(list(t1, t2), group_header = c("Table 1", "Table 2")) |>
  style_grouped_tbl()
gtsummary
See gtsummary::tests for more details on how to define custom tests.
fisher.simulate.p() implements Fisher's exact test, with p-values computed by
Monte Carlo simulation for tables larger than 2×2 (see
stats::fisher.test()).
svyttest_oneway() is designed to compare means between sub-groups for
survey objects. It is based on survey::svyttest() for comparing 2 means,
and on svyoneway() for comparing 3 means or more.
fisher.simulate.p(data, variable, by, ...)

svyttest_oneway(data, variable, by, ...)
data |
A data set. |
variable |
Name of the variable to test. |
by |
Name of the by variable. |
... |
Unused. |
library(gtsummary)
trial |>
  tbl_summary(include = grade, by = trt) |>
  add_p(test = all_categorical() ~ "fisher.simulate.p")

iris |>
  srvyr::as_survey() |>
  tbl_svysummary(
    include = Petal.Length,
    by = Species
  ) |>
  add_p(test = all_continuous() ~ svyttest_oneway)
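A custom test with the same data/variable/by signature can be sketched as follows. fisher_sim_test() is a hypothetical stand-in, not guideR's fisher.simulate.p(); it assumes, to our understanding of gtsummary::tests, that a custom test should return a one-row data frame with a p.value column. The number of replicates (B = 5000) is an arbitrary choice.

```r
# Hypothetical custom test (not guideR's fisher.simulate.p()).
fisher_sim_test <- function(data, variable, by, ...) {
  tab <- table(data[[variable]], data[[by]])   # NAs are dropped by table()
  # simulate.p.value only takes effect for tables larger than 2 x 2
  res <- stats::fisher.test(tab, simulate.p.value = TRUE, B = 5000)
  data.frame(p.value = res$p.value, method = res$method)
}

set.seed(123)  # simulated p-values vary between runs
fisher_sim_test(mtcars, variable = "cyl", by = "gear")
```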
gtsummary
Additional themes for tables generated with gtsummary.
theme_gtsummary_prop_n(
  prop_stat = "{p}% ({n})",
  prop_digits = 1,
  mean_sd = FALSE,
  cont_digits = 1,
  missing_text = NULL,
  overall_string = NULL,
  set_theme = TRUE
)

theme_gtsummary_fisher_simulate_p(set_theme = TRUE)

theme_gtsummary_unweighted_n(
  n_unweighted_prefix = "",
  n_unweighted_suffix = " obs.",
  prop_digits = 1,
  mean_sd = FALSE,
  cont_digits = 1,
  missing_text = NULL,
  overall_string = NULL,
  set_theme = TRUE
)

theme_gtsummary_bold_labels(set_theme = TRUE)
prop_stat |
( |
prop_digits |
(non-negative |
mean_sd |
(scalar |
cont_digits |
(non-negative |
missing_text |
( |
overall_string |
( |
set_theme |
(scalar |
n_unweighted_prefix, n_unweighted_suffix
|
( |
theme_gtsummary_prop_n() displays, by default, proportions before the
number of observations (in parentheses). This function cannot be used
simultaneously with gtsummary::theme_gtsummary_mean_sd(), but you can use
the mean_sd = TRUE option of theme_gtsummary_prop_n() instead.
theme_gtsummary_prop_n() also modifies the default method for
gtsummary::add_ci.tbl_summary(): "wilson" for categorical variables;
"t.test", i.e. the confidence interval of the mean, for continuous variables
if mean_sd = TRUE; "wilcox.test", i.e. the confidence interval of the
pseudomedian, for continuous variables if mean_sd = FALSE.
Finally, theme_gtsummary_prop_n() also modifies default tests for
gtsummary::add_p.tbl_summary() for continuous variables if
mean_sd = TRUE ("t.test" for comparing 2 groups, or "oneway.test" for
3 groups or more). If mean_sd = FALSE, the default tests for continuous
variables remain "wilcox.test" (2 groups) or "kruskal.test" (3 groups
or more). For categorical variables, "chisq.test.no.correct" and
"fisher.test" are used by default.
See theme_gtsummary_fisher_simulate_p() to change the default test for
categorical variables.
theme_gtsummary_fisher_simulate_p() changes the default test used for
categorical variables to Fisher's exact test, with p-values computed by
Monte Carlo simulation for tables larger than 2×2.
theme_gtsummary_unweighted_n() modifies default values of tables returned
by gtsummary::tbl_svysummary() and displays the unweighted number of
observations instead of the weighted n.
theme_gtsummary_unweighted_n() also modifies the default method for
gtsummary::add_ci.tbl_svysummary(): "svyprop.logit" for categorical
variables; "svymean", i.e. the confidence interval of the mean, for continuous
variables if mean_sd = TRUE; "svymedian.mean", i.e. the confidence interval
of the median, for continuous variables if mean_sd = FALSE.
Finally, theme_gtsummary_unweighted_n() also modifies default tests for
gtsummary::add_p.tbl_svysummary() for continuous variables if
mean_sd = TRUE (svyttest_oneway which calls survey::svyttest() for
comparing 2 means and svyoneway() for comparing 3 means or more).
If mean_sd = FALSE, the default test for continuous
variables remains "svy.wilcox.test", which uses a design-based Wilcoxon
test (2 groups) or Kruskal-Wallis test (3 groups or more). For categorical
variables, "svy.chisq.test" is used by default.
theme_gtsummary_bold_labels() automatically applies
gtsummary::bold_labels() to all tables generated with gtsummary.
library(gtsummary)
trial |>
  tbl_summary(include = c(grade, age), by = trt) |>
  add_p()

theme_gtsummary_prop_n(mean_sd = TRUE)
theme_gtsummary_fisher_simulate_p()
theme_gtsummary_bold_labels()
trial |>
  tbl_summary(include = c(grade, age), by = trt) |>
  add_p()

data("api", package = "survey")
apistrat$both[1:5] <- NA
apistrat |>
  srvyr::as_survey(strata = stype, weights = pw) |>
  tbl_svysummary(include = c(stype, both), by = awards) |>
  add_overall()

theme_gtsummary_unweighted_n()
apistrat |>
  srvyr::as_survey(strata = stype, weights = pw) |>
  tbl_svysummary(include = c(stype, both), by = awards) |>
  add_overall()

gtsummary::reset_gtsummary_theme()
gtsummary
Utilities for tables generated with gtsummary.
bold_variable_group_headers(x)

italicize_variable_group_headers(x)

indent_levels(x, indent = 8L)

indent_labels(x, indent = 4L)
x |
A |
indent |
An integer indicating how many spaces to indent text. |
gtsummary::modify_bold(), gtsummary::modify_italic(),
gtsummary::modify_indent()
library(gtsummary)
tbl <- trial |>
  tbl_summary(
    include = c(stage, grade, age, trt, response, death)
  ) |>
  add_variable_group_header(
    header = "Clinical situation at diagnosis",
    variables = c(stage, grade, age)
  ) |>
  add_variable_group_header(
    header = "Treatment and outcome",
    variables = c(trt, response, death)
  )
tbl
tbl |>
  bold_variable_group_headers() |>
  italicize_labels() |>
  indent_levels(indent = 8L)
This function uses renv::dependencies() to identify R package dependencies
in a project and then calls pak::pkg_install() to install or update these
packages. If some packages cannot be found, the function installs those that
are available and returns a message listing the packages that were not
installed or updated.
install_dependencies(dependencies = NULL, ask = TRUE)
dependencies |
An optional list of dependencies. If |
ask |
Whether to ask for confirmation when installing a different version of a package that is already installed. Installations that only add new packages never require confirmation. |
(Invisibly) A data frame with information about the installed package(s).
## Not run:
install_dependencies()
## End(Not run)
is_different() and is_equal() perform comparison tests, considering
NA values as legitimate values (see examples).
is_different(x, y)

is_equal(x, y)

cumdifferent(x)

num_cycle(x)
x, y
|
Vectors to be compared. |
cumdifferent() identifies groups of consecutive rows that have
the same value. num_cycle() can be used to identify sub-groups that
satisfy a given condition (see examples).
is_equal(x, y) is equivalent to
(x == y & !is.na(x) & !is.na(y)) | (is.na(x) & is.na(y)), and
is_different(x, y) is equivalent to
(x != y & !is.na(x) & !is.na(y)) | xor(is.na(x), is.na(y)).
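The equivalences above translate directly into base R. The functions below are hypothetical re-implementations for illustration, not guideR's code; cumdifferent_sketch() assumes that numbering starts at 1 and increases by one each time a value differs from the previous one (NA counting as a value). num_cycle() is not covered because its exact output is not specified here.

```r
# Hypothetical re-implementations for illustration (not guideR's code).
is_equal_sketch <- function(x, y) {
  (x == y & !is.na(x) & !is.na(y)) | (is.na(x) & is.na(y))
}
is_different_sketch <- function(x, y) {
  (x != y & !is.na(x) & !is.na(y)) | xor(is.na(x), is.na(y))
}
# Assumption: a new group starts (counter + 1) whenever the value differs
# from the previous element, NA being treated as a regular value.
cumdifferent_sketch <- function(x) {
  if (length(x) == 0) return(integer(0))
  changed <- c(TRUE, is_different_sketch(x[-1], x[-length(x)]))
  cumsum(changed)
}

v <- c("a", "b", NA)
is_equal_sketch(v, NA)       # FALSE FALSE TRUE
is_different_sketch(v, "a")  # FALSE TRUE TRUE
cumdifferent_sketch(c("a", "a", "b", "b", "a", "b", "c", "a"))  # 1 1 2 2 3 4 5 6
```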
A vector of the same length as x.
v <- c("a", "b", NA)
is_different(v, "a")
is_different(v, NA)
is_equal(v, "a")
is_equal(v, NA)

d <- dplyr::tibble(group = c("a", "a", "b", "b", "a", "b", "c", "a"))
d |>
  dplyr::mutate(
    subgroup = cumdifferent(group),
    sub_a = num_cycle(group == "a")
  )
Add leading zeros
leading_zeros(x, left_digits = NULL, digits = 0, prefix = "", suffix = "", ...)
x |
a numeric vector |
left_digits |
number of digits before decimal point, automatically computed if not provided |
digits |
number of digits after decimal point |
prefix, suffix
|
Symbols to display before and after value |
... |
additional parameters passed to |
A character vector of the same length as x.
base::formatC(), base::sprintf()
v <- c(2, 103.24, 1042.147, 12.4566, NA)
leading_zeros(v)
leading_zeros(v, digits = 1)
leading_zeros(v, left_digits = 6, big.mark = " ")
leading_zeros(c(0, 6, 12, 18), prefix = "M")
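Zero-padding of this kind can be sketched with base::formatC(), listed above under See also. pad_zeros() is a hypothetical helper, not guideR's implementation; it assumes non-negative input, ignores extra formatting arguments such as big.mark, and computes the default left_digits from the widest integer part.

```r
# Hypothetical sketch of zero-padding with base R (not guideR's implementation).
# Assumption: x is non-negative.
pad_zeros <- function(x, left_digits = NULL, digits = 0, prefix = "", suffix = "") {
  ok <- !is.na(x)
  if (is.null(left_digits)) {
    # number of digits of the largest integer part
    left_digits <- max(nchar(sprintf("%.0f", trunc(x[ok]))))
  }
  # total width = integer part + decimal point + decimals
  width <- left_digits + (if (digits > 0) digits + 1 else 0)
  out <- formatC(x, width = width, format = "f", digits = digits, flag = "0")
  out <- paste0(prefix, out, suffix)
  out[!ok] <- NA_character_   # keep missing values missing
  out
}

v <- c(2, 103.24, 1042.147, 12.4566, NA)
pad_zeros(v)                              # "0002" "0103" "1042" "0012" NA
pad_zeros(v, digits = 1)                  # "0002.0" "0103.2" "1042.1" "0012.5" NA
pad_zeros(c(0, 6, 12, 18), prefix = "M")  # "M00" "M06" "M12" "M18"
```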
Transform a data frame from long format to period format
long_to_periods(data, id, start, stop = NULL, by = NULL)
data |
A data frame, or a data frame extension (e.g. a tibble). |
id |
< |
start |
< |
stop |
< |
by |
< |
A tibble.
d <- dplyr::tibble(
  patient = c(1, 2, 3, 3, 4, 4, 4),
  begin = c(0, 0, 0, 1, 0, 36, 39),
  end = c(50, 6, 1, 16, 36, 39, 45),
  covar = c("no", "no", "no", "yes", "no", "yes", "yes")
)
d
d |> long_to_periods(id = patient, start = begin, stop = end)
d |> long_to_periods(id = patient, start = begin, stop = end, by = covar)

# If stop is not provided, it is deduced.
# However, observation is then considered to end at the last start time.
d |> long_to_periods(id = patient, start = begin)
Transform a data frame from long format to a sequence object
long_to_seq(
  data,
  id,
  time,
  outcome,
  alphabet = "auto",
  labels = "auto",
  cnames = "auto",
  cpal = "auto",
  missing.color = "#BBBBBB",
  ...
)
data |
A data frame or a data frame extension (e.g. a tibble). |
id |
< |
time |
< |
outcome |
< |
alphabet |
Optional vector containing the alphabet (the list of all
possible states).
If |
labels |
An optional vector containing state labels used for graphics.
If |
cnames |
An optional vector containing names of the different time
points. If |
cpal |
An optional colour palette for representing the states in the
graphics. If |
missing.color |
Alternative colour for representing missing values inside the sequences. |
... |
Additional arguments passed to |
An object of class stslist.
library(TraMineR)

# generating a data frame in long format
data("biofam")
d <- biofam |>
  dplyr::mutate(id_ind = rownames(biofam)) |>
  dplyr::select(id_ind, dplyr::starts_with("a")) |>
  tidyr::pivot_longer(
    cols = dplyr::starts_with("a"),
    names_to = "age",
    names_prefix = "a",
    values_to = "life_state"
  ) |>
  dplyr::mutate(
    age = as.integer(age),
    life_state2 = dplyr::case_when(
      life_state == 0 ~ "P",
      life_state == 1 ~ "L",
      life_state == 2 ~ "M",
      life_state == 3 ~ "LM",
      life_state == 4 ~ "C",
      life_state == 5 ~ "LC",
      life_state == 6 ~ "LMC",
      life_state == 7 ~ "D"
    )
  ) |>
  labelled::set_value_labels(
    life_state = c(
      "Parent" = 0,
      "Left" = 1,
      "Married" = 2,
      "Left & Married" = 3,
      "Child" = 4,
      "Left & Child" = 5,
      "Left & Married & Child" = 6,
      "Divorced" = 7
    ),
    life_state2 = c(
      "Parent" = "P",
      "Left" = "L",
      "Married" = "M",
      "Left & Married" = "LM",
      "Child" = "C",
      "Left & Child" = "LC",
      "Left & Married & Child" = "LMC",
      "Divorced" = "D"
    )
  ) |>
  dplyr::mutate(
    life_state3 = labelled::to_factor(life_state),
    life_state4 = unclass(life_state2)
  )

d |> long_to_seq(id = id_ind, time = age, outcome = life_state) |> head(10)
d |> long_to_seq(id = id_ind, time = age, outcome = life_state2) |> head(10)
d |> long_to_seq(id = id_ind, time = age, outcome = life_state3) |> head(10)
d |> long_to_seq(id = id_ind, time = age, outcome = life_state4) |> head(10)
mean_sd() lets you quickly compute mean and standard deviation by
sub-groups. Use .conf.int = TRUE to also return confidence intervals of the
mean.
mean_sd(data, ...)

## S3 method for class 'data.frame'
mean_sd(
  data,
  ...,
  .by = NULL,
  .drop = FALSE,
  .drop_na_by = FALSE,
  .conf.int = FALSE,
  .conf.level = 0.95,
  .options = NULL
)

## S3 method for class 'survey.design'
mean_sd(
  data,
  ...,
  .by = NULL,
  .drop = FALSE,
  .drop_na_by = FALSE,
  .conf.int = FALSE,
  .conf.level = 0.95,
  .options = NULL
)

## Default S3 method:
mean_sd(
  data,
  ...,
  .drop = FALSE,
  .conf.int = FALSE,
  .conf.level = 0.95,
  .options = NULL
)
data |
A vector, a data frame, data frame extension (e.g. a tibble), or a survey design object. |
... |
< |
.by |
< |
.drop |
If |
.drop_na_by |
If |
.conf.int |
If |
.conf.level |
Confidence level for the returned confidence intervals. |
.options |
Additional arguments passed to |
A tibble. Column "n" reports the number of valid observations
and "missing" the number of missing (NA) observations, unweighted for
survey objects.
A tibble with one row per group.
# using a vector
iris$Petal.Length |> mean_sd()

# one variable
iris |> mean_sd(Petal.Length)
iris |> mean_sd(Petal.Length, .conf.int = TRUE)
iris |> mean_sd(Petal.Length, .by = Species)
mtcars |> mean_sd(mpg, .by = c(cyl, gear))

# two variables
iris |> mean_sd(Petal.Length, Petal.Width)
iris |> mean_sd(dplyr::pick(dplyr::starts_with("Petal")), .by = Species)

# missing values
d <- iris
d$Petal.Length[1:10] <- NA
d |> mean_sd(Petal.Length)
d |> mean_sd(Petal.Length, .by = Species)

## SURVEY DATA ------------------------------------------------------

ds <- srvyr::as_survey(iris)
ds |> mean_sd(Petal.Length, .by = Species, .conf.int = TRUE)
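The statistics returned by mean_sd() can be approximated for a single grouping variable in base R. mean_sd_by() is hypothetical; it assumes the confidence interval is the usual t-based interval for a mean (the one computed by stats::t.test()), which may differ from what guideR does, notably for survey designs.

```r
# Hypothetical base R sketch (not guideR code): per-group mean, SD, counts and
# a t-based confidence interval for the mean.
mean_sd_by <- function(x, group, conf.level = 0.95) {
  rows <- lapply(split(x, group), function(v) {
    n_missing <- sum(is.na(v))
    v <- v[!is.na(v)]
    n <- length(v)
    m <- mean(v)
    s <- sd(v)
    # half-width of the t interval: t quantile * standard error
    half <- qt(1 - (1 - conf.level) / 2, df = n - 1) * s / sqrt(n)
    data.frame(
      mean = m, sd = s, n = n, missing = n_missing,
      conf.low = m - half, conf.high = m + half
    )
  })
  data.frame(group = names(rows), do.call(rbind, rows), row.names = NULL)
}

d <- iris
d$Petal.Length[1:10] <- NA
mean_sd_by(d$Petal.Length, d$Species)
```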
median_iqr() lets you quickly compute median, quartiles and interquartile
range by sub-groups. Use .outliers = TRUE to also return whiskers and
outliers (see ggplot2::stat_boxplot()).
median_iqr(data, ...)

## S3 method for class 'data.frame'
median_iqr(
  data,
  ...,
  .by = NULL,
  .drop = FALSE,
  .drop_na_by = FALSE,
  .outliers = FALSE
)

## S3 method for class 'survey.design'
median_iqr(
  data,
  ...,
  .by = NULL,
  .drop = FALSE,
  .drop_na_by = FALSE,
  .outliers = FALSE
)

## Default S3 method:
median_iqr(data, ..., .drop = FALSE, .outliers = FALSE)
data |
A vector, a data frame, data frame extension (e.g. a tibble), or a survey design object. |
... |
< |
.by |
< |
.drop |
If |
.drop_na_by |
If |
.outliers |
If |
A tibble. Column "n" reports the number of valid observations
and "missing" the number of missing (NA) observations, unweighted for
survey objects.
A tibble with one row per group.
# using a vector
iris$Petal.Length |> median_iqr()

# one variable
iris |> median_iqr(Petal.Length)
iris |> median_iqr(Petal.Length, .outliers = TRUE)
iris |> median_iqr(Petal.Length, .by = Species)
mtcars |> median_iqr(mpg, .by = c(cyl, gear))

# two variables
iris |> median_iqr(Petal.Length, Petal.Width)
iris |> median_iqr(dplyr::pick(dplyr::starts_with("Petal")), .by = Species)

# missing values
d <- iris
d$Petal.Length[1:10] <- NA
d |> median_iqr(Petal.Length)
d |> median_iqr(Petal.Length, .by = Species)

## SURVEY DATA ------------------------------------------------------

ds <- srvyr::as_survey(iris)
ds |> median_iqr(Petal.Length, .by = Species, .outliers = TRUE)
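A per-vector version of these statistics can be sketched in base R. median_iqr_1() is a hypothetical helper: it assumes quantile() defaults (type 7) and a simplified form of the 1.5 × IQR whisker rule used by ggplot2::stat_boxplot(); guideR may compute quartiles differently, in particular for survey designs.

```r
# Hypothetical base R sketch (not guideR code).
median_iqr_1 <- function(x, coef = 1.5) {
  n_missing <- sum(is.na(x))
  x <- x[!is.na(x)]
  q <- stats::quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower <- q[1] - coef * iqr   # values below are flagged as outliers
  upper <- q[3] + coef * iqr   # values above are flagged as outliers
  inside <- x[x >= lower & x <= upper]
  list(
    n = length(x), missing = n_missing,
    q1 = q[1], median = q[2], q3 = q[3], iqr = iqr,
    ymin = min(inside), ymax = max(inside),   # whisker ends
    outliers = sort(x[x < lower | x > upper])
  )
}

setosa <- iris$Petal.Length[iris$Species == "setosa"]
str(median_iqr_1(setosa))
```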
Plot observed vs predicted distribution of a fitted model
observed_vs_theoretical(model)
model |
A statistical model. |
Has been tested with stats::lm() and stats::glm() models. It may work
with other types of models, but without any warranty.
A ggplot2 plot.
# a linear model
mod <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
mod |> observed_vs_theoretical()

# a logistic regression
mod <- glm(
  as.factor(Survived) ~ Class + Sex,
  data = titanic,
  family = binomial()
)
mod |> observed_vs_theoretical()
Transform a data frame from period format to long format
periods_to_long(
  data,
  start,
  stop,
  time_step = 1,
  time_name = "time",
  keep = FALSE
)
data |
A data frame, or a data frame extension (e.g. a tibble). |
start |
< |
stop |
< |
time_step |
(numeric) Desired step for the time variable. |
time_name |
(character) Name of the time variable. |
keep |
(logical) Should the start and stop variables be kept in the results? |
A tibble.
d <- dplyr::tibble(
  patient = c(1, 2, 3, 3),
  begin = c(0, 2, 0, 3),
  end = c(6, 4, 2, 8),
  covar = c("no", "yes", "no", "yes")
)
d
d |> periods_to_long(start = begin, stop = end)
d |> periods_to_long(start = begin, stop = end, time_step = 5)
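The period-to-long expansion can be sketched in base R. periods_to_long_sketch() is hypothetical: it assumes each row is expanded to seq(start, stop, by = time_step), so both bounds are kept when reachable, and that start and stop are dropped (as with keep = FALSE). guideR's handling of the period boundaries may differ.

```r
# Hypothetical sketch (not guideR code): one output row per time point.
periods_to_long_sketch <- function(data, start, stop, time_step = 1,
                                   time_name = "time") {
  keep_cols <- setdiff(names(data), c(start, stop))
  rows <- lapply(seq_len(nrow(data)), function(i) {
    times <- seq(data[[start]][i], data[[stop]][i], by = time_step)
    out <- data[rep(i, length(times)), keep_cols, drop = FALSE]  # repeat row i
    out[[time_name]] <- times
    out
  })
  res <- do.call(rbind, rows)
  rownames(res) <- NULL
  res
}

d <- data.frame(
  patient = c(1, 2, 3, 3),
  begin = c(0, 2, 0, 3),
  end = c(6, 4, 2, 8),
  covar = c("no", "yes", "no", "yes")
)
periods_to_long_sketch(d, "begin", "end")
periods_to_long_sketch(d, "begin", "end", time_step = 5)
```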
Plot one or several categorical variables by sub-groups. See proportion()
for more details on how proportions and confidence intervals are
computed. Returns a bar plot (see examples).
plot_categorical(
  data,
  outcome,
  na.rm = TRUE,
  by = NULL,
  drop_na_by = FALSE,
  convert_continuous = TRUE,
  ...,
  show_overall = TRUE,
  overall_label = "Overall",
  show_pvalues = TRUE,
  pvalues_test = c("fisher", "chisq"),
  pvalues_labeller = scales::label_pvalue(add_p = TRUE),
  pvalues_size = 3.5,
  pvalues_y = ifelse(flip, 1.05, 1),
  show_labels = TRUE,
  labels_labeller = scales::label_percent(1),
  labels_size = 3.5,
  labels_color = "auto",
  facet_labeller = ggplot2::label_wrap_gen(width = 50, multi_line = TRUE),
  flip = FALSE,
  minimal = FALSE,
  return_data = FALSE
)
data |
A data frame, data frame extension (e.g. a tibble), or a survey design object. |
outcome |
< |
na.rm |
Should |
by |
< |
drop_na_by |
Remove |
convert_continuous |
Should continuous by variables (with 5 unique
values or more) be converted to quartiles (using |
... |
Additional arguments passed to |
show_overall |
Display "Overall" column? |
overall_label |
Label for the overall column. |
show_pvalues |
Display p-values in the top-left corner? |
pvalues_test |
Test to compute p-values for data frames: |
pvalues_labeller |
Labeller function for p-values. |
pvalues_size |
Text size for p-values. |
pvalues_y |
Y position of p-values. |
show_labels |
Display proportion labels? |
labels_labeller |
Labeller function for labels. |
labels_size |
Size of labels. |
labels_color |
Color of labels. |
facet_labeller |
Labeller function for strip labels. |
flip |
Flip x and y axis? |
minimal |
Should a minimal theme be applied? (no y-axis, no grid) |
return_data |
Return computed data instead of the plot? |
titanic |>
  plot_categorical(
    Class,
    by = c(Age, Sex)
  )
titanic |>
  plot_categorical(
    Class,
    by = c(Age, Sex),
    show_overall = FALSE,
    flip = TRUE
  )
titanic |>
  plot_categorical(
    Class,
    by = c(Age, Sex),
    flip = TRUE,
    minimal = TRUE
  )
gtsummary::trial |>
  plot_categorical(grade, by = c(age, stage, trt))
gtsummary::trial |>
  plot_categorical(grade, by = c(age, stage, trt), drop_na_by = TRUE)
gtsummary::trial |>
  plot_categorical(c(grade, stage), by = c(trt, response))
Plot one or several continuous variables by sub-groups. See median_iqr()
for more details on how statistics are
computed. Returns a box plot (see examples).
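The statistics summarised by a box plot (first quartile, median, third quartile) can be computed per sub-group in base R. This sketch only illustrates the idea: it assumes the default quantile definition of stats::quantile() (type 7), which may differ from what median_iqr() uses.

```r
# Split the outcome by sub-group, then compute quartiles within each group
groups <- split(iris$Petal.Length, iris$Species)
res <- t(sapply(groups, stats::quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE))
colnames(res) <- c("q1", "median", "q3")
res
# Interquartile range of each group
res[, "q3"] - res[, "q1"]
```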
plot_continuous(
  data,
  outcome,
  by = NULL,
  drop_na_by = FALSE,
  convert_continuous = TRUE,
  ...,
  show_overall = TRUE,
  overall_label = "Overall",
  show_pvalues = TRUE,
  pvalues_labeller = scales::label_pvalue(add_p = TRUE),
  pvalues_size = 3.5,
  facet_labeller = ggplot2::label_wrap_gen(width = 50, multi_line = TRUE),
  flip = FALSE,
  minimal = FALSE,
  free_scale = FALSE,
  return_data = FALSE
)
data |
A data frame, data frame extension (e.g. a tibble), or a survey design object. |
outcome |
< |
by |
< |
drop_na_by |
Remove |
convert_continuous |
Should continuous by variables (with 5 unique
values or more) be converted to quartiles (using |
... |
Additional arguments passed to |
show_overall |
Display "Overall" column? |
overall_label |
Label for the overall column. |
show_pvalues |
Display p-values in the top-left corner? p-values are
computed with |
pvalues_labeller |
Labeller function for p-values. |
pvalues_size |
Text size for p-values. |
facet_labeller |
Labeller function for strip labels. |
flip |
Flip x and y axis? |
minimal |
Should a minimal theme be applied? (no y-axis, no grid) |
free_scale |
Allow y axis to vary between conditions? |
return_data |
Return computed data instead of the plot? |
iris |>
  plot_continuous(Petal.Length, by = Species)
iris |>
  plot_continuous(
    dplyr::starts_with("Petal"),
    by = Species,
    free_scale = TRUE,
    fill = "lightblue",
    outlier.color = "red"
  )
mtcars |>
  plot_continuous(
    mpg,
    by = c(cyl, gear),
    flip = TRUE,
    mapping = ggplot2::aes(fill = by)
  )
# works with continuous by variables
mtcars |>
  plot_continuous(
    mpg,
    by = c(disp, drat),
    flip = TRUE,
    minimal = TRUE
  )
# works with survey object
iris |>
  srvyr::as_survey() |>
  plot_continuous(
    Petal.Length,
    by = c(Species, Petal.Width),
    flip = TRUE
  )
Plot inertia, absolute loss and relative loss from a classification tree
plot_inertia_from_tree(tree, k_max = 15)
get_inertia_from_tree(tree, k_max = 15)
tree |
A dendrogram, i.e. a stats::hclust object,
a FactoMineR::HCPC object or an object that can be converted to a
stats::hclust object with |
k_max |
Maximum number of clusters to return / plot. |
A ggplot2 plot or a tibble.
hc <- hclust(dist(USArrests))
get_inertia_from_tree(hc)
plot_inertia_from_tree(hc)
Plot one or several means by sub-groups. See mean_sd() for more details on
how means and confidence intervals are computed.
By default, returns a point plot, but other geometries can be used
(see examples).
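To make the "mean plus confidence interval" idea concrete, the sketch below computes a group mean and a t-based 95% interval with stats::t.test(). The t-based interval is an assumption chosen for illustration; the exact method used by guideR is the one documented in mean_sd().

```r
# Mean and t-based 95% confidence interval of Petal.Length within each Species
ci_by_group <- t(sapply(split(iris$Petal.Length, iris$Species), function(x) {
  tt <- stats::t.test(x, conf.level = 0.95)
  c(mean = unname(tt$estimate), low = tt$conf.int[1], high = tt$conf.int[2])
}))
ci_by_group
```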
plot_means(
  data,
  outcome,
  by = NULL,
  drop_na_by = FALSE,
  convert_continuous = TRUE,
  geom = "point",
  ...,
  show_overall = TRUE,
  overall_label = "Overall",
  show_ci = TRUE,
  conf_level = 0.95,
  ci_color = "black",
  show_pvalues = TRUE,
  pvalues_labeller = scales::label_pvalue(add_p = TRUE),
  pvalues_size = 3.5,
  show_labels = TRUE,
  label_y = NULL,
  labels_labeller = scales::label_number(0.1),
  labels_size = 3.5,
  labels_color = "black",
  show_overall_line = FALSE,
  overall_line_type = "dashed",
  overall_line_color = "black",
  overall_line_width = 0.5,
  facet_labeller = ggplot2::label_wrap_gen(width = 50, multi_line = TRUE),
  flip = FALSE,
  minimal = FALSE,
  free_scale = FALSE,
  return_data = FALSE
)
data |
A data frame, data frame extension (e.g. a tibble), or a survey design object. |
outcome |
< |
by |
< |
drop_na_by |
Remove |
convert_continuous |
Should continuous by variables (with 5 unique
values or more) be converted to quartiles (using |
geom |
Geometry to use for plotting means ( |
... |
Additional arguments passed to the geom defined by |
show_overall |
Display "Overall" column? |
overall_label |
Label for the overall column. |
show_ci |
Display confidence intervals? |
conf_level |
Confidence level for the confidence intervals. |
ci_color |
Color of the error bars representing confidence intervals. |
show_pvalues |
Display p-values in the top-left corner? p-values are
computed with |
pvalues_labeller |
Labeller function for p-values. |
pvalues_size |
Text size for p-values. |
show_labels |
Display mean labels? |
label_y |
Y position of labels. If |
labels_labeller |
Labeller function for labels. |
labels_size |
Size of labels. |
labels_color |
Color of labels. |
show_overall_line |
Add an overall line? |
overall_line_type |
Line type of the overall line. |
overall_line_color |
Color of the overall line. |
overall_line_width |
Line width of the overall line. |
facet_labeller |
Labeller function for strip labels. |
flip |
Flip x and y axis? |
minimal |
Should a minimal theme be applied? (no y-axis, no grid) |
free_scale |
Allow y axis to vary between conditions? |
return_data |
Return computed data instead of the plot? |
iris |>
  plot_means(Petal.Length, by = Species)
iris |>
  plot_means(
    dplyr::starts_with("Petal"),
    by = Species,
    geom = "bar",
    fill = "lightblue",
    show_overall_line = TRUE
  )
mtcars |>
  plot_means(
    mpg,
    by = c(cyl, gear),
    size = 3,
    colour = "plum",
    flip = TRUE
  )
# works with continuous by variables
mtcars |>
  plot_means(
    mpg,
    by = c(disp, drat),
    fill = "plum",
    geom = "bar",
    flip = TRUE,
    minimal = TRUE
  )
# works with survey object
iris |>
  srvyr::as_survey() |>
  plot_means(
    Petal.Length,
    by = c(Species, Petal.Width),
    label_y = -1,
    size = 3,
    mapping = ggplot2::aes(colour = by),
    flip = TRUE
  )
Considering a multiple answers question coded as several binary variables
(one per answer), plot the proportion of positive answers.
If combine_answers = FALSE, plot the proportion of positive answers for each
item separately. If combine_answers = TRUE, combine the different answers
(see combine_answers()) and plot the proportion of each combination
(the ggupset package is required when
flip = FALSE).
See proportion() for more details on how proportions and
confidence intervals are computed. By default, returns a bar plot, but other
geometries can be used (see examples). If defined, variable labels are used
(see examples).
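The idea behind combine_answers = TRUE can be sketched in base R: for each respondent, the names of the items answered positively are pasted together, "none" is used when no item is positive, and any missing item makes the whole combination missing. The helper combine_positive() is hypothetical and only mirrors the documented defaults (combine_sep = " | ", none_label = "none"); it is not the code of combine_answers(), which also uses variable labels when they are defined.

```r
# Hypothetical helper: combine the positive answers of several binary items
combine_positive <- function(df, value = "y", sep = " | ", none = "none") {
  is_pos <- as.matrix(df) == value  # TRUE where the answer is positive
  apply(is_pos, 1, function(row) {
    if (anyNA(row)) return(NA_character_)  # any missing item -> missing combination
    items <- names(row)[row]
    if (length(items) == 0) none else paste(items, collapse = sep)
  })
}

d <- data.frame(
  q1a = c("y", "n", "y", "n"),
  q1b = c("y", "n", "n", NA),
  q1c = c("n", "n", "y", "y")
)
combined <- combine_positive(d)
combined
# Proportion of each combination, keeping missing combinations visible
prop.table(table(combined, useNA = "ifany"))
```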
plot_multiple_answers(
  data,
  answers = dplyr::everything(),
  value = NULL,
  by = NULL,
  combine_answers = FALSE,
  combine_sep = " | ",
  missing_label = " missing",
  none_label = "none",
  drop_na = FALSE,
  drop_na_by = FALSE,
  sort = c("none", "ascending", "descending", "degrees"),
  geom = "bar",
  ...,
  show_ci = TRUE,
  conf_level = 0.95,
  ci_color = "black",
  show_labels = TRUE,
  labels_labeller = scales::label_percent(1),
  labels_size = 3.5,
  labels_color = "black",
  flip = FALSE,
  return_data = FALSE
)
plot_multiple_answers_dodge(
  data,
  answers = dplyr::everything(),
  value = NULL,
  by,
  combine_answers = FALSE,
  combine_sep = " | ",
  missing_label = " missing",
  none_label = "none",
  drop_na = FALSE,
  drop_na_by = FALSE,
  sort = c("none", "ascending", "descending", "degrees"),
  geom = c("bar", "point"),
  width = 0.75,
  ...,
  show_ci = TRUE,
  conf_level = 0.95,
  ci_color = "black",
  show_labels = TRUE,
  labels_labeller = scales::label_percent(1),
  labels_size = 3.5,
  labels_color = "black",
  flip = FALSE
)
data |
A data frame, data frame extension (e.g. a tibble), or a survey design object. |
answers |
< |
value |
Value indicating a positive answer. By default, will use the maximum observed value and will display a message. |
by |
< |
combine_answers |
Should answers be combined? (see examples) |
combine_sep |
Character string to separate combined answers. |
missing_label |
When combining answers and
|
none_label |
When combining answers and |
drop_na |
Should any observation with at least one |
drop_na_by |
If TRUE, will remove any |
sort |
Should answers be sorted according to their proportion? They could also be sorted by degrees (number of elements) when combining answers. |
geom |
Geometry to use for plotting proportions ( |
... |
Additional arguments passed to the geom defined by |
show_ci |
Display confidence intervals? |
conf_level |
Confidence level for the confidence intervals. |
ci_color |
Color of the error bars representing confidence intervals. |
show_labels |
Display proportion labels? |
labels_labeller |
Labeller function for proportion labels. |
labels_size |
Size of proportion labels. |
labels_color |
Color of proportion labels. |
flip |
Flip x and y axis? |
return_data |
Return computed data instead of the plot? |
width |
Dodging width. |
If drop_na = TRUE, any observation with at least one NA value for one
item will be dropped.
If drop_na = FALSE and combine_answers = FALSE, NA values for a
specific answer are excluded from the denominator when computing
proportions. Therefore, proportions may be computed on different
population sizes for each item.
If drop_na = FALSE and combine_answers = TRUE, any observation with at
least one NA value will be labeled with missing_label.
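The effect of drop_na on the denominators when answers are not combined can be reproduced in base R. In this sketch (toy data, base R only, not guideR's internal code), each item's proportion either uses its own non-missing observations or is restricted to observations complete for every item.

```r
d <- data.frame(
  q1a = c("y", "n", "y", "n"),
  q1b = c("y", "n", "n", NA),
  q1c = c("n", "n", "y", "y")
)
pos <- as.matrix(d) == "y"  # logical matrix, NA where the answer is missing

# drop_na = FALSE: each item keeps its own denominator
denominators <- colSums(!is.na(pos))
per_item <- colMeans(pos, na.rm = TRUE)

# drop_na = TRUE: only observations complete for every item are kept
complete <- stats::complete.cases(pos)
per_item_complete <- colMeans(pos[complete, , drop = FALSE])

denominators
per_item
per_item_complete
```

Here q1b is computed on 3 observations while q1a and q1c use 4, which is why the proportions are not directly comparable unless drop_na = TRUE.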
d <- dplyr::tibble(
  q1a = sample(c("y", "n"), size = 200, replace = TRUE),
  q1b = sample(c("y", "n", "n", NA), size = 200, replace = TRUE),
  q1c = sample(c("y", "y", "n"), size = 200, replace = TRUE),
  q1d = sample("n", size = 200, replace = TRUE)
)
d |> plot_multiple_answers(q1a:q1c)
d |>
  labelled::set_variable_labels(
    q1a = "apple",
    q1b = "banana",
    q1c = "chocolate",
    q1d = "Dijon mustard"
  ) |>
  plot_multiple_answers(
    value = "y",
    drop_na = TRUE,
    sort = "desc",
    fill = "lightblue",
    flip = TRUE
  )
d |>
  plot_multiple_answers(
    combine_answers = TRUE,
    value = "y",
    fill = "#DDCC77",
    drop_na = TRUE
  )
d |>
  plot_multiple_answers(
    combine_answers = TRUE,
    value = "y",
    flip = TRUE,
    mapping = ggplot2::aes(fill = prop),
    show.legend = FALSE
  ) +
  ggplot2::scale_fill_distiller(palette = "Spectral")
d$group <- sample(c("group A", "group B"), size = 200, replace = TRUE)
d |>
  plot_multiple_answers(
    answers = q1a:q1d,
    by = group,
    combine_answers = TRUE,
    sort = "degrees",
    value = "y",
    fill = "grey80"
  )
d |> plot_multiple_answers_dodge(q1a:q1d, by = group)
d |> plot_multiple_answers_dodge(q1a:q1d, by = group, flip = TRUE)
d |> plot_multiple_answers_dodge(q1a:q1d, by = group, combine_answers = TRUE)
Plot one or several proportions (defined by logical conditions) by
sub-groups. See proportion() for more details on how proportions and
confidence intervals are computed. By default, returns a bar plot, but other
geometries can be used (see examples). stratified_by() is a helper
function facilitating stratified analyses (i.e. proportions by groups
stratified according to a third variable, see examples).
dummy_proportions() is a helper to easily convert a categorical variable
into dummy variables and therefore show the proportion of each level of
the original variable (see examples).
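The dummy-variable idea behind dummy_proportions() can be sketched in base R: one logical column per level, so that the mean of each column is the proportion of that level. The helper to_dummies() and the toy vector are hypothetical and only illustrate the principle, not guideR's implementation.

```r
# Hypothetical helper: one logical column per level of a categorical variable
to_dummies <- function(f) {
  f <- as.factor(f)
  sapply(levels(f), function(l) f == l)  # matrix, one column per level
}

cls <- factor(c("1st", "2nd", "3rd", "3rd", "Crew", "3rd"))
dummies <- to_dummies(cls)
dummies
# The mean of each dummy column is the proportion of that level
colMeans(dummies)
```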
plot_proportions(
  data,
  condition,
  by = NULL,
  drop_na_by = FALSE,
  convert_continuous = TRUE,
  geom = "bar",
  ...,
  show_overall = TRUE,
  overall_label = "Overall",
  show_ci = TRUE,
  conf_level = 0.95,
  ci_color = "black",
  show_pvalues = TRUE,
  pvalues_test = c("fisher", "chisq"),
  pvalues_labeller = scales::label_pvalue(add_p = TRUE),
  pvalues_size = 3.5,
  show_labels = TRUE,
  label_y = NULL,
  labels_labeller = scales::label_percent(1),
  labels_size = 3.5,
  labels_color = "black",
  show_overall_line = FALSE,
  overall_line_type = "dashed",
  overall_line_color = "black",
  overall_line_width = 0.5,
  facet_labeller = ggplot2::label_wrap_gen(width = 50, multi_line = TRUE),
  flip = FALSE,
  minimal = FALSE,
  free_scale = FALSE,
  return_data = FALSE
)
stratified_by(condition, strata)
dummy_proportions(variable)
data |
A data frame, data frame extension (e.g. a tibble), or a survey design object. |
condition |
< |
by |
< |
drop_na_by |
Remove |
convert_continuous |
Should continuous by variables (with 5 unique
values or more) be converted to quartiles (using |
geom |
Geometry to use for plotting proportions ( |
... |
Additional arguments passed to the geom defined by |
show_overall |
Display "Overall" column? |
overall_label |
Label for the overall column. |
show_ci |
Display confidence intervals? |
conf_level |
Confidence level for the confidence intervals. |
ci_color |
Color of the error bars representing confidence intervals. |
show_pvalues |
Display p-values in the top-left corner? |
pvalues_test |
Test to compute p-values for data frames: |
pvalues_labeller |
Labeller function for p-values. |
pvalues_size |
Text size for p-values. |
show_labels |
Display proportion labels? |
label_y |
Y position of labels. If |
labels_labeller |
Labeller function for labels. |
labels_size |
Size of labels. |
labels_color |
Color of labels. |
show_overall_line |
Add an overall line? |
overall_line_type |
Line type of the overall line. |
overall_line_color |
Color of the overall line. |
overall_line_width |
Line width of the overall line. |
facet_labeller |
Labeller function for strip labels. |
flip |
Flip x and y axis? |
minimal |
Should a minimal theme be applied? (no y-axis, no grid) |
free_scale |
Allow y axis to vary between conditions? |
return_data |
Return computed data instead of the plot? |
strata |
Stratification variable |
variable |
Variable to be converted into dummy variables. |
titanic |>
  plot_proportions(
    Survived == "Yes",
    overall_label = "All",
    labels_color = "white"
  )
titanic |>
  plot_proportions(
    Survived == "Yes",
    by = c(Class, Sex),
    fill = "lightblue"
  )
titanic |>
  plot_proportions(
    Survived == "Yes",
    by = c(Class, Sex),
    fill = "lightblue",
    flip = TRUE
  )
titanic |>
  plot_proportions(
    Survived == "Yes",
    by = c(Class, Sex),
    fill = "lightblue",
    minimal = TRUE
  )
titanic |>
  plot_proportions(
    Survived == "Yes",
    by = c(Class, Sex),
    geom = "point",
    color = "red",
    size = 3,
    show_labels = FALSE
  )
titanic |>
  plot_proportions(
    Survived == "Yes",
    by = c(Class, Sex),
    geom = "area",
    fill = "lightgreen",
    show_overall = FALSE
  )
titanic |>
  plot_proportions(
    Survived == "Yes",
    by = c(Class, Sex),
    geom = "line",
    color = "purple",
    ci_color = "darkblue",
    show_overall = FALSE
  )
titanic |>
  plot_proportions(
    Survived == "Yes",
    by = -Survived,
    mapping = ggplot2::aes(fill = by),
    color = "black",
    show.legend = FALSE,
    show_overall_line = TRUE,
    show_pvalues = FALSE
  )
# defining several proportions
titanic |>
  plot_proportions(
    dplyr::tibble(
      Survived = Survived == "Yes",
      Male = Sex == "Male"
    ),
    by = c(Class),
    mapping = ggplot2::aes(fill = condition)
  )
titanic |>
  plot_proportions(
    dplyr::tibble(
      Survived = Survived == "Yes",
      Male = Sex == "Male"
    ),
    by = c(Class),
    mapping = ggplot2::aes(fill = condition),
    free_scale = TRUE
  )
iris |>
  plot_proportions(
    dplyr::tibble(
      "Long sepal" = Sepal.Length > 6,
      "Short petal" = Petal.Width < 1
    ),
    by = Species,
    fill = "palegreen"
  )
iris |>
  plot_proportions(
    dplyr::tibble(
      "Long sepal" = Sepal.Length > 6,
      "Short petal" = Petal.Width < 1
    ),
    by = Species,
    fill = "palegreen",
    flip = TRUE
  )
# works with continuous by variables
iris |>
  labelled::set_variable_labels(
    Sepal.Length = "Length of the sepal"
  ) |>
  plot_proportions(
    Species == "versicolor",
    by = dplyr::contains("leng"),
    fill = "plum",
    colour = "plum4"
  )
# works with survey object
titanic |>
  srvyr::as_survey() |>
  plot_proportions(
    Survived == "Yes",
    by = c(Class, Sex),
    fill = "darksalmon",
    color = "black",
    show_overall_line = TRUE,
    labels_labeller = scales::label_percent(.1)
  )
# stratified analysis
titanic |>
  plot_proportions(
    (Survived == "Yes") |> stratified_by(Sex),
    by = Class,
    mapping = ggplot2::aes(fill = condition)
  ) +
  ggplot2::theme(legend.position = "bottom") +
  ggplot2::labs(fill = NULL)
# Convert Class into dummy variables
titanic |>
  plot_proportions(
    dummy_proportions(Class),
    by = Sex,
    mapping = ggplot2::aes(fill = level)
  )
Create a trajectory index plot (similar to sequence index plot) from a data frame in long or period format.
plot_trajectories(
  data,
  id,
  time,
  fill,
  by = NULL,
  sort_by = NULL,
  nudge_x = NULL,
  hide_y_labels = NULL,
  facet_labeller = ggplot2::label_wrap_gen(width = 50, multi_line = TRUE),
  ...
)
plot_periods(
  data,
  id,
  start,
  stop,
  fill,
  by = NULL,
  sort_by = NULL,
  nudge_x = NULL,
  hide_y_labels = NULL,
  facet_labeller = ggplot2::label_wrap_gen(width = 50, multi_line = TRUE),
  ...
)
data |
A data frame, a data frame extension (e.g. a tibble), or a survey design object. |
id |
< |
time |
< |
fill |
< |
by |
< |
sort_by |
< |
nudge_x |
Optional amount of horizontal distance to move. |
hide_y_labels |
Hide y labels? If |
facet_labeller |
Labeller function for strip labels. |
... |
Additional arguments passed to |
start, stop |
< |
plot_trajectories() assumes that data are stored in a long format (i.e.
one row per unit of time). You can use tidyr::pivot_longer() or
periods_to_long() to transform your data into such a format. By default, tiles
are centered on the value of time. You can adjust their horizontal position with
nudge_x. By default, each row is assumed to represent one unit of time and is
drawn with a width of 1. You can adjust the tiles' width with width.
plot_periods() is adapted to the period format, with a start and a stop
variable. You can use long_to_periods() to transform your data into such a
format. The beginning and end of each tile are determined by the start and
stop arguments.
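The relation between the long format and the period format can be illustrated in base R: within each id, consecutive rows sharing the same status collapse into one period running from the first start to the last stop. This is a simplified sketch built on rle(); the helper to_periods() is hypothetical, assumes contiguous rows without gaps, and is not the implementation of long_to_periods().

```r
# Toy data in long format: one row per id and unit of time
long_df <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2),
  time = c(0:3, 0:2),
  status = c("a", "a", "b", "b", "b", "b", "a")
)
long_df$end <- long_df$time + 1  # each row covers one unit of time

# Hypothetical helper: collapse consecutive identical statuses into periods
to_periods <- function(df) {
  do.call(rbind, lapply(split(df, df$id), function(g) {
    g <- g[order(g$time), ]
    r <- rle(g$status)
    last <- cumsum(r$lengths)      # index of the last row of each run
    first <- last - r$lengths + 1  # index of the first row of each run
    data.frame(
      id = g$id[first], start = g$time[first],
      stop = g$end[last], status = r$values
    )
  }))
}

periods <- to_periods(long_df)
periods
```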
For survey design objects, weights are not taken into account: each individual trajectory has the same height.
d <- dplyr::tibble(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  time = c(0:3, 0:2, 0:4),
  status = c("a", "a", "b", "b", "b", "b", "a", "b", "b", "b", "b", "a"),
  group = c("f", "f", "f", "f", "f", "f", "f", "m", "m", "m", "m", "m")
)
d |> plot_trajectories(id = id, time = time, fill = status, colour = "black")
d |> plot_trajectories(id = id, time = time, fill = status, nudge_x = .5)
d |> plot_trajectories(id = id, time = time, fill = status, by = group)
d2 <- d |>
  dplyr::mutate(end = time + 1) |>
  long_to_periods(id = id, start = time, stop = end, by = status)
d2
d2 |>
  plot_periods(
    id = id,
    start = time,
    stop = end,
    fill = status,
    colour = "black",
    height = 0.8
  )
proportion() lets you quickly count observations (like dplyr::count())
and compute relative proportions. Proportions are computed separately by
group (see examples).
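The difference between proportions of the total and proportions computed separately by group can be shown with base R tables. This sketch uses hypothetical toy data and multiplies by 100 to mirror the default .scale = 100; it illustrates the arithmetic, not how proportion() is implemented.

```r
# Hypothetical toy data
df <- data.frame(
  class = c("1st", "1st", "2nd", "2nd", "2nd", "3rd"),
  survived = c("Yes", "No", "Yes", "No", "No", "No")
)
tab <- table(df$class, df$survived)

# Proportions of the total: all cells sum to 100
total_pct <- prop.table(tab) * 100
total_pct

# Row proportions: computed separately within each class (each row sums to 100)
row_pct <- prop.table(tab, margin = 1) * 100
row_pct
```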
proportion(data, ...)
## S3 method for class 'data.frame'
proportion(
  data,
  ...,
  .by = NULL,
  .na.rm = FALSE,
  .weight = NULL,
  .scale = 100,
  .sort = FALSE,
  .drop = FALSE,
  .drop_na_by = FALSE,
  .conf.int = FALSE,
  .conf.level = 0.95,
  .options = list(correct = TRUE)
)
## S3 method for class 'survey.design'
proportion(
  data,
  ...,
  .by = NULL,
  .na.rm = FALSE,
  .scale = 100,
  .sort = FALSE,
  .drop_na_by = FALSE,
  .conf.int = FALSE,
  .conf.level = 0.95,
  .options = NULL
)
## Default S3 method:
proportion(
  data,
  ...,
  .na.rm = FALSE,
  .scale = 100,
  .sort = FALSE,
  .drop = FALSE,
  .conf.int = FALSE,
  .conf.level = 0.95,
  .options = list(correct = TRUE)
)
data |
A vector, a data frame, data frame extension (e.g. a tibble), or a survey design object. |
... |
< |
.by |
< |
.na.rm |
Should |
.weight |
< |
.scale |
A scaling factor applied to proportion. Use |
.sort |
If |
.drop |
If |
.drop_na_by |
If |
.conf.int |
If |
.conf.level |
Confidence level for the returned confidence intervals. |
.options |
Additional arguments passed to |
A tibble.
A tibble with one row per group.
# using a vector
titanic$Class |> proportion()
# univariable table
titanic |> proportion(Class)
titanic |> proportion(Class, .sort = TRUE)
titanic |> proportion(Class, .conf.int = TRUE)
titanic |> proportion(Class, .conf.int = TRUE, .scale = 1)
# bivariable table
titanic |> proportion(Class, Survived) # proportions of the total
titanic |> proportion(Survived, .by = Class) # row proportions
titanic |> # equivalent syntax
  dplyr::group_by(Class) |>
  proportion(Survived)
# combining 3 variables or more
titanic |> proportion(Class, Sex, Survived)
titanic |> proportion(Sex, Survived, .by = Class)
titanic |> proportion(Survived, .by = c(Class, Sex))
# missing values
dna <- titanic
dna$Survived[c(1:20, 500:530)] <- NA
dna |> proportion(Survived)
dna |> proportion(Survived, .na.rm = TRUE)
## SURVEY DATA ------------------------------------------------------
ds <- srvyr::as_survey(titanic)
# univariable table
ds |> proportion(Class)
ds |> proportion(Class, .sort = TRUE)
ds |> proportion(Class, .conf.int = TRUE)
ds |> proportion(Class, .conf.int = TRUE, .scale = 1)
# bivariable table
ds |> proportion(Class, Survived) # proportions of the total
ds |> proportion(Survived, .by = Class) # row proportions
ds |>
  dplyr::group_by(Class) |>
  proportion(Survived)
# combining 3 variables or more
ds |> proportion(Class, Sex, Survived)
ds |> proportion(Sex, Survived, .by = Class)
ds |> proportion(Survived, .by = c(Class, Sex))
# missing values
dsna <- srvyr::as_survey(dna)
dsna |> proportion(Survived)
dsna |> proportion(Survived, .na.rm = TRUE)
Sometimes, the sum of rounded numbers (e.g., using base::round()) is not
the same as their rounded sum.
round_preserve_sum(x, digits = 0)
x |
Numerical vector to sum. |
digits |
Number of decimals for rounding. |
This solution applies the following algorithm:
Round down to the specified number of decimal places
Order numbers by their remainder values
Increment the specified decimal place of values with k largest remainders, where k is the number of values that must be incremented to preserve their rounded sum
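The three steps above (round down, order by remainder, increment the k largest remainders) correspond to the largest-remainder method. The base R sketch below follows those steps; the name round_preserve_sum_sketch() is hypothetical to make clear it is an illustration and not necessarily identical to guideR's source.

```r
# Largest-remainder rounding that preserves the rounded sum
round_preserve_sum_sketch <- function(x, digits = 0) {
  up <- 10^digits
  x <- x * up
  y <- floor(x)                        # step 1: round down
  k <- round(sum(x) - sum(y))          # number of units needed to restore the sum
  idx <- utils::tail(order(x - y), k)  # steps 2-3: k largest remainders
  y[idx] <- y[idx] + 1
  y / up
}

x <- c(0.333, 0.333, 0.334)
sum(round(x, 2))  # naive rounding loses a unit
res <- round_preserve_sum_sketch(x, 2)
res
sum(res)
```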
A numerical vector of same length as x.
https://biostatmatt.com/archives/2902
sum(c(0.333, 0.333, 0.334))
round(c(0.333, 0.333, 0.334), 2)
sum(round(c(0.333, 0.333, 0.334), 2))
round_preserve_sum(c(0.333, 0.333, 0.334), 2)
sum(round_preserve_sum(c(0.333, 0.333, 0.334), 2))
Provides a safe colour palette for categorical variables. It is based on
Paul Tol's colour schemes, designed to be distinct for all people, including
colour-blind readers, distinct from black and white, distinct on screen and
paper, and matching well together. It is primarily based on the bright
colour scheme implemented in khroma::scale_fill_bright(). This colour
scheme includes 7 colours, including a grey reserved for NA values.
Therefore, scale_fill_safe() uses the bright scheme only if 6 or fewer
colours are needed (keeping the grey for any NA value). If 7 to 9 colours
are needed, the muted scheme (cf. khroma::scale_fill_muted()) is used
instead. Finally, if 10 or more colours are requested, the rainbow scheme
is used (cf. khroma::scale_fill_discreterainbow()). As this is a sequential
colour scheme, its colours are randomly reordered here to provide more
contrast between categories.
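The selection rule above can be sketched in base R. Both functions below are hypothetical illustrations, not guideR code: `scheme_for()` only encodes the thresholds stated in the text, and `bright_pal_sketch()` mimics the `safe_pal(reverse)(n)` calling pattern using the six non-grey colours of Paul Tol's bright scheme (hex values as published by Tol; the actual ordering and reversal logic used by guideR may differ).

```r
# Hypothetical sketch of the selection rule described above; not guideR code.
# Returns the name of the Paul Tol scheme used for `n` categories.
scheme_for <- function(n) {
  if (n <= 6) "bright" else if (n <= 9) "muted" else "discrete rainbow"
}

# Palette factory following the safe_pal(reverse)(n) calling pattern.
# Only the bright scheme is hard-coded; its grey (#BBBBBB) is left out
# because it is reserved for NA values.
bright_pal_sketch <- function(reverse = FALSE) {
  bright <- c("#4477AA", "#EE6677", "#228833", "#CCBB44", "#66CCEE", "#AA3377")
  function(n) {
    if (n > length(bright)) stop("this sketch only covers up to 6 colours")
    cols <- bright[seq_len(n)]
    if (reverse) rev(cols) else cols
  }
}

scheme_for(4)            # "bright"
scheme_for(8)            # "muted"
bright_pal_sketch()(3)   # "#4477AA" "#EE6677" "#228833"
```

Returning a function of n, rather than a fixed vector, is what lets a ggplot2 scale request exactly as many colours as there are categories.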
safe_pal(reverse = FALSE)
scale_fill_safe(
  name = ggplot2::waiver(),
  ...,
  reverse = FALSE,
  aesthetics = "fill",
  na.value = "#BBBBBB"
)
scale_colour_safe(
  name = ggplot2::waiver(),
  ...,
  reverse = FALSE,
  aesthetics = "colour",
  na.value = "#BBBBBB"
)
scale_color_safe(
  name = ggplot2::waiver(),
  ...,
  reverse = FALSE,
  aesthetics = "colour",
  na.value = "#BBBBBB"
)
| Argument | Description |
|---|---|
| reverse | A logical scalar: should the resulting vector of colours be reversed? |
| name | The name of the scale. Used as the axis or legend title. If |
| ... | Other arguments passed on to |
| aesthetics | Character string or vector of character strings listing the name(s) of the aesthetic(s) that this scale works with. This can be useful, for example, to apply colour settings to the colour and fill aesthetics at the same time, via |
| na.value | Colour to be used for |
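safe_pal() returns a palette function; scale_fill_safe(), scale_colour_safe() and scale_color_safe() return ggplot2 scales.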
scales::show_col(safe_pal()(6))
scales::show_col(safe_pal(reverse = TRUE)(6))
scales::show_col(safe_pal()(9))
scales::show_col(safe_pal()(16))
ggplot2::ggplot(titanic) +
  ggplot2::aes(x = Age, fill = Class) +
  ggplot2::geom_bar() +
  scale_fill_safe()
ggplot2::ggplot(iris) +
  ggplot2::aes(x = Petal.Length, y = Petal.Width, colour = Species) +
  ggplot2::geom_point(size = 3) +
  scale_colour_safe()
step(), taking into account missing values
When your data contains missing values, the concerned observations are removed from the model. However, if at a later stage you try to apply a backward stepwise approach to reduce your model by minimizing the AIC, you may encounter an error because the number of rows has changed: dropping a variable that contained missing values brings back observations that were previously excluded.
step_with_na(model, ...)
## Default S3 method:
step_with_na(model, ..., full_data = eval(model$call$data))
## S3 method for class 'svyglm'
step_with_na(model, ..., design)
| Argument | Description |
|---|---|
| model | A model object. |
| ... | Additional parameters passed to |
| full_data | Full data frame used for the model, including missing data. |
| design | Survey design previously passed to |
step_with_na() applies the following strategy:
1. recomputes the model using only complete cases;
2. applies stats::step();
3. recomputes the reduced model using the full original dataset.
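The three steps can be sketched in base R for a model that supports update(), such as lm(). `step_with_na_sketch()` is a hypothetical helper written for illustration, not guideR's implementation; it assumes that `full_data` is the data frame originally used to fit `model`, and the example uses mtcars with artificial missing values.

```r
# Hypothetical base-R sketch of the three-step strategy above (not guideR's
# implementation). Works for models that support update(), such as lm/glm.
step_with_na_sketch <- function(model, full_data, ...) {
  vars <- all.vars(stats::terms(model))
  # 1. refit the model on rows that are complete for the model's variables
  keep <- stats::complete.cases(full_data[, vars, drop = FALSE])
  cc <- full_data[keep, , drop = FALSE]
  model_cc <- stats::update(model, data = cc)
  # 2. stepwise selection on a fixed set of rows, so step() cannot fail
  reduced <- stats::step(model_cc, ...)
  # 3. refit the selected formula on the full original data
  stats::update(reduced, data = full_data)
}

d <- mtcars
d$qsec[c(1, 5, 9)] <- NA   # artificial missing values
mod <- lm(mpg ~ wt + hp + qsec + drat, data = d)
mod2 <- step_with_na_sketch(mod, full_data = d, trace = 0)
stats::formula(mod2)
```

If qsec is dropped during selection, the final refit uses all 32 rows again instead of the 29 complete ones.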
step_with_na() has been tested with stats::lm(), stats::glm(),
nnet::multinom(), survey::svyglm() and survival::coxph().
It may work with other types of models, but without any guarantee.
In some cases, it may be necessary to provide the full dataset initially used to estimate the model.
step_with_na() may not work inside other functions, because the default value of full_data,
eval(model$call$data), looks up the data by name and may not find an object defined locally
in the calling function. In that case, try passing full_data explicitly.
The stepwise-selected model.
set.seed(42)
d <- titanic |>
  dplyr::mutate(
    Group = sample(
      c("a", "b", NA),
      dplyr::n(),
      replace = TRUE
    )
  )
mod <- glm(as.factor(Survived) ~ ., data = d, family = binomial())
# step(mod) should produce an error
mod2 <- step_with_na(mod, full_data = d)
mod2
## WITH SURVEY ---------------------------------------
library(survey)
ds <- d |>
  dplyr::mutate(Survived = as.factor(Survived)) |>
  srvyr::as_survey()
mods <- survey::svyglm(
  Survived ~ Class + Group + Sex,
  design = ds,
  family = quasibinomial()
)
mod2s <- step_with_na(mods, design = ds)
mod2s
This function compares several means using survey::svyglm(). More
precisely, it is a wrapper for survey::regTermTest(m, "group") where
m <- survey::svyglm(x ~ group, design).
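The idea of testing the group term of a regression of x on group can be checked on ordinary (non-survey) data in base R: assuming equal variances, stats::oneway.test() returns the same F statistic as the F test for the group term of lm(x ~ group). The sketch below uses iris; the survey calls are shown only as comments, assume the survey and srvyr packages are installed, and give a design-based analogue rather than an identical statistic.

```r
# Classic one-way ANOVA, two equivalent ways (base R only).
ow  <- stats::oneway.test(Petal.Length ~ Species, data = iris, var.equal = TRUE)
fit <- stats::lm(Petal.Length ~ Species, data = iris)
tab <- stats::anova(fit)

unname(ow$statistic)   # F statistic from oneway.test()
tab[["F value"]][1]    # same F statistic for the Species term

# Survey analogue (not run; assumes survey and srvyr are installed):
# design <- srvyr::as_survey(iris)
# m <- survey::svyglm(Petal.Length ~ Species, design = design)
# survey::regTermTest(m, ~Species)
```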
svyoneway(formula, design, ...)
| Argument | Description |
|---|---|
| formula | a formula of the form |
| design | a survey design object |
| ... | additional parameters passed to |
an object of class "htest"
stats::oneway.test() for classic data frames
svyoneway(
  Petal.Length ~ Species,
  design = srvyr::as_survey(iris)
)
This titanic dataset is equivalent to
datasets::Titanic |> dplyr::as_tibble() |> tidyr::uncount(n).
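The same expansion, one row per passenger from the counts in datasets::Titanic, can be sketched in base R; `titanic_sketch` is a hypothetical object name, and the columns here are plain character vectors.

```r
# Base-R sketch of the expansion: repeat each row of the frequency table
# as many times as its count, then drop the count column.
tt <- as.data.frame(datasets::Titanic, stringsAsFactors = FALSE)
titanic_sketch <- tt[rep(seq_len(nrow(tt)), tt$Freq),
                     c("Class", "Sex", "Age", "Survived")]
rownames(titanic_sketch) <- NULL
dim(titanic_sketch)   # 2201 4
```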
titanic
An object of class tbl_df (inherits from tbl, data.frame) with 2201 rows and 4 columns.
Remove row-wise grouping created with dplyr::rowwise() while preserving
any other grouping declared with dplyr::group_by().
unrowwise(data)
| Argument | Description |
|---|---|
| data | A tibble. |
A tibble.
titanic |> dplyr::rowwise()
titanic |> dplyr::rowwise() |> unrowwise()
titanic |> dplyr::group_by(Sex, Class) |> dplyr::rowwise()
titanic |> dplyr::group_by(Sex, Class) |> dplyr::rowwise() |> unrowwise()
Generates an interactive variable dictionary based on labelled::look_for().
Accepts data frames, tibbles, and also survey objects.
view_dictionary(data = NULL, details = c("basic", "none", "full"))
view_detailed_dictionary(data = NULL)
to_DT(
  x,
  caption = NULL,
  column_labels = list(
    pos = "#",
    variable = "Variable",
    col_type = "Type",
    label = "Variable label",
    values = "Values",
    missing = "Missing values",
    unique_values = "Unique values",
    na_values = "User-defined missings (values)",
    na_range = "User-defined missings (range)"
  )
)
| Argument | Description |
|---|---|
| data | a data frame, a tibble or a survey object (if |
| details | add details about each variable (see |
| x | a tibble returned by |
| caption | an optional caption for the table |
| column_labels | Optional column labels |
view_dictionary() calls labelled::look_for() and applies to_DT() to
the result to produce an HTML version of the variable dictionary. If you are
using RStudio, it is displayed by default in the Viewer pane,
so that the dictionary stays close to your code.
view_detailed_dictionary() is similar to view_dictionary() with the
option details = "full".
These two functions are also available through dedicated addins in RStudio. To use them, select the name of a data frame, then choose View variable dictionary in the Addins menu.
to_DT() is a utility that converts the result of labelled::look_for() into
a DT::datatable().
iris |> view_dictionary()
iris |> labelled::look_for(details = TRUE) |> to_DT()