| Title: | Interface for Multilevel Regression and Poststratification |
|---|---|
| Description: | Dual interfaces, graphical and programmatic, designed for intuitive applications of Multilevel Regression and Poststratification (MRP). Users can apply the method to a variety of datasets, from electronic health records to sample survey data, through an end-to-end Bayesian data analysis workflow. The package provides robust tools for data cleaning, exploratory analysis, flexible model building, and insightful result visualization. For more details, see Si et al. (2020) <https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2020002/article/00003-eng.pdf?st=iF1_Fbrh> and Si (2025) <doi:10.1214/24-STS932>. |
| Authors: | Toan Tran [cre, aut, cph], Jonah Gabry [aut, cph], Yajuan Si [aut, cph] |
| Maintainer: | Toan Tran <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.10.0 |
| Built: | 2026-06-02 09:39:37 UTC |
| Source: | https://github.com/mrp-interface/shinymrp |
MRPModel object with estimation results.Return an example MRPModel object with estimation results.
example_model(is_timevar = TRUE)example_model(is_timevar = TRUE)
is_timevar |
Logical indicating whether the model is fitted to time-varying data. |
A MRPModel object.
Return example poststratification data accepted by
the $load_pstrat() method of an MRPWorkflow object.
example_pstrat_data()example_pstrat_data()
A data.frame object.
Return example data based on the specified characteristics.
example_sample_data( is_timevar = TRUE, is_aggregated = TRUE, special_case = NULL, family = "binomial" )example_sample_data( is_timevar = TRUE, is_aggregated = TRUE, special_case = NULL, family = "binomial" )
is_timevar |
Logical indicating whether the data is time-varying. |
is_aggregated |
Logical indicating whether the data is aggregated. |
special_case |
Optional character string for specific use cases such as COVID data.
Options are |
family |
Character string specifying the distribution family for outcome measures.
Options are |
A data.frame object.
Create a new MRPWorkflow object that implements
the Bayesian data analysis workflow common in applications of
Multilevel Regression and Post-stratification (MRP).
mrp_workflow()mrp_workflow()
A MRPWorkflow object.
An MRPModel object is an R6 object created by the
$create_model() method of an
MRPWorkflow object. Each MRPModel object represents a
multilevel regression model, providing methods for sampling, diagnostics,
and poststratification.
Creates a new instance of the MRPModel class. This method is called by the $create_model()
method of an MRPWorkflow object and does not need to be called directly by users.
model_spec |
List containing model effects specification, including intercept, fixed effects, varying effects, and interactions |
mrp_data |
List containing the MRP data structure with input sample data and new poststratification data |
metadata |
List containing metadata about the analysis, including family, time variables, and special cases |
link_data |
List containing information about data linking, including geography and ACS year |
plot_data |
List containing data prepared for visualization, including dates and GeoJSON objects |
extra |
List containing COVID test sensitivity and specificity |
A new MRPModel object initialized with the provided model specification and relevant data.
MRPModel objects have the following associated
methods, many of which have their own (linked) documentation pages:
| Method | Description |
$model_spec() |
Return model specification. |
$formula() |
Return model formula. |
$metadata() |
Return model metadata. |
$stan_code() |
Return model Stan code. |
| Method | Description |
$fit() |
Fit multilevel regression model using CmdStanR. |
$check_fit_exists() |
Check if model has been fitted. |
$check_estimate_exists() |
Check if poststratification has been performed. |
| Method | Description |
$summary() |
Return posterior summary table. |
$diagnostics() |
Return sampling diagnostics. |
| Method | Description |
$ppc() |
Create input for posterior predictive check. |
$log_lik() |
Create input for leave-one-out cross-validation. |
$poststratify() |
Run poststratification to generate population estimates. |
| Method | Description |
$save() |
Save model object to file. |
library(shinymrp) # Initialize workflow workflow <- mrp_workflow() # Load example data sample_data <- example_sample_data() # Preprocess sample data workflow$preprocess( sample_data, is_timevar = TRUE, is_aggregated = TRUE, special_case = NULL, family = "binomial" ) # Link to ACS data at ZIP code level workflow$link_acs( link_geo = "zip", acs_year = 2021 ) # Create and fit multiple models model <- workflow$create_model( intercept_prior = "normal(0, 4)", fixed = list( sex = "normal(0, 2)" ), varying = list( race = "normal(0, 2)", age = "normal(0, 2)", time = "normal(0, 2)" ) ) # Run MCMC model$fit(n_iter = 500, n_chains = 2, seed = 123) # Estimates summary and diagnostics posterior_summary <- model$summary() # Sampling diagnostics model_diagnostics <- model$diagnostics()library(shinymrp) # Initialize workflow workflow <- mrp_workflow() # Load example data sample_data <- example_sample_data() # Preprocess sample data workflow$preprocess( sample_data, is_timevar = TRUE, is_aggregated = TRUE, special_case = NULL, family = "binomial" ) # Link to ACS data at ZIP code level workflow$link_acs( link_geo = "zip", acs_year = 2021 ) # Create and fit multiple models model <- workflow$create_model( intercept_prior = "normal(0, 4)", fixed = list( sex = "normal(0, 2)" ), varying = list( race = "normal(0, 2)", age = "normal(0, 2)", time = "normal(0, 2)" ) ) # Run MCMC model$fit(n_iter = 500, n_chains = 2, seed = 123) # Estimates summary and diagnostics posterior_summary <- model$summary() # Sampling diagnostics model_diagnostics <- model$diagnostics()
The $check_estimate_exists() method of an MRPModel object
checks whether poststratification has been performed. Check out the
More examples of R6 classes
vignette for usage examples.
check_estimate_exists()check_estimate_exists()
Logical indicating whether poststratification has been performed.
The $check_fit_exists() method of an MRPModel object
checks whether the model has been fitted. Check out the
More examples of R6 classes
vignette for usage examples.
check_fit_exists()check_fit_exists()
Logical indicating whether the model has been fitted.
The $diagnostics() method of an MRPModel object
returns MCMC diagnostics, including convergence statistics and
sampling efficiency measures. Check out this official Stan guide
for more information on interpreting these metrics. For usage examples, refer to the
More examples of R6 classes
vignette.
diagnostics(summarize = TRUE)diagnostics(summarize = TRUE)
summarize |
Logical indicating whether to return a summarized version of the diagnostics (default is TRUE) |
A data.frame object if summarize is TRUE, otherwise a list of raw diagnostics.
The $fit() method of an MRPModel object fits the model using
Stan's main Markov chain Monte Carlo (MCMC) algorithm. Check out the
More examples of R6 classes
vignette for usage examples.
fit(n_iter = 2000, n_chains = 4, seed = NULL, ...)fit(n_iter = 2000, n_chains = 4, seed = NULL, ...)
n_iter |
Number of MCMC iterations per chain (including warmup iterations). Default is 2000. |
n_chains |
Number of MCMC chains to run. Default is 4. |
seed |
Random seed for reproducibility. Default is |
... |
Additional arguments passed to CmdStanR |
No return value, called for side effects.
The $formula() method of an MRPModel object
returns the lme4-style formula constructed from the given model specification.
Check out the
More examples of R6 classes
vignette for usage examples.
formula()formula()
A character string of the model formula.
The $log_lik() method of an MRPModel object
runs Stan's standalone generated quantities
and extracts log-likelihood values for leave-one-out cross-validation. This
method is called by the $compare_models() method of an MRPWorkflow object
and does not need to be called directly by users.
log_lik()log_lik()
A data.frame object containing log-likelihood values.
The $metadata() method of an MRPModel object
returns the metadata associated with the model,
including metadata inherited from a workflow object and model fitting parameters.
Check out the
More examples of R6 classes
vignette for usage examples.
metadata()metadata()
A list containing the model metadata.
The $model_spec() method of an MRPModel object
returns the model specification list. Check out the
More examples of R6 classes
vignette for usage examples.
model_spec()model_spec()
A list containing the model specification including intercept, fixed effects, varying effects, and interactions.
The $poststratify() method of an MRPModel object
runs Stan's standalone generated quantities and extracts posterior samples
for poststratified estimates. This method is called by the $poststratify()
method of a MRPWorkflow object and does not need to be called directly by users.
poststratify(interval = 0.95)poststratify(interval = 0.95)
interval |
Confidence interval (a numeric value between 0 and 1) or
standard deviation ( |
A data.frame object containing the poststratified estimates and their corresponding uncertainty intervals.
The $ppc() method of an MRPModel object
runs Stan's standalone generated quantities
to draw from the posterior predictive distribution. This method is called
by the $pp_check() method of a MRPWorkflow object and does
not need to be called directly by users.
ppc()ppc()
A data.frame object containing samples from the posterior predictive distribution.
The $save() method of an MRPModel object
saves a fitted MRPModel object to a file for later use.
qs2::qs_save() is used internally, and it is customary
to use the .qs2 file extension. Check out the
More examples of R6 classes
vignette for usage examples.
save(file)save(file)
file |
File path where the model should be saved. |
No return value, called for side effects.
The $stan_code() method of an MRPModel object
returns the model Stan code. Check out the
More examples of R6 classes
vignette for usage examples.
stan_code()stan_code()
A character string containing the model Stan code.
The $summary() method of an MRPModel object
returns tables containing the summary of posterior samples
for the model parameters and diagnostics. Check out the
More examples of R6 classes
vignette for usage examples.
summary()summary()
A list of data.frame objects containing posterior sample summary and diagnostics for model parameters:
fixed effects (fixed)
standard deviations of varying effects (varying)
standard deviations of residuals (residual)
BYM2 proportion of spatial structure (spatial_proportion)
A MRPWorkflow object is an R6 object created by
the mrp_workflow() function. This class provides methods for all steps
in the workflow, from data preparation and visualization to model fitting.
MRPWorkflow objects have the following associated
methods with their own (linked) documentation pages:
| Method | Description |
$preprocess() |
Preprocess sample data. |
$preprocessed_data() |
Return preprocessed sample data. |
$link_acs() |
Link sample data to ACS data. |
$load_pstrat() |
Load custom poststratification data. |
| Method | Description |
$create_model() |
Create a MRPModel object. |
$pp_check() |
Perform posterior predictive check. |
$compare_models() |
Compare models using LOO-CV. |
| Method | Description |
$demo_bars() |
Create demographic comparison bar plots. |
$covar_hist() |
Create geographic covariate distribution histograms. |
$sample_size_map() |
Create sample size map. |
$outcome_plot() |
Create summary plots of raw outcome measure. |
$outcome_map() |
Visualize raw outcome measure by geography. |
$estimate_plot() |
Visualize estimates for demographic groups. |
$estimate_map() |
Visualize estimates for geographic areas. |
library(shinymrp) # Initialize the MRP workflow workflow <- mrp_workflow() # Load example data sample_data <- example_sample_data() ### DATA PREPARATION # Preprocess sample data workflow$preprocess( sample_data, is_timevar = TRUE, is_aggregated = TRUE, special_case = NULL, family = "binomial" ) # Link data to the ACS # and obtain poststratification data workflow$link_acs( link_geo = "zip", acs_year = 2021 ) ### DESCRIPTIVE STATISTICS # Visualize demographic distribution of data sex_bars <- workflow$demo_bars(demo = "sex") # Visualize geographic distribution of data ss_map <- workflow$sample_size_map() # Visualize outcome measure raw_outcome_plot <- workflow$outcome_plot() ### MODEL BUILDING # Create new model objects model <- workflow$create_model( intercept_prior = "normal(0, 4)", fixed = list( sex = "normal(0, 2)", race = "normal(0, 2)" ), varying = list( age = "", time = "" ) ) # Run MCMC model$fit(n_iter = 500, n_chains = 2, seed = 123) # Estimates summary and diagnostics model$summary() # Sampling diagnostics model$diagnostics() # Posterior predictive check workflow$pp_check(model) ### VISUALIZE RESULTS # Plots of overall estimates, estimates for demographic groups, and geographic areas workflow$estimate_plot(model, group = "sex") # Choropleth map of estimates for geographic areas workflow$estimate_map(model, geo = "county")library(shinymrp) # Initialize the MRP workflow workflow <- mrp_workflow() # Load example data sample_data <- example_sample_data() ### DATA PREPARATION # Preprocess sample data workflow$preprocess( sample_data, is_timevar = TRUE, is_aggregated = TRUE, special_case = NULL, family = "binomial" ) # Link data to the ACS # and obtain poststratification data workflow$link_acs( link_geo = "zip", acs_year = 2021 ) ### DESCRIPTIVE STATISTICS # Visualize demographic distribution of data sex_bars <- workflow$demo_bars(demo = "sex") # Visualize geographic distribution of data ss_map <- workflow$sample_size_map() # Visualize outcome measure raw_outcome_plot <- workflow$outcome_plot() ### MODEL BUILDING # Create new model objects model <- workflow$create_model( intercept_prior = "normal(0, 4)", fixed = list( sex = "normal(0, 2)", race = "normal(0, 2)" ), varying = list( age = "", time = "" ) ) # Run MCMC model$fit(n_iter = 500, n_chains = 2, seed = 123) # Estimates summary and diagnostics model$summary() # Sampling diagnostics model$diagnostics() # Posterior predictive check workflow$pp_check(model) ### VISUALIZE RESULTS # Plots of overall estimates, estimates for demographic groups, and geographic areas workflow$estimate_plot(model, group = "sex") # Choropleth map of estimates for geographic areas workflow$estimate_map(model, geo = "county")
The $compare_models() method compares multiple fitted MRPModel objects
using leave-one-out cross-validation to assess relative model performance. Check out the
More examples of R6 classes
vignette for usage examples.
compare_models(..., suppress = NULL)compare_models(..., suppress = NULL)
... |
Multiple MRPModel objects to compare. |
suppress |
Character string specifying output to suppress during comparison. |
A data.frame object containing the comparison results.
The covar_hist() method creates histogram plots
showing the distribution of geographic covariates across ZIP codes. Refer to the
More on data preparation for their definitions.
This method is only available for COVID data.
Check out the More examples of R6 classes
vignette for usage examples.
covar_hist(covar, file = NULL, ...)covar_hist(covar, file = NULL, ...)
covar |
Character string specifying the geographic covariate. Options are
|
file |
Optional file path to save the plot. |
... |
Additional arguments passed to |
A ggplot object showing the covariate distribution histogram.
The $create_model() method creates a new MRPModel object
with Stan code generated from the model specification list.
CmdStanR objects are used internally to interface with Stan to
compile the code and run its MCMC algorithm. Check out the
More examples of R6 classes
vignette for usage examples.
create_model( intercept_prior = NULL, fixed = NULL, varying = NULL, interaction = NULL, sens = 1, spec = 1 )create_model( intercept_prior = NULL, fixed = NULL, varying = NULL, interaction = NULL, sens = 1, spec = 1 )
intercept_prior |
Character string specifying the prior distribution for the overall intercept. Check Details for more information about prior specification. |
fixed |
List of the fixed effects in the model and their prior distributions. Check Details for more information about prior specification. |
varying |
List of the varying effects in the model and the prior distributions of their standard deviations. Check Details for more information about prior specification. |
interaction |
List of the interactions in the model and their prior distributions. Interaction names are created by concatenating the names of the interacting variables with a colon (e.g., "sex:age"). Currently, only two-way interactions are supported. Check Details for more information about prior specification. |
sens |
Sensitivity adjustment in the COVID-19 test results. Check Details for more information. |
spec |
Specificity adjustment in the COVID-19 test results. Check Details for more information. |
The syntax for the prior distributions is similar to that of Stan. The following are currently supported:
normal(mu, sigma)
student_t(nu, mu, sigma)
structured
icar
bym2
The structured prior distribution developed by Si et al. (2020), which can be assigned to three types of two-way interactions:
Two categorical variables (both with more than two levels)
One categorical variable (with more than two levels) and one binary variable
One categorical variable (with more than two levels) and one continuous variable
The spatial prior options (ICAR and BYM2) are useful when data contain geographic units (e.g., ZIP codes, counties, states) with spatial structure, specifically when observations exhibit correlation among neighboring regions. For details about the implementation and usage, see Spatial prior specification in shinymrp.
The following default prior distributions are assigned to effects with empty strings ("")
in the model specification list:
Overall intercept: normal(0,5)
Coefficient: normal(0,3)
The model assumes varying effects follow a normal distribution with an unknown standard deviation, which will be assigned with priors.
Standard deviation (main effect): normal+(0,3)
Standard deviation (interaction): normal+(0,1)
For COVID data, we allow users to specify the PCR testing sensitivity and specificity. Let be the probability
that person in group tests positive. The analytic incidence is a function of the test sensitivity ,
specificity , and the true incidence for individuals in group :
A new MRPModel object.
Creates bar plots for comparing demographic distributions between sample data and poststratification data. Check out the More examples of R6 classes vignette for usage examples.
demo_bars(demo, file = NULL, ...)demo_bars(demo, file = NULL, ...)
demo |
Character string specifying the demographic variable to plot. |
file |
Optional file path to save the plot. |
... |
Additional arguments passed to |
A ggplot object showing demographic comparisons
The $estimate_map() method creates interactive choropleth maps that show MRP estimates by geographic region.
This method cannot be used if either the sample or the poststratification data contains no geographic information.
Check out the More examples of R6 classes
vignette for usage examples.
estimate_map( model, geo = NULL, time_index = NULL, interval = 0.95, file = NULL )estimate_map( model, geo = NULL, time_index = NULL, interval = 0.95, file = NULL )
model |
Fitted MRPModel object |
geo |
Character string specifying the geographic level for mapping.
Options include geography for data linking and those at larger scales.
A "linking" geography is required to use this method. It is either specified
as |
time_index |
Integer specifying the time index for time-varying data. |
interval |
Confidence interval (a numeric value between 0 and 1) or
standard deviation ( |
file |
Optional file path with .html extension to save the interactive map. Expand the hamburger menu in the top right corner of the map to access other export options. |
A highcharter map object showing MRP estimates by geography.
The $estimate_plot() method creates plots showing overall MRP estimates or
estimates for different demographic groups. Check out the
More examples of R6 classes
vignette for usage examples.
estimate_plot( model, group = NULL, interval = 0.95, show_caption = TRUE, file = NULL, ... )estimate_plot( model, group = NULL, interval = 0.95, show_caption = TRUE, file = NULL, ... )
model |
Fitted MRPModel object |
group |
Character string specifying the demographic group.
If left as |
interval |
Confidence interval (a numeric value between 0 and 1) or
standard deviation ( |
show_caption |
Logical indicating whether to show the caption in the plot (default is TRUE). |
file |
Optional file path to save the plot. |
... |
Additional arguments passed to |
A ggplot object showing MRP estimates.
The $link_acs() method obtains poststratification data by
linking the preprocessed sample data to the American Community Survey
based on given geographic granularity and year. See the
More on data preparation
vignette for more information on data linking. For usage examples, refer to the
More examples of R6 classes
vignette.
link_acs(link_geo = NULL, acs_year = 2023)link_acs(link_geo = NULL, acs_year = 2023)
link_geo |
Character string specifying the geographic level for linking. Options are |
acs_year |
Numeric value specifying the last year of the data collection period for the target ACS dataset. |
No return value, called for side effects.
The $load_pstrat() method processes and stores input poststratification data.
The object is subject to the same data preprocessing steps as the sample data. See the
More on data preparation
vignette for more information on data processing. For usage examples, refer to the
More examples of R6 classes
vignette.
load_pstrat(pstrat_data, is_aggregated = TRUE)load_pstrat(pstrat_data, is_aggregated = TRUE)
pstrat_data |
An object of class |
is_aggregated |
Logical indicating whether the poststratification data is already aggregated. |
No return value, called for side effects.
The $outcome_map() method creates maps showing the average outcome values
by geography for cross-sectional data, or the highest/lowest temporal average for time-varying data.
The sample and poststratification data must contain geographic information for this method to work.
Check out the More examples of R6 classes
vignette for usage examples.
outcome_map(summary_type = NULL, file = NULL)outcome_map(summary_type = NULL, file = NULL)
summary_type |
Character string, for time-varying data, indicating whether to display the
highest ( |
file |
Optional file path with .html extension to save the interactive map. Expand the hamburger menu in the top right corner of the map to access other export options. |
A highcharter map object showing average outcome measure by geography.
The $outcome_plot() method creates plots of the average outcome values.
Check out the More examples of R6 classes
vignette for usage examples.
outcome_plot(file = NULL, ...)outcome_plot(file = NULL, ...)
file |
Optional file path to save the plot. |
... |
Additional arguments passed to |
A ggplot object showing the outcome measure distribution.
The $pp_check() method creates posterior predictive check plots
to assess model fit by comparing observed data to replicated data from
the posterior predictive distribution. Check out the
More examples of R6 classes
vignette for usage examples.
pp_check(model, file = NULL, ...)pp_check(model, file = NULL, ...)
model |
Fitted MRPModel object. |
file |
Optional file path to save the plot. |
... |
Additional arguments passed to |
A ggplot object showing the posterior predictive check result.
The $preprocess() method runs the preprocessing pipeline
that includes data standardization, filtering, imputation, and aggregation.
See the More on data preparation
vignette for more information about data processing. For usage examples, refer to the
More examples of R6 classes
vignette.
preprocess( data, is_timevar = FALSE, is_aggregated = FALSE, special_case = NULL, family = NULL, time_freq = NULL, freq_threshold = 0 )preprocess( data, is_timevar = FALSE, is_aggregated = FALSE, special_case = NULL, family = NULL, time_freq = NULL, freq_threshold = 0 )
data |
An object of class |
is_timevar |
Logical indicating whether the data contains time-varying components. |
is_aggregated |
Logical indicating whether the data is already aggregated. |
special_case |
Character string specifying special case handling. Options are |
family |
Character string specifying the distribution family for the outcome variable. Options are |
time_freq |
Character string specifying the time indexing frequency or time length for grouping dates (YYYY-MM-DD) in the data.
Options are |
freq_threshold |
Numeric value specifying the minimum frequency threshold for including observations. Values with lower frequency will cause the entire row to be removed. The default value is 0 (no filtering). |
No return value, called for side effects.
The $preprocessed_data() method returns the preprocessed sample data.
Check out the More examples of R6 classes
vignette for usage examples.
preprocessed_data()preprocessed_data()
A data.frame object containing the preprocessed sample data.
The $sample_size_map() method creates interactive choropleth maps
showing data distribution with respect to geography. This method cannot be used
if either the sample or the poststratification data contains no geographic information.
Check out the More examples of R6 classes
vignette for usage examples.
sample_size_map(file = NULL)sample_size_map(file = NULL)
file |
Optional file path with .html extension to save the interactive map. Expand the hamburger menu in the top right corner of the map to access other export options. |
A highcharter map object showing sample size distribution.
Run the Shiny Application
run_app( onStart = NULL, options = list(), enableBookmarking = NULL, uiPattern = "/", launch.browser = TRUE, ... )run_app( onStart = NULL, options = list(), enableBookmarking = NULL, uiPattern = "/", launch.browser = TRUE, ... )
onStart |
A function that will be called before the app is actually run.
This is only needed for |
options |
Named options that should be passed to the |
enableBookmarking |
Can be one of |
uiPattern |
A regular expression that will be applied to each |
launch.browser |
Logical; if TRUE (default) open in an external browser even when running inside RStudio. If FALSE, use RStudio Viewer (when available). |
... |
arguments passed into golem options via |