Overview

CGMissingDataR includes an optional Shiny app for interactive imputation of missing glucose values. The app is a point-and-click interface around the main package function, run_missing_glucose_imputation().

The app is useful when users want to:

  • upload a CSV file without writing R code;
  • choose the target glucose, subject ID, timestamp, and feature columns from a user interface;
  • load built-in example data sets for demonstration;
  • inspect the observed missingness before running imputation;
  • run the imputation workflow;
  • preview only the rows where glucose was originally missing and then imputed;
  • download the completed data as a CSV file.

The Shiny app does not implement a separate imputation algorithm. It calls run_missing_glucose_imputation() internally and returns the same type of completed data frame as the command-line workflow.

Installation

Install CGMissingDataR from CRAN with:

install.packages("CGMissingDataR")

The app requires the optional R package shiny. If shiny is not already installed, install it with install.packages("shiny").

Then load the package with library(CGMissingDataR).

Launching the app

Launch the app with:

run_cgmissingdata_app()

During package development, after running devtools::load_all(), the same launcher can be used:

devtools::load_all()
run_cgmissingdata_app()

The app is bundled inside the installed package, typically under:

system.file(
  "shiny",
  "cgm_imputation_app",
  package = "CGMissingDataR"
)

Users normally do not need to access this directory directly. The run_cgmissingdata_app() launcher finds it automatically.
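If the launcher cannot find the app, the lookup above can be checked from the console. The following is a minimal sketch using only base R: system.file() returns an empty string when the package or the directory is not found, so nzchar() reports whether the bundled app is present. The names app_dir and app_found are illustrative and are not part of the package.

```r
# Look up the bundled app directory; "" means it was not found
app_dir <- system.file(
  "shiny",
  "cgm_imputation_app",
  package = "CGMissingDataR"
)

# TRUE only when the package is installed and ships the app directory
app_found <- nzchar(app_dir)

if (app_found) {
  message("App directory: ", app_dir)
} else {
  message("App directory not found; check that CGMissingDataR is installed.")
}
```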

Input options

The app provides two ways to load data.

Upload a CSV file

Use the Browse button to upload a CSV file containing CGM data. The file should contain, at minimum, columns corresponding to:

| Role | Example column | App selector |
|---|---|---|
| Subject identifier | USUBJID | Subject ID column |
| Glucose value | LBORRES | Target glucose column |
| Timestamp | Time | Timestamp column |
| Additional predictors | AGE, hba1c | Feature columns |

After the file is uploaded, the app displays a preview of the uploaded data and populates the column-selection controls.
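Before uploading, it can help to confirm that a CSV file has the expected columns. The sketch below uses only base R and a toy file written to a temporary path. The column names are the example names from the table above; your own file may use different names, and the variables required_cols, toy, and missing_cols are illustrative.

```r
# Example column names from the table above; adjust to match your file
required_cols <- c("USUBJID", "LBORRES", "Time", "AGE", "hba1c")

# Toy CSV standing in for a real CGM export (hba1c is left out on purpose)
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  USUBJID = c("S1", "S1", "S2"),
  LBORRES = c(110, NA, 95),
  Time = c("2020-01-16 00:00:00", "2020-01-16 00:05:00", "2020-01-16 00:00:00"),
  AGE = c(54, 54, 61),
  stringsAsFactors = FALSE
)
write.csv(toy, csv_path, row.names = FALSE)

# Read one row to get the header, then compare with the expected columns
header <- names(read.csv(csv_path, nrows = 1, check.names = FALSE))
missing_cols <- setdiff(required_cols, header)

if (length(missing_cols) > 0) {
  message("Columns not found: ", paste(missing_cols, collapse = ", "))
}
```

With the toy file, the check reports that hba1c is absent, which is the kind of problem worth catching before the column selectors are filled in.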

Load built-in example data

The app can also load built-in example data sets for demonstration. These are useful for quickly showing how the workflow behaves without requiring users to upload their own data.

The example data sets are intended to include:

| Example data | Description |
|---|---|
| CGMExmplDat5Pct | Example CGM data with about 5% missing glucose values. |
| CGMExmplDat10Pct | Example CGM data with about 10% missing glucose values. |

After selecting an example data set and clicking Load example data, the app uses that data set exactly as if it had been uploaded by the user.

Selecting columns

Once data are loaded, select the columns that map to the imputation function.

Target glucose column

Choose the glucose column with missing values to impute. In the included example data, this is usually:

LBORRES

The original target column is preserved in the returned data. Values that were originally missing remain NA in this original column. Completed glucose values are written to a new column named:

imputed_glucose_value

Subject ID column

Choose the column identifying each subject or participant. In the example data, this is usually:

USUBJID

The subject ID is used for sorting, lag feature creation, rolling-mean feature creation, and subject-level time handling.
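To illustrate why the subject ID matters, the sketch below builds a one-step lag and a three-point rolling mean within each subject using base R. It is not the package's internal code: the window length, the helper names lag1 and roll_mean3, and the output column names are assumptions chosen for illustration.

```r
# Toy data: two subjects, rows deliberately out of time order
df <- data.frame(
  USUBJID = c("A", "A", "A", "B", "B", "B"),
  TimeSeries = c(2, 1, 3, 1, 2, 3),
  LBORRES = c(110, 100, 120, 90, 95, 105),
  stringsAsFactors = FALSE
)

# Sort by subject, then by time, so lags refer to the previous reading
df <- df[order(df$USUBJID, df$TimeSeries), ]

# Previous value within a subject (first reading has no lag)
lag1 <- function(x) c(NA, head(x, -1))

# Trailing mean of the current and two previous readings
roll_mean3 <- function(x) {
  vapply(
    seq_along(x),
    function(i) if (i < 3) NA_real_ else mean(x[(i - 2):i]),
    numeric(1)
  )
}

# ave() applies each helper separately within every subject
df$lag_1 <- ave(df$LBORRES, df$USUBJID, FUN = lag1)
df$roll_mean_3 <- ave(df$LBORRES, df$USUBJID, FUN = roll_mean3)
```

Grouping by subject keeps one participant's last reading from leaking into the next participant's first lag value.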

Timestamp column

Choose the raw timestamp column. In the example data, this is usually:

Time

The imputation function creates or reuses a numeric TimeSeries column from the timestamp values. Common timestamp formats are supported, including colon-separated, hyphen-separated, slash-separated, ISO-style, and POSIXct values.
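As a rough illustration of how mixed timestamp formats can be reduced to one numeric feature, the sketch below tries several formats in turn with base R's strptime(). The format list, the helper parse_timestamps, the UTC time zone, and the choice of seconds since 1970-01-01 as the numeric scale are all assumptions; the package may derive TimeSeries differently.

```r
# Candidate formats, tried in order (illustrative, not the package's list)
formats <- c(
  "%Y:%m:%d:%H:%M",
  "%Y-%m-%d %H:%M:%S",
  "%Y/%m/%d %H:%M:%S",
  "%Y-%m-%dT%H:%M:%S"
)

# Parse each value with the first format that succeeds
parse_timestamps <- function(x, formats, tz = "UTC") {
  parsed <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(parsed)
    if (!any(todo)) break
    parsed[todo] <- as.POSIXct(strptime(x[todo], format = fmt, tz = tz))
  }
  parsed
}

raw <- c(
  "2020:01:16:00:00",
  "2020-01-16 00:00:00",
  "2020/01/16 00:00:00",
  "2020-01-16T00:00:00"
)
parsed <- parse_timestamps(raw, formats)

# Numeric time feature: seconds since the Unix epoch (an assumed scale)
time_series <- as.numeric(parsed)
```

All four strings describe the same instant, so they map to a single numeric value. Values that match none of the formats stay NA, which makes unparseable rows easy to find with is.na(parsed).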

Feature columns

Choose additional predictor columns. In the example data, these commonly include:

AGE
hba1c

Feature columns should be numeric or coercible to numeric. If a SEX column is present, the underlying function can encode it internally.
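A quick way to see which candidate feature columns are numeric or coercible to numeric is sketched below in base R. The helper is_numeric_like and the SITE column are hypothetical and are used only to show a column that fails the check.

```r
# TRUE when every non-missing value converts cleanly to a number
is_numeric_like <- function(x) {
  if (is.numeric(x)) return(TRUE)
  converted <- suppressWarnings(as.numeric(as.character(x)))
  # Conversion must not create NAs beyond those already present
  !any(is.na(converted) & !is.na(x))
}

features <- data.frame(
  AGE = c("54", "61", NA),           # numbers stored as text
  hba1c = c(6.8, 7.2, 7.9),          # already numeric
  SITE = c("north", "south", "north"), # hypothetical non-numeric column
  stringsAsFactors = FALSE
)

numeric_like <- vapply(features, is_numeric_like, logical(1))
```

Here AGE and hba1c pass and SITE does not, so SITE would need recoding before it could be used as a feature column.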

Missingness summary card

The app includes a missingness summary card beside the uploaded data preview. After a target glucose column is selected, this card shows:

  • the percentage of missing values in the target column;
  • the number of missing rows;
  • the total number of rows;
  • a warning style when missingness is greater than the chosen threshold, such as 20%.

This card is intended as a quick data-quality check before running the imputation workflow. Higher missingness does not necessarily mean imputation cannot be run, but users should interpret results carefully when a large portion of the target glucose column is missing.
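The quantities shown on the card can be reproduced from the console with a few lines of base R. This is a sketch rather than the app's own code; the helper summarise_missingness and its warn_above argument are illustrative names, and the 20% default mirrors the example threshold mentioned above.

```r
# Summarise missingness in a target column; warn_above is in percent
summarise_missingness <- function(x, warn_above = 20) {
  n_total <- length(x)
  n_missing <- sum(is.na(x))
  pct_missing <- if (n_total > 0) 100 * n_missing / n_total else NA_real_
  list(
    pct_missing = pct_missing,
    n_missing = n_missing,
    n_total = n_total,
    # Warn only when missingness is strictly greater than the threshold
    warn = isTRUE(pct_missing > warn_above)
  )
}

glucose <- c(110, NA, 95, 102, NA, 99, 120, 101, 97, 105)
card <- summarise_missingness(glucose)
```

In this toy vector, 2 of 10 values are missing. That is exactly 20%, so the warning flag stays off because the comparison is strict.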

Backend selection

The app supports the same backends as run_missing_glucose_imputation().

| Backend | Description | Recommended use |
|---|---|---|
| mice | R-native backend using the R package mice. | Default, CRAN-safe workflow. |
| sklearn | Optional Python-compatible backend through reticulate. | Closest agreement with the Python reference workflow. |

MICE backend

The default backend is:

imputer_backend = "mice"

This backend does not require Python and is the safest choice for most users. It is also the backend used in CRAN-safe examples and tests.

Optional sklearn backend

The optional Python-compatible backend is:

imputer_backend = "sklearn"

This path sends the data frame to Python through reticulate. Python then uses:

  • pandas for data-frame operations;
  • scikit-learn for IterativeImputer;
  • statsmodels for ARIMA;
  • Python xgboost for XGBoost regression.

To use the Python backend, install reticulate and declare the Python requirements before launching or running the app:

install.packages("reticulate")

reticulate::py_require(c(
  "numpy",
  "pandas",
  "scikit-learn",
  "statsmodels",
  "xgboost"
))

The Python backend is optional. It is not required for installing or loading the package.

Running imputation

After loading data and selecting columns, click Run imputation.

Internally, the app calls code equivalent to:

out <- run_missing_glucose_imputation(
  data = uploaded_data,
  target_col = selected_target_col,
  feature_cols = selected_feature_cols,
  id_col = selected_id_col,
  time_col = selected_time_col,
  imputer_backend = selected_backend,
  use_arima_if_missing_leq = selected_threshold,
  xgb_nrounds = selected_xgb_rounds,
  seed = selected_seed,
  export = FALSE
)

The returned object is a data frame. The most important columns are:

| Column | Meaning |
|---|---|
| Original target column | Original glucose values; originally missing values remain NA. |
| TimeSeries | Numeric time feature derived from the timestamp column. |
| imputed_glucose_value | Completed glucose values after imputation. |
| imputation_method | Final method used, such as MICE+ARIMA or MICE+XGBoost. |
| missing_rate | Original missingness rate of the target glucose column. |
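The argument name use_arima_if_missing_leq suggests that the final method depends on how the missingness rate compares with a threshold. The sketch below shows one plausible reading of that rule. It is inferred from the argument name and the method labels only, so treat it as an assumption. The helper choose_final_method is hypothetical, and the example assumes both values are expressed as fractions on the same scale.

```r
# Hypothetical helper: the rule is inferred from the argument name
# use_arima_if_missing_leq and may not match the package internals
choose_final_method <- function(missing_rate, use_arima_if_missing_leq) {
  if (missing_rate <= use_arima_if_missing_leq) {
    "MICE+ARIMA"
  } else {
    "MICE+XGBoost"
  }
}

# Both arguments are assumed to be fractions (0.05 means 5%)
low_missing <- choose_final_method(0.05, 0.10)
high_missing <- choose_final_method(0.25, 0.10)
```

Under this reading, the imputation_method column in the returned data frame is the place to confirm which path was actually taken for a given run.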

Previewing results

After imputation, the app displays a preview of rows where the original target glucose value was missing. This is more informative than showing only the first few rows of the completed data set because it lets users directly inspect the newly imputed values.

For example, the preview is based on logic like:

imputed_rows <- out[is.na(out[[target_col]]), , drop = FALSE]
head(imputed_rows, 15)

The full completed data frame remains available for download.

Downloading results

Use the Download imputed CSV button to save the completed data set. The CSV contains all returned columns from run_missing_glucose_imputation(), including:

  • the original glucose column;
  • TimeSeries;
  • imputed_glucose_value;
  • imputation_method;
  • missing_rate;
  • any original input columns retained by the workflow.

Internal lag and rolling-mean columns are used during imputation but are removed from the returned data frame before display or download.
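After downloading, a short check in base R can confirm that the completed column has no gaps and show how many rows were imputed. The sketch below uses a toy file written to a temporary path in place of a real download. The values are made up, and the toy data assume that observed readings are carried over unchanged into imputed_glucose_value.

```r
# Toy stand-in for a downloaded file (values are invented for illustration)
completed <- data.frame(
  USUBJID = c("S1", "S1", "S1"),
  LBORRES = c(110, NA, 95),
  imputed_glucose_value = c(110, 103.2, 95),
  imputation_method = "MICE+ARIMA",
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(completed, csv_path, row.names = FALSE)

# Read the file back as a user would after downloading it
downloaded <- read.csv(csv_path, stringsAsFactors = FALSE)

# Rows where the original glucose value was missing
was_missing <- is.na(downloaded$LBORRES)
n_imputed <- sum(was_missing)

# The completed column should contain no missing values
all_completed <- !anyNA(downloaded$imputed_glucose_value)

# Imputed rows only, mirroring the app's preview
imputed_rows <- downloaded[was_missing, , drop = FALSE]
```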

Troubleshooting

The app does not launch

If you see an error saying that Shiny is not installed, run install.packages("shiny").

Then restart R and try:

run_cgmissingdata_app()

No column choices appear

Column choices appear only after data are loaded. Upload a CSV file or load one of the built-in example data sets.

Imputation fails because a timestamp cannot be parsed

Check the timestamp column selected in the app. The values should be parseable dates or datetimes, for example:

"2020:01:16:00:00"
"2020-01-16 00:00:00"
"2020/01/16 00:00:00"
"2020-01-16T00:00:00"

If the wrong column was selected as the timestamp column, select the correct column and rerun imputation.

Python backend fails because a Python module is missing

If imputer_backend = "sklearn" fails because Python packages are missing, run:

reticulate::py_require(c(
  "numpy",
  "pandas",
  "scikit-learn",
  "statsmodels",
  "xgboost"
))

Then restart R and launch the app again.

Downloaded data contain NA in the original glucose column

This is expected. The original target column is intentionally preserved. The completed values are stored in:

imputed_glucose_value

Developer notes

The recommended package structure for the app is:

inst/
└── shiny/
    └── cgm_imputation_app/
        └── app.R

The launcher should live in an exported R function, for example:

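run_cgmissingdata_app <- function() {
  # shiny is optional, so check for it at run time
  if (!requireNamespace("shiny", quietly = TRUE)) {
    stop("Package 'shiny' is required to run the app.", call. = FALSE)
  }
  # Locate the app directory bundled under inst/shiny/
  app_dir <- system.file(
    "shiny",
    "cgm_imputation_app",
    package = "CGMissingDataR"
  )
  # system.file() returns "" when the directory is not found
  if (!nzchar(app_dir)) {
    stop("App directory not found; try reinstalling CGMissingDataR.", call. = FALSE)
  }
  shiny::runApp(app_dir, display.mode = "normal")
}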

Because the app is optional, shiny should usually be listed in Suggests, not Imports, unless the package requires Shiny for normal operation.