Welcome to CGMissingDataR

Install the released version from CRAN:

install.packages("CGMissingDataR")

Or install the development version from GitHub:

install.packages("devtools")
devtools::install_github("ZhangLabUKY/CGMissingDataR")

CGMissingDataR imputes missing glucose values in continuous glucose monitoring (CGM) data. The main public workflow runs through a single function:

run_missing_glucose_imputation()

The function handles both explicit missing glucose values coded as NA and implicit missing readings caused by timestamp gaps. It accepts a data frame with a subject identifier, timestamp column, glucose column, and optional subject-level or visit-level covariates. It returns the user’s original columns plus imputed_glucose_value, leaving the original glucose column unchanged.
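The sketch below builds a toy input with the expected shape. It reuses the column names from the basic-use example (USUBJID, Time, LBORRES, AGE, hba1c). The values are made up for illustration, and storing Time as POSIXct is an assumption; see the vignette for the timestamp formats the parser accepts.

```r
# Illustrative toy input: one subject, nominal 5-minute readings, made-up values.
toy <- data.frame(
  USUBJID = "S01",                                   # subject identifier
  Time = as.POSIXct(
    c("2024-01-01 00:00:00", "2024-01-01 00:05:00",
      "2024-01-01 00:10:00", "2024-01-01 00:30:00"), # 00:15 to 00:25 absent: implicit gap
    tz = "UTC"
  ),
  LBORRES = c(110, NA, 118, 125),                    # explicit NA at 00:05
  AGE = 45,                                          # subject-level covariate
  hba1c = 6.8                                        # subject-level covariate
)

# Both kinds of missingness are present:
sum(is.na(toy$LBORRES))               # explicit NA rows; returns 1
diff(as.numeric(toy$Time)) / 60       # minutes between readings; returns 5 5 20

# With CGMissingDataR installed, the call would mirror the basic-use example:
# out <- run_missing_glucose_imputation(
#   toy, target_col = "LBORRES", feature_cols = c("AGE", "hba1c"),
#   id_col = "USUBJID", time_col = "Time"
# )
```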

What the imputation workflow does

run_missing_glucose_imputation() performs the following steps:

reads a data frame or CSV file;
parses and sorts timestamps within each subject;
regularizes each subject to an equal interval_minutes timestamp grid;
converts missing timestamp gaps into explicit rows with target_col = NA;
encodes the SEX column, when present, for modeling;
creates internal time, lag, and rolling-mean glucose features;
imputes missing values in the target and feature matrix;
chooses the final model from the post-regularization target missing rate:
- MICE+ARIMA when missing rate is <= 0.05,
- MICE+XGBoost when missing rate is > 0.05;
returns a single completed data frame containing the original input columns plus imputed_glucose_value.
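The model-selection rule can be sketched as follows. This is an illustrative restatement of the 0.05 threshold described above, not the package's internal code. The helper name choose_final_model() is hypothetical, and it assumes the missing rate is the fraction of NA target values over all rows of the regularized data, pooled across subjects, which the text does not specify.

```r
# Hypothetical helper: picks the final model from the share of missing
# target values after timestamp regularization (inserted gap rows included).
choose_final_model <- function(target, threshold = 0.05) {
  missing_rate <- mean(is.na(target))   # fraction of NA glucose values
  if (missing_rate <= threshold) "MICE+ARIMA" else "MICE+XGBoost"
}

# 1 missing value out of 40 rows: 2.5% missing
choose_final_model(c(rep(120, 39), NA))        # returns "MICE+ARIMA"
# 4 missing values out of 40 rows: 10% missing
choose_final_model(c(rep(120, 36), rep(NA, 4))) # returns "MICE+XGBoost"
```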

Internal columns such as TimeSeries, TimeDifferenceMinutes, lag features, rolling means, imputation method labels, and missingness-tracking flags are used for modeling but are not returned.
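To make the internal features more concrete, here is a rough base-R sketch of per-subject time, lag, and rolling-mean columns. The column names (time_index, minutes_since_prev, glucose_lag1, glucose_roll3), the single lag, and the right-aligned 3-reading window are hypothetical choices for illustration; the package's actual feature definitions may differ.

```r
# Illustrative data: two subjects on a 5-minute grid, made-up glucose values.
d <- data.frame(
  id = c("A", "A", "A", "A", "A", "B", "B", "B"),
  minutes = c(0, 5, 10, 15, 20, 0, 5, 10),
  glucose = c(100, 104, 108, NA, 116, 150, 146, 141)
)
d <- d[order(d$id, d$minutes), ]

# Hypothetical features, each computed within a subject via ave():
d$time_index <- ave(d$minutes, d$id, FUN = seq_along)          # 1, 2, 3, ...
d$minutes_since_prev <- ave(d$minutes, d$id,
                            FUN = function(m) c(NA, diff(m)))  # gap to previous row
d$glucose_lag1 <- ave(d$glucose, d$id,
                      FUN = function(g) c(NA, head(g, -1)))    # previous reading
d$glucose_roll3 <- ave(d$glucose, d$id, FUN = function(g)
  as.numeric(stats::filter(g, rep(1 / 3, 3), sides = 1)))      # mean of last 3 readings

d
```

In this sketch an NA glucose value propagates into the lag and rolling-mean columns of nearby rows, which illustrates why the feature matrix itself can contain missing values.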

Because timestamp gaps are converted into explicit rows before imputation, the returned data frame may contain more rows than the input data when readings are absent from the expected CGM sampling grid.

The default R-native backend uses the R package mice. For the closest agreement with the Python reference workflow, install reticulate and use the optional Python backend.

install.packages("reticulate")

The Python backend needs the following Python packages, which can be declared through reticulate:

reticulate::py_require(c(
  "numpy",
  "pandas",
  "scikit-learn",
  "statsmodels",
  "xgboost"
))

Basic use

library(CGMissingDataR)

data("CGMExmplDat10Pct")

out <- run_missing_glucose_imputation(
  CGMExmplDat10Pct,
  target_col = "LBORRES",
  feature_cols = c("AGE", "hba1c"),
  id_col = "USUBJID",
  time_col = "Time",
  imputer_backend = "mice"
)

head(out[c(
  "USUBJID",
  "Time",
  "LBORRES",
  "AGE",
  "hba1c",
  "imputed_glucose_value"
)])

The original target column is not overwritten. Rows that were missing in LBORRES, including rows inserted from timestamp gaps, remain missing there; the completed value is stored in imputed_glucose_value.

missing_rows <- is.na(out$LBORRES)
head(out[missing_rows, c(
  "USUBJID",
  "Time",
  "LBORRES",
  "imputed_glucose_value"
)])

imputed_glucose_value is returned as a continuous numeric model estimate. Users who need whole-number glucose values for reporting can round after imputation:

out$imputed_glucose_value_rounded <- round(out$imputed_glucose_value)

Timestamp gaps

Raw CGM exports may represent missingness in two ways:

a row exists but the glucose value is NA;
a timestamp is absent entirely, causing a gap in the expected sampling grid.

For example, if a subject’s readings jump from 00:05 to 00:30 on a 5-minute sampling grid, the function internally creates the missing rows at 00:10, 00:15, 00:20, and 00:25, sets the target glucose value to NA, and then imputes those values using the same workflow as explicit missing glucose values.
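The gap-filling step can be sketched in base R by building an equally spaced grid per subject and left-joining the observed rows onto it. This minimal illustration assumes readings already fall on a 5-minute grid anchored at each subject's first timestamp; how the package handles off-grid or jittered timestamps is not shown here.

```r
# Observed readings for one subject: 00:05, then nothing until 00:30 (made-up values).
obs <- data.frame(
  id = "S01",
  time = as.POSIXct(c("2024-01-01 00:05:00", "2024-01-01 00:30:00"), tz = "UTC"),
  glucose = c(120, 131)
)
interval_minutes <- 5

# Build a regular grid per subject from its first to its last timestamp.
grid <- do.call(rbind, lapply(split(obs, obs$id), function(s) {
  data.frame(
    id = s$id[1],
    time = seq(min(s$time), max(s$time), by = interval_minutes * 60)  # seconds
  )
}))

# Left-join observations onto the grid: absent timestamps get glucose = NA.
filled <- merge(grid, obs, by = c("id", "time"), all.x = TRUE)
filled <- filled[order(filled$id, filled$time), ]

format(filled$time[is.na(filled$glucose)], "%H:%M")
# returns "00:10" "00:15" "00:20" "00:25"
```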

Bundled Shiny app

CGMissingDataR also includes a small Shiny app for users who prefer an interactive workflow. The app lets users upload a CSV file or load one of the built-in example data sets, choose the target glucose, subject ID, timestamp, and feature columns, run run_missing_glucose_imputation(), preview the imputed values for rows whose glucose was missing, and download the completed data as a CSV file.

Launch the app from R with:

run_app()

The app supports the same two imputation backends as the main function:

mice, the default CRAN-safe R backend;
sklearn, the optional Python-compatible backend using reticulate.

The Shiny app is optional and requires the shiny package. If shiny is not already installed, install it with:

install.packages("shiny")

For package developers, the app is stored under inst/shiny/cgm_imputation_app/ and is launched through the exported run_app() helper.
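During development it can help to locate the installed app directory directly. The sketch below relies only on the inst/shiny/cgm_imputation_app/ layout described above (installed packages drop the inst/ prefix) and on shiny::runApp(); it is not necessarily how run_app() itself is implemented.

```r
# Installed location of the app; "" if CGMissingDataR is not installed.
app_dir <- system.file("shiny", "cgm_imputation_app", package = "CGMissingDataR")

if (nzchar(app_dir)) {
  list.files(app_dir)   # inspect the app's files
}

# Launch only in an interactive session with shiny available.
if (nzchar(app_dir) && interactive() && requireNamespace("shiny", quietly = TRUE)) {
  shiny::runApp(app_dir)
}
```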

Optional Python-compatible backend

Use imputer_backend = "sklearn" to run the strict Python-compatible path. In that path, reticulate sends the data to Python, where pandas, scikit-learn, statsmodels, and Python xgboost perform the preprocessing and calculations. The completed pandas data frame is then converted back to R.

out_py <- run_missing_glucose_imputation(
  CGMExmplDat10Pct,
  target_col = "LBORRES",
  feature_cols = c("AGE", "hba1c"),
  id_col = "USUBJID",
  time_col = "Time",
  imputer_backend = "sklearn"
)

The Python backend is optional. It is not required for package installation, loading, or CRAN examples.
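If a script should use the Python backend when it is available and otherwise fall back to mice, a small helper along these lines may be useful. pick_imputer_backend() is a hypothetical name, not part of CGMissingDataR. It checks each Python module with reticulate::py_module_available(), using sklearn as the import name for scikit-learn.

```r
# Hypothetical helper: returns "sklearn" when reticulate and the required
# Python modules are importable, otherwise the R-native "mice" backend.
pick_imputer_backend <- function(prefer_python = TRUE) {
  if (!prefer_python || !requireNamespace("reticulate", quietly = TRUE)) {
    return("mice")
  }
  modules <- c("numpy", "pandas", "sklearn", "statsmodels", "xgboost")
  python_ok <- tryCatch(
    all(vapply(modules, reticulate::py_module_available, logical(1))),
    error = function(e) FALSE   # e.g. no usable Python installation
  )
  if (isTRUE(python_ok)) "sklearn" else "mice"
}
```

The result can be passed as imputer_backend = pick_imputer_backend() in the run_missing_glucose_imputation() call. Checking module availability may initialize Python in the current R session.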

Learn more

The main vignette contains a detailed walkthrough of data requirements, timestamp regularization, return columns, backend selection, optional Python setup, and troubleshooting:

https://zhanglabuky.github.io/CGMissingDataR/articles/How-To-Use-CGMissingDataR.html

A separate Shiny app vignette walks through the interactive interface:

https://zhanglabuky.github.io/CGMissingDataR/articles/Using-the-CGMissingDataR-Shiny-App.html

Changelog

The changelog is available at:

https://zhanglabuky.github.io/CGMissingDataR/news/index.html