Run missingness benchmark (target-masking with LAG features)
Source: R/run_bench.R
run_missingness_benchmark.Rd
This function is deprecated. Use run_missing_glucose_imputation() for real missing glucose values.
This function benchmarks imputation under missingness by masking the target column at several rates and evaluating the imputation and predictive performance of the MICE, Random Forest, and kNN methods. It also adds LAG features of the target variable to assess their effect on imputation and prediction. The function returns a data.frame with the Mask Rate, Method, MRD (Mean Relative Difference), and Masked Count for each combination of method and mask rate.
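The mask-then-score loop described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the helper `mean_relative_difference()`, the column names of `result`, and the mean-based stand-in imputer are all hypothetical, and MRD is assumed here to be the mean of |imputed - true| / |true| over the masked rows.

```r
# Hypothetical helper: MRD over masked rows only.
# Assumed definition: mean(|imputed - truth| / |truth|); the package may differ.
mean_relative_difference <- function(truth, imputed) {
  mean(abs(imputed - truth) / abs(truth))
}

set.seed(42)
df <- data.frame(Glucose = c(90, 110, 100, 120, 80, 100, 95, 105, 115, 85))

# Random masking: hide a fraction of the target values.
mask_rate <- 0.2
n_mask <- as.integer(round(mask_rate * nrow(df)))
masked_idx <- sample(nrow(df), n_mask)

truth <- df$Glucose[masked_idx]   # held-out true values
masked <- df$Glucose
masked[masked_idx] <- NA          # simulate missingness

# Stand-in imputer (mean of observed values), used only to keep the sketch
# self-contained; the function itself evaluates MICE, Random Forest, and kNN.
imputed <- masked
imputed[masked_idx] <- mean(masked, na.rm = TRUE)

# One row of a summary table in the spirit of the documented return value.
result <- data.frame(
  MaskRate = mask_rate,
  Method = "mean (stand-in)",
  MRD = mean_relative_difference(truth, imputed[masked_idx]),
  MaskedCount = n_mask
)
print(result)
```

Scoring only the masked rows keeps the metric focused on values the imputer never saw; repeating the loop over each mask rate and method would yield one row per combination.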
Arguments
- data
A data.frame (or object coercible to data.frame), OR a path to a CSV file.
- target_col
Single character string: name of the outcome column to mask/impute (e.g., "LBORRES", "Glucose").
- feature_cols
Character vector of base feature columns (excluding the target). If NULL, uses all columns except target_col.
- id_col
Character string: subject identifier column used for LAG features (default "USUBJID").
- time_col
Character string: time-ordering column used for LAG features (default "TimeSeries").
- mask_rates
Numeric vector in (0, 1): fraction of rows to mask (default 0.05, 0.10, 0.20, 0.30, 0.40).
- mask_type
One of "random" or "block".
- rf_n_estimators
Integer: number of trees for random forest (default 400).
- knn_k
Integer: number of neighbors for kNN (default 7).
- seed
Integer: random seed used for MICE and models (default 42).
- lag_k
Integer vector of lags to compute on the target (default c(1,2,3)).
- add_rollmean
Logical: add rolling mean feature of prior target values (default TRUE).
- roll_window
Integer: rolling window length for rollmean (default 3).
Details
LAG features are computed using data.table::shift() (fast lag/lead). The rolling mean
is computed with data.table::frollmean() using align="right" and fill=NA.
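A minimal sketch of how such features could be built with data.table follows. The toy data, the generated column names (`Glucose_lag1`, `Glucose_rollmean`, and so on), and the choice to lag the target by one step before averaging (so that the window covers prior values only) are assumptions made for illustration; the function's internal code may differ.

```r
library(data.table)

# Toy data using the documented default id and time column names.
dt <- data.table(
  USUBJID = c("A", "A", "A", "A", "B", "B", "B"),
  TimeSeries = c(1, 2, 3, 4, 1, 2, 3),
  Glucose = c(100, 110, 120, 130, 90, 95, 85)
)

lag_k <- c(1L, 2L, 3L)
roll_window <- 3L

# Lags are only meaningful when rows are ordered in time within each subject.
setorder(dt, USUBJID, TimeSeries)

# One lag column per entry of lag_k, computed separately for each subject.
lag_names <- paste0("Glucose_lag", lag_k)
dt[, (lag_names) := shift(Glucose, n = lag_k, type = "lag"), by = USUBJID]

# Rolling mean of prior values: lag by one first (assumption), then average
# a right-aligned window so the current row's own target is excluded.
dt[, Glucose_rollmean := frollmean(shift(Glucose, 1L), n = roll_window,
                                   align = "right", fill = NA),
   by = USUBJID]

print(dt)
```

Grouping by the id column keeps one subject's values from leaking into another subject's lags, and early rows in each subject are left as NA because they have no sufficient history.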