Enhancing AKI Prediction Model Transportability Across Diverse Healthcare Systems

Predicting acute kidney injury (AKI) in hospitalized patients is crucial for timely intervention and improved outcomes. Machine learning models have shown promise in this area, but their transportability across different healthcare systems remains a significant challenge. This study describes the development and validation of an AKI prediction model and introduces a novel metric to assess and predict its transportability.

Using data from a source health system, researchers analyzed 153,821 inpatient encounters from 2010 to 2018. The dataset covered a wide range of clinical facts, averaging 67 facts per patient per day. After data curation, it comprised 38,920 unique variables and over 142 million observations. Notably, only 1,933 variables were consistently present across all six participating sites, which illustrates how much healthcare data varies from one system to another.

The study employed a temporal validation approach, holding out encounters after January 1, 2017, to simulate real-world prospective evaluation. The remaining data was divided into derivation, calibration, and internal validation sets. A Gradient Boosting Tree (DS-GBT) model was trained on the derivation set to predict AKI risk within 48 hours and subsequently evaluated across internal and temporal validation sets.
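The split described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the encounter records are synthetic, the 70/15/15 proportions are assumed (the text does not state them), and treating January 1, 2017 itself as part of the earlier period is also an assumption.

```python
import random
from datetime import date, timedelta

# Hypothetical encounters: (encounter_id, admission_date), one every 30 days.
encounters = [(i, date(2010, 1, 1) + timedelta(days=30 * i)) for i in range(100)]

CUTOFF = date(2017, 1, 1)  # temporal hold-out boundary named in the text


def temporal_split(records, cutoff, fractions=(0.70, 0.15), seed=0):
    """Hold out records dated after `cutoff`, then randomly split the rest.

    `fractions` gives the derivation and calibration shares (assumed values);
    whatever remains becomes the internal validation set.
    """
    temporal = [r for r in records if r[1] > cutoff]
    earlier = [r for r in records if r[1] <= cutoff]

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(earlier)

    n_deriv = int(len(earlier) * fractions[0])
    n_calib = int(len(earlier) * fractions[1])
    return {
        "derivation": earlier[:n_deriv],
        "calibration": earlier[n_deriv:n_deriv + n_calib],
        "internal_validation": earlier[n_deriv + n_calib:],
        "temporal_validation": temporal,
    }


splits = temporal_split(encounters, CUTOFF)
for name, subset in splits.items():
    print(name, len(subset))
```

Keeping the temporal set strictly later than everything used for fitting is what lets it stand in for a prospective evaluation.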

Figure 1: Model Performance within the Source Health System. Receiver operating characteristic (ROC) curves demonstrate the AKI prediction model’s performance for various AKI severity levels and subgroups within the source health system. The legend details the different validation approaches used: internal validation with and without serum creatinine (SCr) and blood urea nitrogen (BUN), and temporal external validation with and without SCr and BUN.

Clinical Insights from the AKI Prediction Model

To gain clinical understanding, the study utilized Shapley Additive exPlanations (SHAP) values to interpret the DS-GBT model. This allowed for the evaluation of the marginal effects of different predictive features. An interactive dashboard (https://sxinger.shinyapps.io/AKI_shap_dashbd/) provides a comprehensive view of feature importance and marginal effect plots.
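To make the idea of a SHAP value concrete, the sketch below computes exact Shapley values by brute force for a three-feature toy score. Everything here is hypothetical: the scoring function, its coefficients, and the baseline patient are invented for illustration and are not the DS-GBT model. Real tree models are normally explained with tree-specific algorithms rather than by enumerating permutations, which only works for a handful of features.

```python
from itertools import permutations
from math import factorial


def toy_risk_score(features):
    """Hypothetical log-odds score; NOT the study's model."""
    scr_change, vancomycin, min_sbp = features
    score = 1.2 * scr_change + 0.8 * vancomycin
    if min_sbp < 90:
        # Low blood pressure adds risk and interacts with vancomycin exposure.
        score += 0.5 + 0.4 * vancomycin
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features not yet "revealed" in an ordering keep their baseline value.
    """
    n = len(x)
    contributions = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]
            new_output = model(current)
            contributions[i] += new_output - previous_output
            previous_output = new_output
    return [c / factorial(n) for c in contributions]


patient = (0.5, 1, 85)      # SCr change, vancomycin exposure (0/1), minimum SBP
reference = (0.0, 0, 110)   # assumed baseline patient
phi = shapley_values(toy_risk_score, patient, reference)
print(phi)
```

The contributions always add up to the difference between the patient's score and the baseline score, which is the property that makes per-feature attributions interpretable as shares of the prediction.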

Analysis of the top predictors for moderate-to-severe AKI (stage 2 or higher) revealed key clinical variables. Serum creatinine (SCr) levels and changes, vancomycin exposure, minimum blood pressure values and blood pressure variability, age, body mass index (BMI), height, and chest X-ray procedures were identified as critical predictors. When SCr and BUN were excluded from the model, other factors emerged as top predictors: vancomycin and piperacillin-tazobactam exposure, blood pressure changes, age, BMI, height, chest X-ray, bilirubin, and anion gap.

Figure 2: Top Predictors for Moderate-to-Severe AKI. Marginal effect plots illustrate the relationship between the top 10 most influential variables and the predicted odds ratio for moderate-to-severe AKI within 48 hours. The x-axis represents the raw values of each feature, and the y-axis shows the logarithmic estimate of the odds ratio (SHAP value).
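Because the y-axis is on the log-odds scale, a SHAP value can be read as an odds ratio by exponentiating it. The short sketch below shows that conversion and how it would shift a baseline risk. The function names and the example numbers are illustrative assumptions, not values taken from the study.

```python
import math


def shap_to_odds_ratio(shap_value):
    """Convert a log-odds contribution into a multiplicative odds ratio."""
    return math.exp(shap_value)


def apply_to_baseline(baseline_probability, shap_value):
    """Shift a baseline probability by one feature's log-odds contribution."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * shap_to_odds_ratio(shap_value)
    return new_odds / (1 + new_odds)


# Hypothetical example: a contribution of log(2) doubles the odds.
print(shap_to_odds_ratio(math.log(2)))
print(apply_to_baseline(0.10, math.log(2)))
```

A value of zero on the y-axis therefore corresponds to an odds ratio of 1, meaning the feature value neither raises nor lowers the predicted risk relative to the model's reference.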

The analysis highlighted non-linear relationships for some key predictors. Age, minimum blood pressure, and blood pressure changes exhibited U-shaped associations, indicating increased risk at both extremes. Elevated SCr, vancomycin exposure, and higher BMI were associated with increased AKI risk, as were chest X-rays and piperacillin-tazobactam injection. An individualized risk prediction dashboard (https://sxinger.shinyapps.io/AKI_ishap_dashbd/) was also developed to dissect patient-level risk factors.

External Validation and the Challenge of Transportability

To assess the model’s transportability, external validation was conducted across five additional health systems. Demographic characteristics varied significantly across these sites, highlighting the diverse patient populations and healthcare practices. Notably, while overall AKI rates were lower in the external validation sites, some sites exhibited higher rates of moderate-to-severe AKI.

The study compared the performance of directly transporting the model trained on the source site (Transported Model) versus refitting the model using local data at each target site (Refitted Model). Significant performance variations were observed across sites for the transported model, emphasizing the limitations of direct transportability. Refitting the model on local data consistently yielded improved performance, suggesting the importance of site-specific adaptation.
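The comparison between a transported and a refitted model comes down to computing AUROC for each set of predictions on the same target-site outcomes. The sketch below does this with the rank-based (Mann-Whitney) definition of AUROC. The labels and scores are made-up placeholders chosen only to show the mechanics; they are not results from any site in the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical target-site outcomes and predicted risks (illustration only).
labels = [0, 0, 0, 0, 0, 1, 1, 1]
transported_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.30, 0.70, 0.90]
refitted_scores = [0.05, 0.30, 0.25, 0.60, 0.15, 0.45, 0.75, 0.95]

auc_transported = auroc(labels, transported_scores)
auc_refitted = auroc(labels, refitted_scores)
print(f"transported: {auc_transported:.3f}")
print(f"refitted:    {auc_refitted:.3f}")
print(f"difference:  {auc_refitted - auc_transported:.3f}")
```

The pairwise loop is quadratic in the number of encounters, so a sort-based implementation from a statistics library would be the practical choice on real data.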

Figure 3: External Validation and Model Transportability. Receiver operating characteristic (ROC) curves compare the performance of the transported AKI prediction model versus the refitted model on external validation site data for 48-hour AKI predictions. Panels (a), (b), and (c) represent predictions for any AKI, at least stage 2 AKI, and stage 3 AKI, respectively.

Understanding Performance Heterogeneity and Feature Disparities

To understand the reasons behind the performance differences, the study investigated feature selection disparities across sites. Analysis revealed that while some features were consistently important across all sites, many were site-specific, particularly within medication and lab test categories. Commonly important features included serum creatinine, height, BMI, age, and hemoglobin, while others like blood pressure summaries, bilirubin, chloride, potassium, and phosphate were also frequently selected.

Marginal effect analysis of common features further revealed variations in their associations with AKI risk across different sites. For instance, the relationship between age and AKI risk differed across sites, highlighting the influence of site-specific population characteristics on model behavior.

Figure 4: Feature Selection Variability Across Healthcare Systems. This figure illustrates the disparities in feature importance rankings across different healthcare systems. Each dot represents a feature, with the y-axis indicating the proportion of sites that identified the feature as a top predictor, and the x-axis representing the median importance ranking across sites.
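The two axes of this figure can be derived from per-site importance rankings, as the sketch below illustrates. The site names and rankings are hypothetical, and taking the median only over the sites that selected a feature is an assumption, since the text does not say how unselected features were handled.

```python
from statistics import median

# Hypothetical top-feature lists per site, ordered from most to least important.
site_rankings = {
    "site_A": ["serum_creatinine", "age", "bmi", "vancomycin"],
    "site_B": ["serum_creatinine", "bmi", "bilirubin", "age"],
    "site_C": ["age", "serum_creatinine", "potassium", "height"],
}


def summarize_feature_selection(rankings):
    """Return, per feature, the share of sites selecting it and its median rank."""
    n_sites = len(rankings)
    ranks_by_feature = {}
    for ordered_features in rankings.values():
        for rank, feature in enumerate(ordered_features, start=1):
            ranks_by_feature.setdefault(feature, []).append(rank)
    return {
        feature: {
            "site_proportion": len(ranks) / n_sites,
            "median_rank": median(ranks),  # over selecting sites only (assumed)
        }
        for feature, ranks in ranks_by_feature.items()
    }


summary = summarize_feature_selection(site_rankings)
for feature, stats in sorted(summary.items(), key=lambda kv: kv[1]["median_rank"]):
    print(feature, stats)
```

Features that sit high on the proportion axis with a low median rank are the consistently important ones, while features selected by a single site cluster at the bottom.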

Introducing adjMMD: A Metric for Assessing Model Transportability

To address the challenge of predicting model transportability, the researchers developed a novel metric called adjusted maximum mean discrepancy (adjMMD). This metric quantifies the joint distribution differences in feature space between the source and target datasets. adjMMD is an adaptation of the classic MMD metric, widely used in transfer learning to measure data distribution shifts.
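For orientation, the classic MMD that adjMMD builds on can be sketched in a few lines. The example below computes the standard biased estimate of squared MMD with a Gaussian (RBF) kernel. It does not implement the study's adjustment, which is not described in this summary; the kernel choice, the bandwidth parameter `gamma`, and the small feature vectors are all illustrative assumptions.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian kernel between two feature vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def mmd_squared(source, target, gamma=0.5):
    """Biased estimate of squared MMD between two samples.

    MMD^2 = mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)
    """
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Hypothetical standardized feature vectors (e.g. age, BMI) at two sites.
source_site = [(0.1, -0.2), (0.4, 0.3), (-0.3, 0.0), (0.2, 0.5)]
similar_site = [(0.0, -0.1), (0.5, 0.2), (-0.2, 0.1), (0.3, 0.4)]
shifted_site = [(1.6, 1.2), (2.0, 1.5), (1.4, 1.9), (1.8, 1.1)]

print(mmd_squared(source_site, similar_site))
print(mmd_squared(source_site, shifted_site))
```

Identical samples give a value of zero, and the value grows as the target site's feature distribution moves away from the source's, which is the behavior a transportability metric needs.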

The study demonstrated that adjMMD effectively reflects the potential performance deterioration when models are transported. A key finding was the identification of a “minimal feature set”—a small subset of top-ranking variables sufficient to accurately infer AUROC variation based on adjMMD. This significantly enhances the practicality of adjMMD, as target sites do not need to collect all variables to assess model transportability.

The research established a strong correlation between adjMMD and the drop in AUROC (ΔAUC). A linear equation was derived to predict performance change based on adjMMD values calculated using the minimal feature set. This provides a valuable tool for healthcare systems to quickly estimate the expected performance of a transported AKI prediction model without extensive local validation.
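The kind of linear relationship described here can be illustrated with an ordinary least-squares fit. In the sketch below, the discrepancy values and AUROC drops are invented placeholders, and the fitted slope and intercept are not the study's coefficients; the point is only to show how a target site could plug its own discrepancy value into such an equation.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (discrepancy, AUROC drop) pairs from previously validated sites.
discrepancy = [0.1, 0.3, 0.5, 0.7, 0.9]
auroc_drop = [0.01, 0.03, 0.04, 0.07, 0.09]

slope, intercept = fit_line(discrepancy, auroc_drop)


def predict_auroc_drop(value):
    """Estimate the AUROC drop for a new site's discrepancy value."""
    return slope * value + intercept


print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"predicted drop at 0.6: {predict_auroc_drop(0.6):.3f}")
```

A site using this approach would only need the features in the minimal set to compute its discrepancy value, which is what keeps the estimate cheap compared with a full local validation.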

The robustness of adjMMD was further validated through various experiments, demonstrating its consistent performance across different derivation sites, model types (DS-GBT and LASSO), and model versions. These findings underscore the potential of adjMMD as a practical and reliable metric for evaluating and predicting the transportability of machine learning models in healthcare.

In conclusion, this study highlights the complexities of transporting AKI prediction models across diverse healthcare systems and introduces adjMMD as a promising metric to assess and predict model transportability. The development of a minimal feature set and a predictive linear equation further enhances the practical utility of adjMMD, paving the way for more efficient and reliable deployment of machine learning models in diverse clinical settings.
