In January, Google released a research paper titled “Scalable and accurate deep learning with electronic health records,” by Alvin Rajkomar, Eyal Oren, et al. It’s not formally published yet, but we’re glad to have the chance to review and comment.
The researchers, including participants from Stanford University and the University of Chicago, describe building deep learning models based on “uncurated” data from hospital electronic health records (EHRs). Specifically, they evaluated prediction of in-hospital mortality, 30-day unplanned readmissions, and length of stay. They also predicted ICD-10 codes at discharge. The authors state, “Our study’s approach uses a single data representation of the entire EHR as a sequence of events, allowing this system to be used for any prediction that would be clinically or operationally useful with minimal additional data preparation.” The authors also assert, “Using the entirety of a patient’s chart for every prediction does more than promote scalability, it exposes more data with which to make an accurate prediction.”
For this predictive modeling study, Google used de-identified, structured EHR records for 216,221 adult patients who had been hospitalized for at least 24 hours. Input from one of the hospitals also included free-text clinical notes. The data was from University of California San Francisco Medical Center (2012-2016) and University of Chicago Medicine (2009-2016).
Overall, these models had good predictive performance, with an area under the receiver operating characteristic curve (AUROC) of 0.90 for predicting inpatient mortality at admission and 0.75-0.76 for predicting 30-day readmission at the time of discharge.
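For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen positive case (say, a patient who died in hospital) is assigned a higher risk score than a randomly chosen negative case. A minimal sketch, using invented scores for six hypothetical patients:

```python
# Toy illustration of AUROC, the metric reported in the paper.
# AUROC = probability that a random positive outranks a random negative.
# All labels and scores below are synthetic, for illustration only.

def auroc(labels, scores):
    """Compute AUROC by pairwise comparison (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical mortality-risk scores for six patients (1 = died in hospital)
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.8, 0.9]
print(auroc(labels, scores))  # 0.888... (8 of 9 positive/negative pairs ranked correctly)
```

A score of 0.5 is no better than chance; 1.0 is perfect ranking. The 0.90 reported for inpatient mortality is strong by this yardstick, which is what makes the question of *what information drives it* so important.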
While this is an impressive demonstration of deep learning methods applied to EHR data, it is unclear if the reported approach would be useful in the delivery of real-time care. Ultimately, the goal of predictive models is to provide new, actionable insights, not to confirm pre-existing knowledge or beliefs, however helpful that may be.
For a predictive model to affect outcomes in a real-world healthcare setting, it must meet the following criteria (the trifecta of prediction):

1. The prediction must be correct.
2. The prediction must be timely, with a suitable intervention possible.
3. The prediction must provide “new news.”

The approach from this research group focuses on the first criterion, but it doesn’t explicitly account for the second and third.
For example, while prediction of inpatient mortality using data within 24 hours of admission may provide advance warning of mortality risk, it is important to evaluate the model’s performance on a continuous basis throughout the patient’s stay, since understanding a patient’s evolving condition is an important component in timely care delivery.
It is even more important to ensure the prediction provides new news. This is where the strategy of including too much data can backfire – it may improve the predictive power of a model, yet limit its utility in providing new insights. Consider that physician orders and notes, which reflect physician judgment, are part of the model input. While adding these inputs will improve performance, it may not help the physician. After all, is the goal for the physician’s knowledge to improve the model, or for the model to improve the physician’s knowledge?
To take an extreme example, a physician’s order for a palliative care consult would be part of the input used to predict mortality. But this factor would not give the physician new and early insight into the patient’s mortality risk.
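The effect described above is easy to reproduce on synthetic data. The sketch below (all patients, weights, and probabilities invented for illustration) scores a cohort two ways: once from a weak physiological signal alone, and once after adding a consult-order flag that mostly appears when the clinician already expects death. The second model has a higher AUROC, but the lift comes from information the care team already had.

```python
# Synthetic sketch of the "new news" problem: a feature encoding the
# physician's own judgment (here, a palliative-care consult order) can
# inflate AUROC without telling the physician anything new.
# All data and weights below are invented for illustration.
import random

random.seed(0)

def auroc(labels, scores):
    """AUROC via pairwise comparison (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

patients = []
for _ in range(2000):
    died = random.random() < 0.1
    vitals_risk = random.gauss(1.0 if died else 0.0, 1.0)  # weak physiological signal
    # Consult order is placed mostly when the clinician already expects death:
    consult = random.random() < (0.7 if died else 0.02)
    patients.append((died, vitals_risk, consult))

labels = [int(d) for d, _, _ in patients]
physiology_only = [v for _, v, _ in patients]
with_consult = [v + 3.0 * c for _, v, c in patients]  # "leaky" model

print("AUROC, physiology only:", round(auroc(labels, physiology_only), 3))
print("AUROC, + consult order:", round(auroc(labels, with_consult), 3))
```

The gap between the two numbers is exactly the portion of measured performance that confirms clinical judgment rather than adding to it.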
PeraHealth has built models using the Rothman Index (RI) that address these same problems and achieve equally strong performance but do not include inputs reflecting existing diagnostic or treatment knowledge. As part of our process, we explicitly seek to provide new information to the physician or nurse. In fact, PeraHealth’s approach to data capture is entirely patient-centric – without inputs related to provider opinion or action.
We recognize the significant achievement of processing EHR data, in all its complexity, with little curation and little need for expert opinion. While Google’s work is an interesting advance in applying deep learning models to EHR data, the performance must be considered in light of the goals. If the purpose is to assist physicians and nurses with treatment decisions, we suggest the performance numbers overestimate the model’s value.
Deep learning models have the advantage of allowing the modeler to simply specify the input data and let the model determine the optimal functional form and variable weightings. However, in practice, large advances in prediction performance tend not to be related to model sophistication, but rather to gaining a deeper understanding of the matter at hand and subsequent acquisition, or transformation, of the relevant input data.
To develop PeraHealth’s Rothman Index, we started by studying the problem and then working to understand what model restrictions were appropriate to ensure the model would provide additional value to providers. The result was a heuristic model synthesizing risk across a range of physiological measures, including labs, vitals, and importantly, nursing assessments. Nursing assessments capture deterioration in a patient’s functional condition, which generally precedes derangement of vital signs or lab test results. The result is a continuous measure of patient condition, integrated with the EHR, and computed on a real-time basis across all conditions, diseases, and care settings. This approach has set the Rothman Index apart as a practical tool to help physicians improve care – the gold standard when it comes to clinical decision support.
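To make the general shape of such a heuristic concrete, here is a hypothetical sketch: each physiological input contributes an excess-risk term when it deviates from its normal range, and those terms are summed with flags from nursing assessments into one continuous score. The variables, ranges, and weights below are invented for illustration; this is not the actual Rothman Index formulation.

```python
# Hypothetical sketch of a heuristic patient-risk score in the spirit
# described above. Ranges and weights are invented for illustration and
# are NOT the actual Rothman Index.

NORMAL_RANGES = {
    "heart_rate": (60, 100),       # beats/min
    "temperature": (36.1, 37.8),   # degrees C
    "sodium": (135, 145),          # mmol/L
}

def excess_risk(value, low, high, weight=1.0):
    """Risk contribution grows with distance outside the normal range."""
    if low <= value <= high:
        return 0.0
    return weight * (low - value if value < low else value - high)

def patient_risk(vitals_labs, nursing_assessments):
    """Sum excess risk over physiological inputs, plus failed nursing
    assessments (e.g. food or mobility), which often deteriorate before
    vitals or labs do."""
    score = sum(
        excess_risk(vitals_labs[name], lo, hi)
        for name, (lo, hi) in NORMAL_RANGES.items()
        if name in vitals_labs
    )
    score += 5.0 * sum(nursing_assessments.values())  # each failed assessment adds risk
    return score

# Tachycardic, mildly hyponatremic patient who failed a food assessment:
print(patient_risk({"heart_rate": 118, "sodium": 131}, {"food": 1, "mobility": 0}))  # 27.0
```

Note what is absent from the inputs: no orders, no consults, no diagnoses. Because every term reflects patient state rather than provider action, any elevated score is, by construction, new information to the clinician.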
We look forward to further work from the Google group, but we caution that sometimes more data is not the pathway to a better tool.