Predictive Analytics in Healthcare: Applications and Challenges

A much-discussed but largely untapped opportunity to improve health system performance

Nov 13, 2024

In a 2020 article titled The Impact of Machine Learning on Patient Care: A Systematic Review, the authors describe the considerable interest in, discussion of, and research on the use of machine learning for clinical use cases.

“According to Hanson et al., AI in medicine in 2001 consisted mainly of small ‘proof-of-concept’ studies, lacking true comparisons with standard of care [1]. It seems that only modest gains have been recorded over the last twenty years.”

For the last ten years (if not longer), the promise of machine learning and artificial intelligence in healthcare has been widely theorized, if not fully “prophesied.” However, the actual impact and adoption have been limited despite the technological capability and relative feasibility of implementation.

This article will provide a brief overview of the promise and barriers to adoption of clinical predictive analytics at the point of care. For a general review of artificial intelligence in healthcare, see my other article on the topic, here.

Overview of Predictive Analytics

Predictive analytics is a branch of advanced analytics that uses historical data, statistical algorithms, and machine learning techniques to identify the probability of future outcomes. In healthcare, predictive analytics leverages vast amounts of health data derived from electronic medical records and administrative claims data to forecast patient outcomes, identify at-risk populations, and optimize resource allocation.

Digitization of Health Records and Claims Data

The digitization of health records and claims data is a necessary precondition for significant adoption of risk prediction models. Over the past decade, the healthcare industry has undergone a significant digital transformation. The shift from paper-based records to electronic health records (EHRs) and digital claims data has been driven by several factors. First, in the U.S., the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 incentivized the adoption of EHRs. Second, improvements in data storage, processing power, and software capabilities have made digital health records more feasible and cost-effective. Third, demand for interoperability, meaning the need for seamless information exchange between healthcare providers, has pushed the industry toward standardized digital formats (most would argue that seamless information exchange still does not happen and is actually a barrier to the adoption of predictive analytics in many circumstances).

As a result, by 2021, nearly 9 in 10 office-based physicians and nearly all hospitals in the U.S. had adopted EHRs, creating a rich source of data for predictive analytics.

Non-computational Prediction

Non-computational risk prediction has been used in clinical practice for many years via “rules of thumb” and simple counting-based algorithms. One example is the Apgar score, which comprises five components: 1) color, 2) heart rate, 3) reflexes, 4) muscle tone, and 5) respiration. Each component is given an integer score from 0 to 2, and the five scores are summed to a total between 0 and 10. The Apgar score provides a well-validated and simple method for reporting the status of the newborn infant immediately after birth.
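
To make the idea of a counting-based score concrete, here is a minimal sketch of the Apgar arithmetic described above. The class and field names are hypothetical choices for illustration, and the clinical judgment needed to assign each 0 to 2 component is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass
class ApgarAssessment:
    """Five clinical signs, each already scored 0, 1, or 2 by a clinician."""
    color: int
    heart_rate: int
    reflexes: int
    muscle_tone: int
    respiration: int

    def total(self) -> int:
        components = [
            self.color,
            self.heart_rate,
            self.reflexes,
            self.muscle_tone,
            self.respiration,
        ]
        # Reject anything outside the 0-2 range before summing.
        for value in components:
            if value not in (0, 1, 2):
                raise ValueError("each Apgar component must be 0, 1, or 2")
        return sum(components)


# Illustrative values only, not real patient data.
assessment = ApgarAssessment(color=1, heart_rate=2, reflexes=2, muscle_tone=1, respiration=2)
print(assessment.total())  # prints 8
```

The whole "model" is a sum of five small integers, which is why scores like this can be calculated at the bedside without a computer.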

There are many examples of these simple risk assessments and systems throughout care delivery. A system of fully digitized patient records at the point of care, combined with computational approaches, promises to take in more predictor variables and thereby improve the predictive accuracy and precision of an algorithm. In many circumstances, the simple counting-based algorithms are accurate but not precise: they cast a wide net (catching most true cases at the cost of many false alarms) to help clinicians ensure they do not miss patients in need of extra care.

Computational approaches can improve the precision of a prediction to help better allocate scarce resources for certain outcomes of interest and reduce the time and staff required to make a prediction or assess risk. In the field of prediction, the value of an algorithm is typically measured by four related rates: the false positive rate, false negative rate, true positive rate (sensitivity), and true negative rate (specificity).
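
The four rates named above all come from counting how predictions line up with what actually happened. The sketch below is one simple way to compute them; the function name and the twelve-patient example are made up for illustration.

```python
def classification_rates(actual, predicted):
    """Compute the four rates for binary labels, where 1 means the outcome occurred."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed cases
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correctly cleared

    positives = tp + fn
    negatives = fp + tn
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    return {
        "true_positive_rate": tp / positives,
        "false_negative_rate": fn / positives,
        "false_positive_rate": fp / negatives,
        "true_negative_rate": tn / negatives,
    }


# Illustrative data: 4 patients had the outcome, 8 did not.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

rates = classification_rates(actual, predicted)
print(rates)  # true positive rate 0.75, false positive rate 0.25
```

A "wide net" score pushes the true positive rate up and accepts a higher false positive rate in exchange; a more precise algorithm tries to lower the false positive rate without missing more patients.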

Applications of Predictive Analytics in Care Delivery

Predictive analytics can be applied across the healthcare delivery continuum to enhance the clinical and cost-effectiveness of services.

Early disease detection: Identifying patients at risk of developing chronic or acute conditions or at risk of deterioration. One area that has seen a great deal of interest and research is predicting admission to the ICU after a patient is first admitted to the hospital. This can prioritize that patient for close monitoring and extra support to prevent deterioration.

Predicting emergency department and hospital utilization: In the outpatient setting, there is a great deal of interest in identifying patients at risk of hospital admission early enough to prevent it. This is commonly employed by population health-focused and accountable care organizations to inform care management and coordination activities. It has become more common but is still not widely employed.
Hospital readmission prevention: Predicting which patients are likely to be readmitted within 30 days of discharge.
Resource allocation: Forecasting patient volume to optimize staffing and inventory management.
Personalized treatment plans (Precision Medicine): Tailoring interventions based on individual patient characteristics and history. Using medical records, genetic data, and other data sources to select the right drugs, treatments, and programs for patients to increase the probability of therapeutic success.
Population health management: Identifying high-risk groups for targeted preventive care and for achieving higher scores on HEDIS and Medicare Star Ratings quality measures.

Methodologies for Risk Prediction (Statistical Models)

If you are interested in digging deeper into this topic, the following statistical methods are critical to understand. Risk prediction methodologies in healthcare span a spectrum of complexity and computational requirements:

Linear Regression: Simple and interpretable, used for straightforward predictions based on a few variables. This has been the primary method of developing risk stratification algorithms to this point and is a common research methodology.
Logistic Regression: Commonly used for binary outcomes (e.g., readmission risk).
Decision Trees: Provide visual representation of decision-making processes.
Random Forests: Ensemble method that combines multiple decision trees for improved prediction accuracy.
Support Vector Machines and Gradient Boosting Machines: Effective for high-dimensional data and non-linear relationships. These machine learning methods are common in the modern clinical risk prediction literature and have been deemed to show a great deal of promise.
Neural Networks: Can capture complex non-linear relationships in data. Neural networks can also suffer from a black box effect in some cases and require a great deal of computational power and expertise to create.
Deep Learning: Advanced neural networks capable of processing vast amounts of unstructured data, including images and text from EHRs. Deep learning models are often a “black box” in how they achieve their predictions, so there are issues associated with trust and adoption in these cases.
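
For readers who want to see what one of these methods looks like mechanically, here is a minimal sketch of logistic regression for a binary outcome such as readmission. It assumes a single predictor (a hypothetical count of prior admissions), made-up data, and plain gradient descent; production models would normally use a statistics or machine learning library and many more variables.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(features, outcomes, learning_rate=0.1, iterations=5000):
    """Fit one weight and one intercept by batch gradient descent on mean log loss."""
    n = len(features)
    weight, intercept = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error for each patient: predicted probability minus actual outcome.
        errors = [sigmoid(weight * x + intercept) - y for x, y in zip(features, outcomes)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, features)) / n
        intercept -= learning_rate * sum(errors) / n
    return weight, intercept


# Illustrative data: prior admissions (hypothetical predictor) and readmission (1 = yes).
prior_admissions = [0, 1, 2, 3, 4, 5, 6, 7]
readmitted = [0, 0, 1, 0, 1, 0, 1, 1]

weight, intercept = fit_logistic_regression(prior_admissions, readmitted)


def predicted_risk(x):
    """Predicted probability of readmission for a given number of prior admissions."""
    return sigmoid(weight * x + intercept)


print(round(predicted_risk(0), 2), round(predicted_risk(7), 2))
```

The fitted weight is directly readable: a positive value means more prior admissions raise the predicted risk. That kind of interpretability is much harder to get from neural networks and deep learning models.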

Real-World Use Cases of Risk Prediction

Sepsis Prediction at Johns Hopkins Hospital

Johns Hopkins Hospital implemented a predictive model called the Targeted Real-time Early Warning System (TREWS) to identify patients at risk of sepsis. The system analyzes EHR data in real-time, considering over 500 variables to calculate a patient's sepsis risk score. TREWS has demonstrated a 19% reduction in sepsis-related deaths and a significant decrease in hospital length of stay.

Heart Failure Readmission Prediction at Partners HealthCare

Partners HealthCare developed a machine learning model to predict 30-day readmissions for heart failure patients. The model uses data from EHRs, including lab results, medications, and comorbidities. It achieved an area under the curve (AUC) of 0.72, outperforming traditional risk scores. The system allows care teams to focus resources on high-risk patients, potentially reducing readmission rates.
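
For context on what an AUC figure means: it can be read as the probability that a randomly chosen patient who had the outcome receives a higher risk score than a randomly chosen patient who did not. The sketch below computes it that way on made-up scores; it illustrates the metric only and is not a reconstruction of how the model above was evaluated.

```python
def area_under_curve(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    correctly_ranked = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correctly_ranked += 1.0
            elif pos_score == neg_score:
                correctly_ranked += 0.5
    return correctly_ranked / (len(positives) * len(negatives))


# Illustrative risk scores for six patients; 1 = readmitted within 30 days.
risk_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
readmitted = [1, 1, 1, 0, 0, 0]

print(area_under_curve(risk_scores, readmitted))  # 7 of 9 pairs ranked correctly
```

An AUC of 0.5 is no better than a coin flip and 1.0 is perfect ranking, so a value such as 0.72 indicates useful but far from perfect discrimination.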

Risk of Bias

As predictive analytics becomes more prevalent in healthcare, concerns about bias have emerged. This article discusses bias issues in the use of machine learning to read CT scans, which is a promising use of this technology to improve the efficiency of care delivery.

Data representation: If certain populations are underrepresented in the training data, models may perform poorly for these groups. Data scientists like to say “garbage in, garbage out.” The quality of the underlying training or research data is critical to real-world performance.
Historical bias: Models trained on historical data may perpetuate existing biases in healthcare delivery.
Feature selection: The choice of variables used in predictive models can inadvertently introduce bias. Predictor variable selection is critical as there are many confounding and interrelated concepts in EHR and claims data that can affect the real-world performance of a model.
Algorithmic bias: Some algorithms may be more prone to amplifying biases present in the data.

To mitigate these risks, it's crucial to:

Ensure diverse and representative training data
Regularly audit models for performance across different demographic groups
Ensure real-world performance monitoring and continuous quality improvement processes
In some circumstances, have regulatory oversight of algorithm performance
Involve diverse stakeholders in the development and deployment of predictive models
Involve clinicians at many levels to test assumptions about predictor variables
Use interpretable models when possible to allow for scrutiny of decision-making processes. The “black box effect” can limit trust from clinical users and potentially introduce risk.
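
As one example of what auditing a model across demographic groups could look like, the sketch below compares the true positive rate (the share of real cases the model catches) between two groups. The group labels, record layout, and data are hypothetical; a real audit would cover more metrics and use properly governed patient data.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: (group, actual, predicted) tuples, where 1 means the outcome occurred."""
    caught = defaultdict(int)
    actual_cases = defaultdict(int)
    for group, actual, predicted in records:
        if actual == 1:
            actual_cases[group] += 1
            if predicted == 1:
                caught[group] += 1
    # Share of real cases the model flagged, per group.
    return {group: caught[group] / actual_cases[group] for group in actual_cases}


# Illustrative records only: two hypothetical groups, "A" and "B".
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 1), ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]

print(sensitivity_by_group(records))  # group A: 0.75, group B: 0.5
```

In this made-up example the model misses half of the real cases in group B but only a quarter in group A, which is the kind of gap a regular audit is meant to surface.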

Barriers to Adoption

Despite its potential, several barriers hinder the widespread adoption of predictive analytics in healthcare:

Data quality and interoperability: Inconsistent data formats and quality across different healthcare systems can limit the effectiveness of predictive models.
Privacy and security concerns: The sensitive nature of health data requires robust security measures and compliance with regulations like HIPAA.
Integration with clinical workflows: Predictive tools must be seamlessly integrated into existing workflows to be effective and widely adopted. This requires buy-in from electronic medical record vendors to allow algorithms to produce the necessary user interface.
Lack of technical expertise: Many healthcare organizations lack the in-house expertise to develop and maintain sophisticated predictive models.
Ethical and legal considerations: The use of AI in healthcare decision-making raises complex ethical and legal questions that are still being addressed. Liability is a key issue in this space as well as regulatory oversight of the performance of algorithms.
Cost of implementation: The initial investment in infrastructure and expertise can be substantial for healthcare organizations.
Lack of reimbursement: The reimbursement system is largely built on the delivery of services in a time-based payment system. Medical devices and drugs have separate reimbursement, but the payment environment, encompassing both government and private payors, has not caught up to the modern software-based care delivery environment. Reimbursement models will be required to offset the adoption and maintenance costs of these algorithms.

Conclusion

Predictive analytics holds immense potential to transform healthcare delivery by enabling proactive, data-driven decision-making. The digitization of health records over the past decade has created a rich data ecosystem that can power sophisticated predictive models. From simple linear regressions to complex deep learning approaches, a range of methodologies is being applied to predict patient outcomes and optimize care delivery.

Real-world applications, such as sepsis prediction and readmission risk assessment, demonstrate the tangible benefits of these technologies. The risk of bias in predictive models and the various barriers to adoption remain ongoing challenges that must be addressed. Such challenges accompany any new product launch, but they matter especially in healthcare because of the life-and-death nature of the services provided.

As the healthcare industry continues to evolve, overcoming these challenges will be crucial to realizing the full potential of predictive analytics. By doing so, we can move towards a future of more personalized, efficient, and effective healthcare delivery that improves outcomes for all patients.


Hanson CW, Marshall BE. Artificial intelligence applications in the intensive care unit. Crit Care Med 2001;29:427–35.

Health Tech Happy Hour