Artificial Intelligence, Machine Learning, and High Cost Utilizers of Care
Using computational risk prediction to solve one of health care's most significant and perplexing problems
The existing and growing burden of preventable hospitalizations represents one of health care's most pressing challenges. In the United States alone, ambulatory care-sensitive conditions (ACSCs), which are conditions for which hospitalization could potentially be prevented through effective outpatient care, result in millions of hospital admissions annually and cost the health care system billions of dollars. Potentially preventable hospitalizations (PPH) and emergency department visits (PPED) together cost the US health care system an estimated $30 billion to $100 billion annually. Estimates suggest that approximately 13% of adult and 8% of pediatric hospitalizations are potentially preventable.
The problem is not unique to the United States. Recent research estimates the cost of preventable hospitalizations in Germany's health system alone at approximately €3.5 billion per year (a little under 1% of Germany's health spending), with costs increasing nearly 1% annually.
Another layer to this question is the overlap between potentially preventable events and “high-cost utilizers” (HCUs). HCUs are commonly defined as the 5% of patients who account for over 50% of health spending.
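To make the 5%/50% statistic concrete, here is a minimal Python sketch of how spending concentration is typically computed from per-patient annual costs. The function name and the cost figures are hypothetical illustrations, not real data.

```python
def top_share(costs, top_fraction=0.05):
    """Share of total spending accounted for by the costliest `top_fraction` of patients."""
    if not costs:
        raise ValueError("costs must be non-empty")
    ranked = sorted(costs, reverse=True)
    # Size of the top group, rounded to whole patients but never fewer than one.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical annual costs (USD) for 20 patients; one catastrophic episode dominates.
costs = [250_000, 40_000] + [3_000] * 8 + [1_000] * 10
print(f"Top 5% share of spending: {top_share(costs):.0%}")  # prints: Top 5% share of spending: 77%
```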
The trouble, from a public health perspective, is identifying these preventable, high-cost episodes before they happen. It is important to note, however, that many high-cost episodes are NOT preventable: car accidents, certain other traumas, unexpected heart attacks, instances of pneumonia, and childbirth complications. Some of these episodes may be preventable through measures outside the control of medical care, such as better safety procedures, airbags, less distracted driving, and a lifetime of better health behaviors, but a qualified health care provider could not readily have prevented them even with advance warning. Sometimes, bad things, expensive things, happen.
Anyone working in medicine, health policy, or care delivery is generally aware that the US spends far more per capita on health care than other OECD countries while performing far worse on key quality indicators. We desperately want to focus efforts on the areas with the best return on investment in the quest to reduce per capita cost and improve health outcomes.
Importantly, the human and financial toll of these preventable hospitalizations extends beyond direct medical costs. Patients experience disrupted lives, potential complications, and degraded health outcomes. Health care systems face stretched resources, reduced capacity for necessary acute care, and significant financial burdens. Despite recognition of this challenge, current clinical operations, particularly in primary care settings, struggle to proactively identify and intervene with high-risk patients before they require hospitalization.
The Curious Case of the High Cost Utilizers
This part of the article summarizes one of my favorite articles ever written about health care. Everyone in health care who has heard of the 5% HCU problem immediately says something to themselves like, “Oh, well, if we could just target care toward the 5% of highest-cost patients, we could significantly reduce per capita spending in the United States and save a whole bunch of money.” There are other costs to reduce and episodes to avoid within the other 95%, but that 5% of patients seems so very achievable. Yet, in practice, they are so very elusive.
A well-written and groundbreaking analysis by Robert Pearl and Philip Madvig of Kaiser Permanente (KP) offers important insights about the sickest 5% of patients, who account for 50% of all health care spending. Contrary to common assumptions, these high-cost utilizers are not a homogeneous group that can be managed through the traditional disease management programs that have been heralded as the solution for many years. They are much, much more complicated than that.
KP's research identified three distinct cohorts, each making up roughly a third of the group, with very different needs and opportunities for intervention:
The first third consists of patients with manageable chronic conditions such as diabetes, stable heart failure, and mental illness. While these patients frequently use emergency rooms and require periodic hospitalizations, their conditions could potentially be better controlled through high-quality, accessible outpatient care. This is why good primary care is so critical, and it is an area in which the US fails[1]. However, the makeup of this group changes significantly from year to year, complicating targeted interventions. This is the group that most often overlaps with potentially preventable ED visits and hospitalizations. Because their acute episodes are often preventable, this is where advanced identification methods can make interventions like disease management programs, care coordination, and social needs interventions more cost-effective.
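One way to quantify how much the makeup of this group changes is persistence: the fraction of one year's top spenders who remain top spenders the following year. A minimal sketch, with hypothetical patient IDs and costs (not real data):

```python
def top_ids(costs_by_patient, top_fraction=0.05):
    """IDs of the costliest `top_fraction` of patients (at least one patient)."""
    if not costs_by_patient:
        raise ValueError("costs_by_patient must be non-empty")
    ranked = sorted(costs_by_patient, key=costs_by_patient.get, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return set(ranked[:k])


def persistence(year1, year2, top_fraction=0.05):
    """Fraction of year-1 top spenders who are still top spenders in year 2."""
    top1 = top_ids(year1, top_fraction)
    top2 = top_ids(year2, top_fraction)
    return len(top1 & top2) / len(top1)


# Hypothetical costs for ten patients; with only ten, use the top 20% (two people).
year1 = {"A": 120_000, "B": 95_000, "C": 4_000, "D": 3_500, "E": 2_000,
         "F": 1_500, "G": 1_200, "H": 900, "I": 600, "J": 300}
year2 = {"A": 6_000, "B": 88_000, "C": 3_000, "D": 110_000, "E": 2_500,
         "F": 1_000, "G": 800, "H": 700, "I": 500, "J": 200}
print(persistence(year1, year2, top_fraction=0.2))  # 0.5: A dropped out, D entered
```

A low persistence value is exactly what makes a fixed, year-one target list lose value quickly.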
The second third comprises patients experiencing one-time catastrophic health events - major trauma, extremely premature births, or sudden life-threatening illnesses like acute cancer. This group accounts for approximately 35%[2] of the spending on high-cost patients. Since these events are largely unpredictable, there is limited opportunity for prevention. Many of these patients either recover or pass away, moving them out of the high-cost category the following year. There are opportunities to reduce cost in this group through effective care, but the episodes themselves are likely necessary.
The final third includes patients with severe chronic conditions like end-stage renal disease who require expensive ongoing treatment. Many began in the first group but saw their conditions deteriorate over time. These patients account for another 35% of the spending on high-cost utilizers. While disease management, care coordination, and other interventions can help make their care more efficient, their basic need for intensive medical intervention cannot be entirely eliminated. This group and the prior group are where systemic efforts to curb unreasonably high prices, such as those discussed by Anderson, may be most beneficial to reduce per capita spending. Additionally, interdisciplinary care models, like specialty accountable care organizations, and medical homes tailored to these conditions may also be beneficial.
This three-part segmentation explains why traditional disease management programs have struggled to reduce costs: they aren't designed to address the heterogeneous nature of high-cost utilizers or to account for how the population changes from year to year. The real opportunity lies in better managing the first group while preventing its members from progressing to the third group. However, this requires casting a much “wider net.” Among Medicare beneficiaries over 65, about 27.5 million have five or more chronic conditions that could potentially be controlled, but only 4-5 million end up as high-cost patients in any given year. This is the key area in which machine learning and AI-based technologies can make a huge difference: accurately and precisely screening those 27.5 million for rising risk and presenting that information to intervention programs. Accuracy and precision are critical because one of the major critiques of disease management programs is that their operating costs can exceed the medical expenses they avoid. Casting a “wide net” is expensive, and better patient risk prediction can narrow the net.
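The “wide net” problem can be framed as simple break-even arithmetic. The sketch below assumes a program whose only savings come from events it prevents among enrollees who truly would have had them. The prevention rate, avoided cost per event, and per-enrollee operating cost are hypothetical placeholders, and the roughly 16% base rate is simply 4.5 million divided by 27.5 million, using the midpoint of the 4-5 million figure above.

```python
def net_savings(n_enrolled, precision, prevention_rate, avoided_cost, cost_per_enrollee):
    """Expected program savings minus operating cost (all inputs are assumptions).

    precision: fraction of enrollees who truly would have had a high-cost event
    prevention_rate: fraction of those events the intervention actually prevents
    avoided_cost: average spending avoided per prevented event
    cost_per_enrollee: program operating cost per enrolled patient
    """
    savings = n_enrolled * precision * prevention_rate * avoided_cost
    return savings - n_enrolled * cost_per_enrollee


def break_even_precision(prevention_rate, avoided_cost, cost_per_enrollee):
    """Minimum precision at which expected savings cover operating costs."""
    return cost_per_enrollee / (prevention_rate * avoided_cost)


# Hypothetical program: prevents 20% of targeted events, avoids $30,000 per
# prevented event, and costs $1,500 per enrollee per year to operate.
base_rate = 4.5 / 27.5  # about 16%: enrolling everyone, no model at all
print(break_even_precision(0.2, 30_000, 1_500))            # 0.25
print(net_savings(10_000, base_rate, 0.2, 30_000, 1_500))  # negative: net loss
print(net_savings(10_000, 0.40, 0.2, 30_000, 1_500))       # positive with a sharper model
```

Under these assumptions, enrolling the whole “wide net” loses money, while a model that raises precision above the break-even point turns the same program profitable.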
KP's analysis suggests that effectively addressing high-cost utilizers requires moving beyond conventional disease management to more comprehensive primary care models that can monitor and support a broader population of at-risk patients. Their success with this approach demonstrates that while the challenge is complex, it is not insurmountable with the right care delivery model. KP also benefits from an operationally- and IT-integrated model in which it is both payor and provider in most circumstances. That tightly controlled quality management and coordination often allows it to operationalize interventions more effectively than other organizations can.
Traditional approaches to high-risk patient identification rely heavily on clinicians' ability to manually identify warning signs during periodic (and sometimes infrequent) visits, an approach that often fails to catch deteriorating conditions between appointments. Additionally, the massive volume of patient data generated through electronic health records (EHRs) has made it increasingly difficult for providers to comprehensively process all relevant risk factors for each patient. This is where modern computational methods can augment the capabilities of clinicians.
Two recent studies demonstrate how artificial intelligence and machine learning could transform our ability to predict and prevent these hospitalizations. The research shows promising results in both general medical and oncology contexts.
Study 1: Big Data Analytics for Preventable Hospitalizations
The first study, published in the International Journal of Environmental Research and Public Health, developed prediction models to identify patients at high risk for ambulatory care-sensitive (i.e., potentially preventable) hospitalizations in Germany. The researchers compared a traditional statistical approach (logistic regression) with a machine learning method (Random Forest) using real-world administrative claims[3] data from over 69,000 patients.
Both models achieved strong predictive performance, with c-statistics above 0.75. The c-statistic measures discrimination: the probability that a model assigns a higher risk to a randomly chosen patient who experiences the outcome than to one who does not (0.5 is chance, 1.0 is perfect). The Random Forest model performed slightly better, particularly in identifying high-risk patients, with a sensitivity of 50%, meaning it flagged half of the patients who went on to experience a preventable hospitalization in the following year. This is likely better than clinical judgment alone when clinicians are seeing more than 20 patients per day in the clinic.
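For readers less familiar with the c-statistic and sensitivity, here is a minimal pure-Python sketch of both. The c-statistic is computed by brute-force comparison of every pair of patients with and without the outcome, which is fine for illustration; libraries such as scikit-learn's `roc_auc_score` compute the same quantity efficiently for binary outcomes. The risk scores, outcomes, and the 0.5 threshold are made-up assumptions.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Probability that a random patient with the outcome (1) has a higher predicted
    risk than a random patient without it (0); tied scores count as half."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the outcome")
    score = 0.0
    for p, n in product(pos, neg):
        if p > n:
            score += 1.0
        elif p == n:
            score += 0.5
    return score / (len(pos) * len(neg))


def sensitivity(risks, outcomes, threshold):
    """Share of patients with the outcome whose predicted risk is at or above threshold."""
    flagged = [r >= threshold for r, y in zip(risks, outcomes) if y == 1]
    return sum(flagged) / len(flagged)


# Hypothetical scores for six patients; two later had a preventable hospitalization.
risks = [0.9, 0.7, 0.4, 0.35, 0.2, 0.1]
outcomes = [1, 0, 1, 0, 0, 0]
print(c_statistic(risks, outcomes))       # 0.875
print(sensitivity(risks, outcomes, 0.5))  # 0.5
```

Note that the c-statistic depends only on how patients are ranked, while sensitivity depends on where the flagging threshold is set.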
Key risk factors identified included:
Advanced age
Previous hospitalizations
Higher levels of long-term care needs
Specific diagnoses including maternal disorders, mental health conditions, and circulatory system diseases
The model's success demonstrates the potential for machine learning to process complex combinations of risk factors more effectively than traditional statistical methods, although the margin in this study was modest. Some newer deep-learning-based approaches are also being explored and are showing even better predictive performance.
Study 2: Machine Learning for Cancer Treatment Complications
The second study, conducted at the University of California San Francisco, focused specifically on preventing emergency visits and hospitalizations among cancer patients receiving systemic infusion therapy. The researchers developed and prospectively validated three machine learning approaches: LASSO, Random Forest, and gradient boosted trees (GBT). The prospective evaluation is important because it shows that models developed and validated on historical data can maintain their performance in the real-world clinical environment.
Testing the models on over 1,000 systemic therapy treatments, the authors achieved impressive predictive accuracy. The gradient boosted tree model performed best, with an AUC of 0.78; at its chosen risk threshold, it achieved 77.6% sensitivity and 61.9% specificity in identifying patients who would require acute care within 30 days. Time windows are very important components of both these models and intervention programs. Longer prediction windows allow more time to intervene, while shorter windows may improve precision in some circumstances.
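Because the time window defines the outcome the model learns, it helps to see how such a label might be constructed. The exact outcome definition used in the study is not given here, so this is a hypothetical convention: an ED visit or admission on any day after the treatment date, up to and including day `window_days`.

```python
from datetime import date, timedelta


def acute_care_label(index_date, acute_care_dates, window_days=30):
    """Return 1 if any ED visit or admission occurs within `window_days` after the
    index (treatment) date, excluding the index date itself; otherwise 0."""
    window_end = index_date + timedelta(days=window_days)
    return int(any(index_date < d <= window_end for d in acute_care_dates))


# Hypothetical patient: infusion on March 1, ED visit on March 25.
treatment = date(2024, 3, 1)
visits = [date(2024, 3, 25)]
print(acute_care_label(treatment, visits, window_days=30))  # 1
print(acute_care_label(treatment, visits, window_days=14))  # 0
```

Changing `window_days` changes both the event rate the model sees and the lead time available for intervention, which is the trade-off described above.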
This real-world validation demonstrates the practical potential for machine learning to identify high-risk patients before complications develop, potentially allowing for preventive interventions that could avoid emergency department visits and hospitalizations. Prospective evaluations are critical to facilitating adoption by clinics.
Implications for Healthcare Delivery
These studies suggest that artificial intelligence and computational patient risk prediction could enable a fundamental shift from reactive to proactive care delivery. By identifying high-risk patients before they require hospitalization, healthcare systems could:
1. Implement VERY targeted preventive interventions for high-risk patients. They need to be VERY targeted because only then can the savings from avoided events exceed the program's operating costs.
2. More efficiently allocate limited health care resources to the patients most likely to incur downstream costs. However, this is where payment models must give hospitals and health systems an incentive to prevent their own revenue-generating events. Why would a health system want to prevent an additional high-revenue hospitalization if it gets paid handsomely for it?
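Why “VERY targeted” matters can be shown with Bayes' rule. Sensitivity and specificity describe the model, but the share of flagged patients who actually go on to need acute care (the positive predictive value, or precision) also depends on how common the outcome is. The sketch below plugs in the 77.6% sensitivity and 61.9% specificity reported in Study 2; the prevalence values are hypothetical, since the event rate is not stated here.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value, P(outcome | flagged), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical event rates; sensitivity and specificity are the reported GBT values.
for prevalence in (0.05, 0.10, 0.20):
    print(f"prevalence {prevalence:.0%}: PPV {ppv(0.776, 0.619, prevalence):.1%}")
```

At a 10% event rate, for example, fewer than one in five flagged patients would actually have the event, so the cost of each outreach slot matters a great deal.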
The models should be integrated into clinical workflows via electronic health records to provide real-time risk assessments, allowing care teams or central programs to proactively reach out to high-risk patients and adjust care plans accordingly.
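At its simplest, turning risk scores into outreach means ranking patients and cutting the list at what a care team can realistically handle. A hypothetical sketch (patient IDs, scores, and the capacity limit are illustrative; in practice this logic would typically live inside the EHR or a care management platform):

```python
import heapq


def outreach_worklist(risk_by_patient, capacity, min_risk=0.0):
    """Return up to `capacity` patient IDs, highest predicted risk first,
    skipping anyone whose risk is below `min_risk`."""
    eligible = ((risk, pid) for pid, risk in risk_by_patient.items() if risk >= min_risk)
    return [pid for _, pid in heapq.nlargest(capacity, eligible)]


scores = {"p1": 0.12, "p2": 0.45, "p3": 0.30, "p4": 0.05}
print(outreach_worklist(scores, capacity=2))                 # ['p2', 'p3']
print(outreach_worklist(scores, capacity=2, min_risk=0.35))  # ['p2']
```

Here `capacity` ties the list to staffing, and `min_risk` keeps low-value outreach off the list even when slots are open.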
Implementation Challenges and Solutions
While the potential benefits are significant, several key challenges must be addressed for successful implementation:
Data Access and Integration
Access to electronic health record data remains a significant hurdle. Many health care organizations operate siloed data systems that make it difficult to aggregate the comprehensive patient information needed for accurate risk prediction. Additionally, Epic Systems and Oracle Health (formerly Cerner) now control most EHR data in the United States, making them major gatekeepers to innovation in this space. If you want to implement a model, you will often need cooperation or approval from these vendors.
Solutions require:
Development of improved data sharing frameworks between healthcare providers (and enforcement of data blocking laws and rules)
Standardization of data formats and definitions across systems (e.g., HL7 standards such as FHIR)
Investment in technical infrastructure for secure data integration (e.g., data warehouses pulling from multiple systems of record)
Clear protocols for patient privacy protection and consent
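As an illustration of what standardized output might look like, the sketch below packages a model's prediction as a FHIR R4 `RiskAssessment` resource, a resource type the FHIR specification defines for recording predicted outcomes. Only a minimal subset of fields is filled in, the patient ID is hypothetical, and the outcome is free text rather than a coded concept. Whether an EHR will accept the resource depends on the vendor's FHIR API and permissions.

```python
import json
from datetime import datetime, timezone


def risk_assessment(patient_id, probability, horizon_days=30):
    """Build a minimal FHIR R4 RiskAssessment resource as a Python dict."""
    return {
        "resourceType": "RiskAssessment",
        "status": "final",
        "subject": {"reference": f"Patient/{patient_id}"},
        "occurrenceDateTime": datetime.now(timezone.utc).isoformat(),
        "prediction": [{
            # Free-text outcome; a production system would likely use a coded concept.
            "outcome": {"text": "ED visit or hospital admission"},
            "probabilityDecimal": probability,
            # Prediction horizon as a Range whose upper bound uses the UCUM 'd' (days) unit.
            "whenRange": {"high": {"value": horizon_days, "unit": "days",
                                   "system": "http://unitsofmeasure.org", "code": "d"}},
        }],
    }


print(json.dumps(risk_assessment("example-123", 0.27), indent=2))
```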
Reimbursement Models
Current health care payment systems often don't adequately support preventive interventions. It is still the case that most payment models do not have a clear relationship to improved outcomes. Without reimbursement, effective technologies in this space may never scale into the clinic. New reimbursement models are needed that:
Provide financial incentives for preventing hospitalizations
Support proactive outreach to high-risk patients
Fund the implementation and maintenance of predictive systems
Recognize the value of preventive care coordination and align payment with health outcomes. This has been an area of payment innovation for the last 15 years, but most health care delivery is still paid on a fee-for-service basis.
Clinician Trust and Workflow Integration
Healthcare providers must trust and effectively use AI-powered predictions for the system to succeed. This requires:
Transparent model development and validation processes (prospective, clinical studies are critical to gaining the trust of clinicians)
Clear communication of prediction confidence levels
Seamless integration into existing clinical workflows (e.g. providers do not like logging into multiple systems to accomplish their jobs)
Ongoing validation and safety monitoring of model performance
Training and support for clinical teams
Assessment and mitigation of algorithmic bias
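Bias assessment and ongoing monitoring can start with something as simple as comparing performance across patient subgroups on each new batch of scored patients. The sketch below compares sensitivity by group (an “equal opportunity” style check, one of several possible fairness criteria); the groups, scores, and threshold are hypothetical.

```python
from collections import defaultdict


def sensitivity_by_group(records, threshold):
    """Sensitivity (true-positive rate) for each subgroup.

    records: iterable of (group, predicted_risk, outcome) tuples, outcome in {0, 1}.
    Groups with no positive outcomes are omitted because sensitivity is undefined.
    """
    flagged = defaultdict(int)
    positives = defaultdict(int)
    for group, risk, outcome in records:
        if outcome == 1:
            positives[group] += 1
            if risk >= threshold:
                flagged[group] += 1
    return {g: flagged[g] / positives[g] for g in positives}


records = [("A", 0.8, 1), ("A", 0.6, 1), ("A", 0.2, 0),
           ("B", 0.7, 1), ("B", 0.3, 1), ("B", 0.1, 0)]
rates = sensitivity_by_group(records, threshold=0.5)
print(rates)                                      # {'A': 1.0, 'B': 0.5}
print(max(rates.values()) - min(rates.values()))  # 0.5: a gap worth investigating
```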
What does this mean for the future?
The health care industry stands at a pivotal moment in transforming how we manage and prevent avoidable hospitalizations. As artificial intelligence and machine learning capabilities continue to advance, several promising pathways are emerging that could revolutionize our approach to preventive care.
The integration of diverse data sources beyond claims and EHR data represents perhaps the most significant opportunity. While current predictive models rely heavily on clinical and claims data, the next generation of tools will need to incorporate a much broader spectrum of information. Social determinants of health, including housing stability, food security, and transportation access, play a crucial role in health outcomes but are rarely captured in current systems. Environmental data, from air quality metrics to neighborhood walkability scores, could provide vital context about patient risk factors. Similarly, patient-reported outcomes, patient-generated health data from home-based remote patient monitoring, and behavioral health data could offer early warning signs of deteriorating conditions well before they manifest in clinical measurements.
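Mechanically, much of this integration comes down to joins. A minimal sketch, assuming hypothetical area-level social determinant measures keyed by ZIP code (real data sources and field names vary, and area-level values are only proxies for an individual patient's circumstances):

```python
def add_area_features(patients, area_features, key="zip"):
    """Left-join area-level features onto patient records.

    Patients whose area is missing from `area_features` keep all their fields,
    with each area feature set to None so models can treat it as missing.
    """
    fields = sorted({f for feats in area_features.values() for f in feats})
    joined = []
    for patient in patients:
        feats = area_features.get(patient[key], {})
        joined.append({**patient, **{f: feats.get(f) for f in fields}})
    return joined


# Hypothetical patients and placeholder ZIP codes.
patients = [{"id": "p1", "zip": "00001", "age": 71},
            {"id": "p2", "zip": "00002", "age": 66}]
area = {"00001": {"food_insecurity_rate": 0.18, "no_vehicle_rate": 0.12}}
for row in add_area_features(patients, area):
    print(row)
```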
This expanded data ecosystem will require new approaches to information sharing and integration across departments, organizations, and between government agencies. Health care organizations will need to develop robust frameworks for data governance that balance privacy protection with the need for comprehensive patient information. Standardization of data formats and definitions across systems will be essential, as will investments in technical infrastructure that can securely handle and analyze these diverse data streams. The ability to explain predictions will be crucial, helping clinicians understand and trust the system's recommendations.
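For a linear model such as logistic regression, explanation can be exact: each feature's contribution to the log-odds is its coefficient times its value. The sketch below uses hypothetical coefficients loosely named after Study 1's risk factors (they are not the study's estimates). For tree ensembles like Random Forest or GBT, attribution methods such as SHAP values approximate a similar per-feature breakdown.

```python
import math


def explain_logistic(intercept, coefs, features):
    """Predicted probability plus per-feature log-odds contributions, largest first.

    Contribution = coefficient * feature value; features absent from `features`
    are treated as 0 (an assumption for this sketch).
    """
    contributions = {name: coef * features.get(name, 0.0) for name, coef in coefs.items()}
    log_odds = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


# Hypothetical model and patient (not fitted to any real data).
coefs = {"age_over_80": 0.9, "prior_admissions": 0.6, "care_level": 0.25}
patient = {"age_over_80": 1, "prior_admissions": 2, "care_level": 3}
prob, reasons = explain_logistic(-4.0, coefs, patient)
print(f"risk {prob:.1%}")
for name, contribution in reasons:
    print(f"  {name}: {contribution:+.2f} log-odds")
```

A ranked list like this gives a clinician a short “why this patient” summary alongside the score.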
Looking ahead, the field would benefit from the development of standardized frameworks for validating and implementing predictive models. This would help ensure consistent evaluation of model performance and provide clear guidelines for clinical implementation. The creation of shared best practices around intervention design and implementation could help accelerate adoption of effective approaches across different healthcare settings.
Ultimately, how health care payment models and financing systems adapt to facilitate the adoption of AI and ML-based care models will be the most important determinant of success. Without a robust business model for developers of these technologies and a reward to compensate innovators for their risk-taking and high R&D costs, it will be difficult to sustain this market. Both of these outcomes require new reimbursement and payment methodologies to allow health care delivery organizations that operate on relatively thin margins to adopt and adapt their existing care models. The traditional archetype of the humble, “build it and they will come” clinic model will not be adequate for the use of these technologies, so care delivery models will need to adapt as well.
[1] I generally think one of the solutions, or compromises, to our arguments about socialized/universal/single-payor vs. the private, for-profit, free market system is to make primary care access a fully government-funded system like the UK National Health Service.
[2] https://hbr.org/2020/01/managing-the-most-expensive-patients
[3] Claims data is very structured, so it works well in computational methods and reduces the amount of data cleaning required. However, electronic health record data is richer and, as the research suggests, is generally more useful for predicting these types of events, but it is messy and often requires methods to extract data from unstructured notes.