By David W. Bates, Suchi Saria, Lucila Ohno-Machado, Anand Shah, and Gabriel Escobar
Big Data In Health Care: Using
Analytics To Identify And Manage
High-Risk And High-Cost Patients
ABSTRACT The US health care system is rapidly adopting electronic health records, which will dramatically increase the quantity of clinical data that are available electronically. Simultaneously, rapid progress has been made in clinical analytics—techniques for analyzing large quantities of data and gleaning new insights from that analysis—which is part of what is known as big data. As a result, there are unprecedented opportunities to use big data to reduce the costs of health care in the United States. We present six use cases—that is, key examples—where some of the clearest opportunities exist to reduce costs through the use of big data: high-cost patients, readmissions, triage, decompensation (when a patient’s condition worsens), adverse events, and treatment optimization for diseases affecting multiple organ systems. We discuss the types of insights that are likely to emerge from clinical analytics, the types of data needed to obtain such insights, and the infrastructure—analytics, algorithms, registries, assessment scores, monitoring devices, and so forth—that organizations will need to perform the necessary analyses and to implement changes that will improve care while reducing costs. Our findings have policy implications for regulatory oversight, ways to address privacy concerns, and the support of research on analytics.
doi: 10.1377/hlthaff.2014.0041 HEALTH AFFAIRS 33, NO. 7 (2014): 1123–1131
©2014 Project HOPE—The People-to-People Health Foundation, Inc.
David W. Bates (dbates@partners.org) is chief of the Division of General Medicine, Brigham and Women’s Hospital, in Boston, Massachusetts.
Suchi Saria is an assistant professor of computer science and health policy management at the Center for Population Health and IT, Johns Hopkins University, in Baltimore, Maryland.
Lucila Ohno-Machado is associate dean for informatics and technology in the Division of Biomedical Informatics, University of California, San Diego, in La Jolla.
Anand Shah is vice president of clinical services at PCCI, in Dallas, Texas.
Gabriel Escobar is regional director of hospital operations research and director of the Systems Research Initiative, Division of Research, Kaiser Permanente, in Oakland, California.
The cost of health care in the United States is high, nearly twice that in most other developed countries,1 and it continues to grow rapidly. The unsustainable projected trajectory of US health care costs has led to calls for improving the value of health care.2 However, the Affordable Care Act, the most substantial policy reform in US health care in decades, has been criticized for not doing enough to contain costs.3
As health reform progresses, one key dynamic of the US health care system is the rapid adoption of electronic health records (EHRs). The growth of EHRs will make it possible to access unprecedented amounts of clinical data and offers the potential for cost savings.4 The extent of those cost savings is still to be determined,5 but EHRs’ value in increasing health care providers’ access to patients’ records is not in question.
In other industries, companies have been very successful at using big data to improve their efficiency.6 By big data, we refer to the high volume, variety, and potential for the rapid accumulation of data and to analytics, which is the discovery and communication of patterns in data.
Examples include Amazon’s product recommendation system for online shopping, creating efficient pricing in the stock market, and predicting players’ statistics in baseball. “Watson,” an application developed by IBM, had a recent success on the television quiz show Jeopardy, using some of these big-data approaches.7 However, the extent to which these tactics will be applicable to clinical questions is as yet uncertain.8
The underlying techniques used in big data have improved substantially in the past decade, and they often involve hypothesis-free approaches such as data mining. Many experts have called for health care to adopt big-data approaches,9 but uptake has been relatively limited so far.
That may be about to change. Payment reform strategies that incentivize value such as accountable care (a key strategy of the Affordable Care Act, in which entities are asked to be “accountable” for the care they provide) and bundling (a payment approach in which providers are asked to deliver a set of services for a predefined price) are intended to motivate organizations to improve the efficiency of their care. One tactic that health care organizations will likely deploy is the more effective use of predictive analytics.
Ideally, predictive analytics will involve linking data from multiple sources, including clinical, genetic and genomic, outcomes, claims, and social data. Many new sources of data are becoming available, such as data from cell phones and social media applications. Aggregating these data for the purpose of achieving clinical predictive analytics will require the adoption of standards,10 raise privacy and ethical concerns,11 and require new ways to preserve privacy.12
Big data sets can be subjected to many other types of analytic approaches, including pattern recognition and natural history (that is, the course of a disease process). However, we believe that even in the short term, it will be possible for health care organizations to realize substantial benefits from deploying predictive systems. Predictive systems are software tools that allow the stratification of risk to predict an outcome. Such tools are important because many potential outcomes are associated with harm to patients, are expensive, or both.
In health care, we suggest that one way to use predictive systems would be to identify and manage six very practical use cases, that is, examples of instances in which value is likely to be achieved. They are high-cost patients, readmissions, triage, decompensation (when a patient’s condition worsens), adverse events, and treatment optimization for diseases affecting multiple organ systems (such as autoimmune diseases, including lupus). Below we address the types of data and infrastructure that health care organizations will need for each use case. We also discuss what organizations will need to do to actually improve care.
Approximately 5 percent of patients account for about 50 percent of all US health care spending.13 One approach to reducing costs is to identify such patients and manage them more effectively, often by having case managers work with them to improve their care. Such an approach has already resulted in cost reductions.14 However, the identification of potentially high-cost patients has not always produced the desired results. For example, a number of Medicare demonstration projects did not lower costs even though the projects were able to identify high-risk patients.
To effectively implement analytic methods for identifying potentially high-cost patients, a number of issues must be considered. First, what approach should be used to predict which patients are likely to be high risk or high cost? Second, what new measurement sources can be incorporated to improve the predictions? Attributes associated with high-cost patients may include behavioral health problems or socioeconomic factors such as poverty or racial minority status. Thus, integrating data about mental health, socioeconomic status, or other issues such as marital and living status from various sources16 may significantly change the quality of the predictions that can be made.
A third issue is how to make predictions actionable, by identifying which patients are most likely to benefit from an intervention and what specific interventions can most improve care. The effective implementation of new analytic systems to identify potentially high-cost patients will require making predictions easily available with minimal changes to clinical work flows, to increase the chances that health care providers will act on the predictions.
Many organizations and companies that currently use analytic systems have focused on identifying the algorithm that can best stratify data by risk of future costs while not addressing other issues. The variation among algorithms may not be large, and a more practical algorithm may be better than a slightly more accurate one. Algorithms are most effective and perform best when they are derived from and then used in similar
A fourth issue is how to account for the fact that many cases of outcomes in predictive models often come from low-risk groups. This suggests the need for more accurate modeling, particularly for population management.
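The point about low-risk groups can be made concrete with a small arithmetic sketch. The group sizes and event rates below are invented purely for illustration; they are assumptions, not figures from this article or its sources.

```python
# Hypothetical population split by predicted risk.
# All sizes and rates are illustrative assumptions.
groups = {
    "high_risk": {"patients": 5_000, "event_rate": 0.20},
    "low_risk": {"patients": 95_000, "event_rate": 0.02},
}


def expected_events(group):
    """Expected number of outcome events in a group (size times rate)."""
    return group["patients"] * group["event_rate"]


events = {name: expected_events(g) for name, g in groups.items()}
total_events = sum(events.values())
share_from_low_risk = events["low_risk"] / total_events

print(events)
print(f"Share of events arising in the low-risk group: {share_from_low_risk:.0%}")
```

Under these assumed numbers, the low-risk group has one-tenth the event rate of the high-risk group yet still produces most of the events, simply because it is so much larger.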
We suggest that when using analytic systems to identify potentially high-cost patients, it is important to determine the patients’ specific needs and gaps in care. It is especially important to identify and address behavioral health problems, because a large portion of the patients at high risk for hospital admission have some sort of behavioral health issue, with depression being especially frequent.20
Programs to manage high-cost patients are expensive. They will be much more cost-effective if interventions can be precisely tailored to a patient’s specific problems, which might be related to transportation, medication nonadherence, or family conflict.
Resources in health care are becoming increasingly limited, which requires greater emphasis on value. Thus, it will be important to investigate analytic techniques that identify not only high-risk people, but also those who are at particularly low risk. For instance, the standard approach may be to give all patients who are discharged from the hospital a follow-up appointment in two weeks. But it might make more sense to ensure that the highest-risk patients are seen within two days, while patients with very low risk might require follow-up care only as needed. Algorithms can help reallocate resources more effectively at both the high-risk and low-risk ends of the spectrum.
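The follow-up example can be sketched as a simple rule that maps a predicted risk to a follow-up interval. This is a minimal illustration only: the function name and the two risk cutoffs (`high` and `low`) are hypothetical, and a real program would need cutoffs calibrated to its own population.

```python
def follow_up_days(risk, high=0.30, low=0.05):
    """Map a predicted risk (0 to 1) to days until follow-up.

    Returns None when follow-up is only "as needed".
    The cutoffs `high` and `low` are hypothetical placeholders.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if risk >= high:
        return 2       # highest-risk patients: seen within two days
    if risk <= low:
        return None    # very low risk: follow-up care only as needed
    return 14          # everyone else: standard two-week appointment


for r in (0.50, 0.10, 0.01):
    print(r, follow_up_days(r))
```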
Much has been made of the frequency and high cost of hospital readmissions.21 The Centers for Medicare and Medicaid Services (CMS) has strongly incentivized organizations to reduce their frequency.22 As many as one-third of readmissions have been posited to be preventable and, therefore, to present a significant opportunity for improving care delivery.23
Health care organizations should all use an algorithm to predict who is likely to be readmitted to the hospital. However, the predictive value of the algorithms tends to be similar. Four areas of a predictive algorithm may be important differentiators: tailoring the intervention to the individual patient, ensuring that patients actually get the precise interventions intended for them, monitoring specific patients after discharge to find out if they are having problems before they decompensate, and ensuring a low ratio of patients flagged for an intervention to patients who experience a readmission (that is, a low false positive rate).
Some work has already been done in predicting readmissions,24 and analytics will play a key role in further work. For example, it may make sense soon to ask patients with a smartphone to allow health care organizations to access data from their phones that will help identify patients who are not managing a chronic condition well or that will monitor people recently discharged from the hospital, since it appears that patients who are not making calls or sending e-mail with their usual frequency may be depressed or suffering from other issues.25 Patients may also be asked to wear some type of device that monitors physiological parameters, such as heart rate or rhythm. These data will be most effective in informing health care decisions if they are processed with analytics.
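One way to make the last differentiator measurable is to compare the set of flagged patients with the set of patients who were actually readmitted. The sketch below is an assumption about how such a check might be written; the function name, metric names, and patient IDs are all hypothetical.

```python
def flag_metrics(flagged, readmitted):
    """Compare flagged patients with patients who were readmitted.

    Both arguments are sets of patient IDs (hypothetical identifiers).
    """
    true_positives = len(flagged & readmitted)
    if true_positives == 0:
        raise ValueError("no flagged patient was readmitted")
    return {
        # How many patients are flagged for each readmission correctly caught.
        "flagged_per_true_readmission": len(flagged) / true_positives,
        # Fraction of flagged patients who really were readmitted.
        "positive_predictive_value": true_positives / len(flagged),
        # Fraction of all readmissions that the flags caught.
        "sensitivity": true_positives / len(readmitted),
    }


metrics = flag_metrics(flagged={1, 2, 3, 4, 5, 6}, readmitted={2, 4, 9})
print(metrics)
```

A lower flagged-per-readmission ratio means fewer interventions are spent on patients who would not have been readmitted anyway.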
Estimating the risk of complications when a patient first presents to a hospital can be useful for a number of reasons, such as managing staffing and bed resources, anticipating the need for a transfer to the appropriate unit, and informing overall strategy for managing the patient. In the neonatal setting, for example, the invention of the Apgar score revolutionized the management of newborn resuscitation.26,27 However, computing the score required training caregivers to assess subjective parameters such as irritability and “color” (a proxy for tissue perfusion, or how well blood is flowing to tissues). In newborns and many other populations, using modern big-data techniques28 that combine routinely collected physiological measurements makes much more accurate assessments possible with a minimal burden of training and implementation.
In integrating a triage algorithm into clinical work flow, it is vital to have a detailed guideline that clarifies how the algorithm will inform care. Two pilot programs in Kaiser Permanente Northern California (KPNC), an integrated health care delivery system with comprehensive information systems, are using this approach.
The first pilot involves evaluating newborns for early-onset sepsis. The goal is to reduce the number of newborns who receive antibiotics unnecessarily. Hundreds of thousands of newborns are evaluated for early-onset sepsis each year.30–32 Recently, a team of scientists from KPNC, Harvard University, and the University of California, San Francisco and Santa Cruz, developed a two-step protocol that can be expected to decrease the number of these evaluations and reduce the prescription of antibiotics for newborns dramatically in the United States. In the first step, which can be embedded in an EHR, objective maternal data are used to assign a preliminary (prior to birth) probability of early-onset sepsis.33 In the second step, a simplified set of clinical findings is combined with the estimate based on maternal data to yield a new posterior probability for risk of sepsis following birth.34 The combination of these two steps could lead to as many as 240,000 fewer US newborns being treated with systemic antibiotics each year.
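The prior-to-posterior step can be illustrated with a generic Bayes' rule update on the odds scale. This is not the published KPNC model; it is a minimal sketch that assumes the clinical findings can be summarized as a single likelihood ratio, and the numeric values are invented for illustration.

```python
def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability with a likelihood ratio (Bayes' rule on odds)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Step 1 (hypothetical): a prior from maternal data, here 1 in 1,000.
prior = 0.001
# Step 2 (hypothetical): likelihood ratios for reassuring and worrying findings.
well_appearing = posterior_probability(prior, likelihood_ratio=0.4)
clinically_ill = posterior_probability(prior, likelihood_ratio=20.0)
print(f"{well_appearing:.5f} {clinically_ill:.5f}")
```

A likelihood ratio below 1 lowers the estimate and a ratio above 1 raises it, which is how the second step can either spare a newborn an evaluation or escalate care.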
The second KPNC pilot addresses adult patients in the emergency department. Severity-of-illness scores for adult intensive care patients have been available for some time.35,36 However, the scores’ impact on triage has been limited. This is in part because the most important of these, the Acute Physiology and Chronic Health Evaluation (APACHE)37 and the Simplified Acute Physiology Score (SAPS),38 involve data that are captured after a patient has entered intensive care.
In the second pilot, clinicians in the emergency department will be provided with two composite scores that have been calibrated using millions of patient records and that are applicable to all hospitalized patients, not just those in intensive care. The first of these scores summarizes a patient’s global comorbidity burden during the preceding twelve months; the second captures a patient’s physiological instability in the preceding seventy-two hours.39 In addition, these two scores, available in real time, are combined with vital signs, trends in vital signs, and other information, such as how long a patient has been in the hospital. If the information collectively indicates that a patient has a ≥8 percent risk of deteriorating in the next twelve hours, an alert is sent to the responsible providers.
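The alerting logic can be sketched as a risk model followed by a threshold check. The sketch treats the 8 percent figure as a minimum threshold, which is an assumption, and uses a logistic model with made-up coefficients and feature names; none of these details are taken from the KPNC pilot itself.

```python
import math

ALERT_THRESHOLD = 0.08  # assumed: alert at or above 8 percent twelve-hour risk

# Hypothetical coefficients for illustration only; a real model would be
# fitted and calibrated on patient records.
COEFFICIENTS = {
    "comorbidity_score": 0.02,   # burden over the preceding twelve months
    "instability_score": 0.03,   # physiology over the preceding 72 hours
    "heart_rate_trend": 0.04,    # change in beats per minute
    "hospital_days": 0.05,       # length of stay so far
}
INTERCEPT = -5.0


def deterioration_risk(features):
    """Logistic estimate of the probability of deterioration."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def should_alert(features):
    """True when the estimated risk meets the alert threshold."""
    return deterioration_risk(features) >= ALERT_THRESHOLD


stable = {"comorbidity_score": 20, "instability_score": 10,
          "heart_rate_trend": 0, "hospital_days": 1}
unstable = {"comorbidity_score": 60, "instability_score": 50,
            "heart_rate_trend": 10, "hospital_days": 4}
print(should_alert(stable), should_alert(unstable))
```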
Importantly, the KPNC early-onset sepsis and emergency department composite score pilots are both designed for patients who are not being monitored continuously, yet they take advantage of big-data methodologies. In both cases, teams of clinicians are developing work flows that integrate big-data components (real-time risk estimates) with traditional components (such as clinical examinations and care pathways).
Often before decompensation (the worsening of a patient’s condition) there is a period in which physiological data can be used to determine whether the patient is at risk for decompensating. Much of the initial rationale for intensive care units (ICUs) was to allow patients who were critically ill to be closely monitored. A host of technologies40 are now available that can be used to monitor patients who are in general care units, in nursing homes, or even at home but at risk of some sort of decompensation. Real-time indices such as the Rothman Index are also
Some of these technologies have been available for many years, such as electrocardiographic monitoring and oxygen monitoring. Others are newer, such as end-tidal CO2 monitoring and monitors that allow detection of whether or not a patient is moving.44,45 A problem with all of these technologies has been the signal-to-noise ratio: Alarms are often false positives. Monitors are becoming available in which multiple data streams can be compared simultaneously, and analytics can be used in the background to determine whether or not the signal is valid.
One example of these new monitors is a device that sits under the mattress and that collects data about the patient’s respiratory rate and pulse and whether or not the patient is moving.45 The data are transmitted to a server, where analytics are used in real time to determine if the patient appears likely to be decompensating. When the system detects a likely decompensation, an e-mail message is sent to an on-duty nurse’s smartphone.
With this system, the likelihood that a true decompensation is present has been increased to approximately 50 percent, far better than for cardiac telemetry, for which it is typically 5–10 percent. In one small trial, the system reduced the number of subsequent ICU days for patients in general care units by 47 percent, compared to
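The idea of cross-checking several data streams before raising an alarm can be sketched as follows. This is not the vendor's algorithm; the thresholds, the requirement that both vital signs be abnormal, and the consecutive-reading rule are all hypothetical choices made to illustrate how combining streams can suppress false alarms.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    respiratory_rate: float  # breaths per minute
    pulse: float             # beats per minute
    moving: bool             # True if the sensor detects patient movement


def abnormal_streams(reading):
    """Count how many vital-sign streams fall outside hypothetical limits."""
    count = 0
    if reading.respiratory_rate < 8 or reading.respiratory_rate > 30:
        count += 1
    if reading.pulse < 40 or reading.pulse > 130:
        count += 1
    return count


def likely_decompensation(readings, min_consecutive=3):
    """Alert only when both streams are abnormal, the patient is still
    (to discount motion artifacts), and this persists for several
    consecutive readings."""
    run = 0
    for reading in readings:
        if abnormal_streams(reading) == 2 and not reading.moving:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0
    return False


sustained = [Reading(35, 140, False)] * 3
single_spike = [Reading(35, 140, False), Reading(16, 70, False),
                Reading(35, 140, False)]
print(likely_decompensation(sustained), likely_decompensation(single_spike))
```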
Analytics that use multiple data streams to effectively detect decompensation are already at work in some ICUs, and such use is expected to grow. Analytic tools are likely to make their way into other clinical settings as well to predict decompensation.
Another use case for analytics will be to predict which patients are at risk of adverse events of several types. Adverse events are expensive46 and cause substantial morbidity and mortality, yet many are preventable.
Renal Failure Renal failure is extremely expensive and carries a high risk of mortality.47 However, renal function is readily measured, and early changes in it are often apparent well before major decompensation occurs. It seems likely that analytics could be combined with data about exposures to specific medications and with measures of kidney function, blood pressure, urine output, and other processes to identify patients at risk of decompensation.
Infection Analytics can be effective in managing infection. One example involves monitoring and interpreting changes in heart rate variability for detection of major decompensation in infants with very low birthweights before the emergence of an infection.48 Monitoring the heart-rate characteristics of newborns alone has already resulted in reductions in mortality and increases in the number of ventilator-free days. However, there is room for improvement using increasingly sophisticated analytics that account for subtle signals28 but also filter out extraneous patterns,49 such as those that occur when the baby moves.
Adverse Drug Events Adverse drug events, which occur frequently50 and are costly,51 are another area where analytics can be effective. Most efforts so far to predict which patients will suffer an adverse drug event have not been very effective.52,53 However, analytics have the potential to predict with substantial accuracy which patients may suffer an adverse drug event and to detect patients who are in the early stages of such an event, by assessing genetic and genomic information, laboratory data, information on vital signs, and other data.
Diseases Affecting Multiple Organ Systems
Chronic conditions that span more than one organ system or are systemic in nature are some of the costliest conditions to manage.54,55 Any single disease may include cutaneous (skin), mucosal, renal, musculoskeletal, pulmonary, hematological, immunological, and neurologic manifestations.56 Autoimmune disorders such as scleroderma, rheumatoid arthritis, and systemic lupus erythematosus are examples of such conditions.
The ability to accurately predict the trajectory of a patient’s disease could allow the caregiver to better target complicated and expensive therapies to patients who stand to benefit the most from them, thus reducing the burden of disease on those patients and on the health care system. Currently, the caregiver’s ability to optimize treatment is limited by the complexities resulting from the heterogeneity in clinical phenotypes, the diversity of available measurements, and lack of high-precision biomarkers.57
This area is ripe for computing approaches that can combine the multitude of measurements taken as part of routine care to infer the progression of a patient’s disease and tailor treatments to that patient. There are already some successful examples of these approaches.
Multisite longitudinal registries that allow the aggregation of populations of patients with a disease or condition60 have been initiated. In the near future, clinical data networks are likely to play the role that registries now do. One example of such a network is the National Patient-Centered Clinical Research Network (PCORnet),61 which itself comprises multiple clinical data research networks. Access to longitudinal records has been the biggest limitation for making progress in the area of chronic diseases in multiple organ systems. As EHRs and clinical data networks based on EHRs become widespread, we expect to see the benefits of these technologies in improving care for patients with such diseases.
We have discussed six use cases for high-risk patients in which clinical analytics are likely to be highly beneficial. This is by no means an exhaustive list. The evidence of benefit varies widely across the six use cases, but the current costs for the patients in each case are very great.
We focused in particular on use cases that include the hospital inpatient setting, in part because that is where the most data are available. However, analytics will almost certainly be useful across the health care continuum—for example, in evaluating the overall drivers of costs and using tools like geocoding (coding data by geography) to detect epidemics or to identify “hot spots” (of diseases, high costs, and so on). Both predicting outcomes of patients—such as who will be a high-cost patient, be readmitted, or suffer an adverse event—and tailoring the management of patients should result in substantial savings for the health care system.
One question is to what extent to use disease-specific models versus more general ones in big-data analytics in health care. Much of US health care organizations’ focus in their use of analytics has been on patients with one condition, such as congestive heart failure. This approach can often be effective. However, we believe that approaches that address multiple conditions are likely to have a bigger impact on care outcomes and cost savings in the long run.
Another question is how to incorporate the narrative text from EHRs into big-data analytics in health care. Extracting clinically relevant concepts using natural language processing is difficult.62,63 Essential elements of the narrative, such as temporal relationships and co-references (that is, narrative that refers to more than one thing), may be lost or incorrectly assigned.64,65 Nonetheless, clinical natural language processing is already quite usable, and even simple approaches can find 90 percent of factual information of many types.66,67 A related problem is that longitudinal follow-up is hindered by the paucity of information exchange among health systems and registries of vital statistics.
Modern analytic approaches have shown demonstrable performance gains in other industries and are markedly different from the typical data analytic approaches used in health care. The health care system has generally used simple decision tree or logistic regression models, in part because these often have to be implemented under time constraints at the point of care.
EHRs make it possible to use models of diagnosis and care that combine thousands of disparate measurements to generate evidence in real time. These models can be far more complex than their predecessors: For example, instead of identifying one or two key markers, such as smoking and high blood pressure, complex analytic models can combine subtle cues from a large number of markers. This increased complexity makes the new models more difficult to interpret and their reliability less easy to assess, compared to previous models.
Other industries have grown accustomed to running mission-critical systems using such complex and advanced approaches while also establishing reliability, typically through extensive test implementations before deployment in production. Attention must be paid to the generalizability of existing results in models’ performance to evaluate the size and scope of appropriate test implementations in health care.16
Another limiting factor in the use of analytics in the health care setting has been delivering predictions to providers, especially in real time, to enable action. That is becoming progressively easier with EHRs and modern communication tools. However, many EHRs do not include robust event engines (tools that sift through data and use rules to notify providers when appropriate) or robust approaches for determining which provider is responsible for a specific patient at a given time.
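The two missing pieces named here, a rule-driven event engine and a way to find the responsible provider, can be sketched together in a few lines. Everything below is hypothetical: the rules, the lab thresholds, the shift table, and the provider labels are illustrative assumptions, not features of any particular EHR.

```python
from datetime import datetime

# Hypothetical rules: a label plus a predicate over a dict of observations.
RULES = [
    ("low oxygen saturation", lambda obs: obs.get("spo2", 100) < 90),
    ("high potassium", lambda obs: obs.get("potassium", 0) > 6.0),
]

# Hypothetical shift table: (patient_id, start_hour, end_hour, provider).
ASSIGNMENTS = [
    ("p1", 7, 19, "day_nurse"),
    ("p1", 19, 7, "night_nurse"),  # shift that wraps past midnight
]


def responsible_provider(patient_id, when):
    """Return the provider covering this patient at the given time, if any."""
    hour = when.hour
    for pid, start, end, provider in ASSIGNMENTS:
        if pid != patient_id:
            continue
        if start < end:
            in_shift = start <= hour < end
        else:
            in_shift = hour >= start or hour < end
        if in_shift:
            return provider
    return None


def notifications(patient_id, observations, when):
    """Apply every rule and pair each triggered rule with the provider on duty."""
    provider = responsible_provider(patient_id, when)
    return [(provider, label) for label, rule in RULES if rule(observations)]


print(notifications("p1", {"spo2": 85}, datetime(2014, 7, 1, 22, 0)))
```

Keeping the rules and the assignment lookup separate reflects the two distinct gaps described above: knowing when to notify, and knowing whom to notify.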
Our observations have a number of implications with respect to research, regulation, payment, and privacy, among other areas.
Research Regarding research, more systematic evaluation is needed to move from potential to realization in many areas. Specifically, we believe that federal support for research that evaluates the use of analytics and big data to address the six use cases discussed above is warranted. Especially useful would be studies of the tailoring of solutions for high-risk patients and the use of multiple streams of data, in particular from sensor technologies, for the prediction of adverse events and for therapy selection for patients with diseases that affect multiple organ systems.
Yet to be determined is the extent to which hypothesis-driven (the traditional approach) or hypothesis-free approaches (such as those used in data mining) are appropriate. Also still unclear is the relative importance of developing specific approaches and of implementing and disseminating them. We believe that there is more need to develop approaches, because payment reform is likely to offer strong incentives for their implementation and dissemination.
Regulation From the regulation perspective, a key question will be to what extent these predictive analytic approaches will be regulated by the Food and Drug Administration (FDA). In August 2013 the Food and Drug Administration Safety and Innovation Act working group tasked with evaluating emerging health information technology (IT) published a draft report concluding that FDA premarket review of health IT applications, such as analytics, would not be beneficial.68 The report also concluded that if health IT applications used analytics to deliver strong clinical decision support or were embedded in devices, they might require FDA review. Thus, there is clearly a tension between the need for regulatory oversight and for protection of the public. The FDA has already released another report on this topic in 2014.69
David Bates is on the clinical advisory board for and has received research funding from EarlySense, a company that uses analytics and sensor technology to improve care. The authors thank Stephanie Klinkenberg-Ramirez for her assistance with the preparation of this article. Funding was provided by Framework and Action Plan for Predictive Analytics Grant No. 3861 from the Gordon and Betty Moore
Payment With respect to payment, strategies such as the accountable care organization model that encourage organizations to invest in cost reduction will likely accelerate the adoption of analytics. However, as many experts have commented, the current provisions of the Affordable Care Act may not be sufficient on their own to get providers to focus on costs.70
Privacy Regarding privacy, there are many thorny issues, as the growing controversy over the National Security Agency’s collection of data about private phone calls has illustrated. Many people will not wish to have some types of data about them linked with other types of data, and this issue may be even more sensitive in health care than in other domains. However, Ruth Faden and coauthors have argued that in a just health care system, patients have a moral obligation to contribute to the common purpose of improving the quality and value of clinical care.71 Policy makers have been reluctant to alter the provisions of the Health Insurance Portability and Accountability Act (HIPAA) of 1996, which is the major legislation related to privacy and security issues in health care. However, the act does not address many issues that will become relevant as more disparate data sources become linked.
Big data, including analytics, is a powerful tool that will be as useful in health care as it has been in other industries. The choice of these specific use cases that we have discussed in this article can be debated. Nonetheless, we believe that they will be among those that deliver the greatest value for health care organizations in the near term. This general approach has great potential for improving value in health care. We believe that organizations that employ it in many domains will benefit, especially under payment reform.
1 Davis K. 2012 annual report: president’s message—health care reform: a journey [Internet]. New York (NY): Commonwealth Fund; 2012 Dec 26 [cited 2014 Apr 24]. Available from: http://www.commonwealthfund .org/Publications/Annual-Report-
2 Porter ME. What is value in health care? N Engl J Med. 2010;363(26):
3 Mechanic RE, Altman SH, McDonough JE. The new era of payment reform, spending targets, and cost containment in Massachusetts: early lessons for the nation. Health Aff (Millwood). 2012;31(10):
4 Xierali IM, Hsiao CJ, Puffer JC, Green LA, Rinaldo JC, Bazemore AW, et al. The rise of electronic health record adoption among family physicians. Ann Fam Med. 2013;
5 Bassi J, Lau F. Measuring value for money: a scoping review on economic evaluation of health information systems. J Am Med Inform Assoc. 2013;20(4):792–801.
6 Davenport TH, Harris JG. Competing on analytics: the new science of winning. Boston (MA): Harvard Business School Press; 2007.
7 Ferrucci D, Brown E, Chu-Carroll J, Fan J, Gondek D, Kalyanpur AA, et al. Building Watson: an overview of the DeepQA project. AI Magazine.
8 Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18(5):
9 Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA. 2013;309(13):1351–2.
10 Tenenbaum JD, Sansone SA, Haendel M. A sea of standards for omics data: sink or swim? J Am Med Inform Assoc. 2013;21(2):200–3.
11 Agaku IT, Adisa AO, Ayo-Yusuf OA, Connolly GN. Concern about security and privacy, and perceived control over collection and use of health information are related to withholding of health information from healthcare providers. J Am Med Inform Assoc. 2013;21(2):374–8.
12 Ohno-Machado L. To share or not to share: that is not the question. Sci Transl Med. 2012;4(165):165cm15.
13 Schoenman JA, Chockley N. Understanding U.S. health care spending [Internet]. Washington (DC): National institute for Health Care Management Research and Educational Foundation; 2011 Jul [cited 2014 Apr 24]. (Data Brief). Available from: http://www.nihcm.org/ images/stories/NIHCM-CostBriefEmail.pdf
14 Nelson L. Lessons from Medicare’s demonstration projects on disease management and care coordination [Internet]. Washington (DC): Congressional Budget Office; 2012 Jan [cited 2014 Apr 24]. (Working Paper No. 2012-01). Available from: http:// www.cbo.gov/sites/default/files/ cbofiles/attachments/WP201201_Nelson_Medicare_DMCC_
15 Weil E, Ferris T, Meyer G. Fact sheet—phase one: MGH Medicare demonstration project for high-cost beneficiaries [Internet]. Boston (MA): Massachusetts General Hospital Physician Group; [cited 2014 Apr 24]. Available from: http://www.massgeneral.org/News/assets/pdf/CMS_project_phase1FactSheet.pdf
16 Paxton C, Niculescu-Mizil A, Saria S. Developing predictive models using electronic medical records: challenges and pitfalls. AMIA Annu Symp Proc. 2013;2013:1109–15.
17 Turner-McGrievy GM, Beets MW, Moore JB, Kaczynski AT, Barr-Anderson DJ, Tate DF. Comparison of traditional versus mobile app self-monitoring of physical activity and dietary intake among overweight adults participating in an mHealth weight loss program. J Am Med Inform Assoc. 2013;20(3):513–8.
18 Jiang X, Menon A, Wang S, Kim J, Ohno-Machado L. Doubly Optimized Calibrated Support Vector Machine (DOC-SVM): an algorithm for joint optimization of discrimination and calibration. PLoS One. 2012;7(11):e48823.
19 Jiang X, Boxwala AA, El-Kareh R, Kim J, Ohno-Machado L. A patient-driven adaptive prediction technique to improve personalized risk estimation for clinical decision support. J Am Med Inform Assoc. 2012;
20 Freund T, Kunz CU, Ose D, Szecsenyi J, Peters-Klimm F. Patterns of multimorbidity in primary care patients at high risk of future hospitalization. Popul Health Manag. 2012;15(2):119–24.
21 Jencks SF, Williams MV, Coleman EA. Rehospitalizations among patients in the Medicare fee-for-service program. N Engl J Med. 2009;
22 Clancy CM. Commentary: reducing hospital readmissions: aligning financial and quality incentives. Am J Med Qual. 2012;27(5):441–3.
23 Kocher RP, Adashi EY. Hospital readmissions and the Affordable Care Act: paying for coordinated quality care. JAMA. 2011;306(16):1794–5.
24 Bayati M. Data-driven decision making in healthcare systems [Internet]. Redmond (WA): Microsoft Corporation; 2011 Sep 27 [cited 2014 Apr 24]. (20th Anniversary Lecture Series). Available from: http://research.microsoft.com/apps/video/default.aspx?id=159290
25 Madan A, Cebrian M, Lazer D, Pentland A. Social sensing for epidemiological behavior change. In: Proceedings of the 12th ACM International Conference on Ubiquitous Computing. New York (NY): ACM Press; 2010. p. 291–300.
26 Apgar V. The newborn (Apgar) scoring system. Reflections and advice. Pediatr Clin North Am. 1966;13(3):645–50.
27 Finster M, Wood M. The Apgar score has survived the test of time. Anesthesiology. 2005;102(4):855–7.
28 Saria S, Koller D, Penn A. Learning individual and population level traits from clinical temporal data [Internet]. Submitted to: Neural Information Processing Systems (NIPS) Foundation. Predictive Models in Personalized Medicine workshop; Whistler, BC; 2010 Dec 11 [cited 2014 May 22]. Available from: https://sites.google.com/site/personalmedmodels/proceedings/Saria.pdf
29 Saria S, Rajani AK, Gould J, Koller D, Penn AA. Integration of early physiological responses predicts later illness severity in preterm infants. Sci Transl Med. 2010;2(48):48ra65.
30 Escobar GJ. The neonatal “sepsis work-up”: personal reflections on the development of an evidence-based approach toward newborn infections in a managed care organization. Pediatrics. 1999;103(Suppl E1):360–73.
31 Escobar GJ, Li DK, Armstrong MA, Gardner MN, Folck BF, Verdi JE, et al. Neonatal sepsis workups in infants ≥2000 grams at birth: a population-based study. Pediatrics. 2000;106(2 Pt 1):256–63.
32 Mukhopadhyay S, Eichenwald EC, Puopolo KM. Neonatal early-onset sepsis evaluations among well-appearing infants: projected impact of changes in CDC GBS guidelines. J Perinatol. 2013;33(3):198–205.
33 Puopolo KM, Draper D, Wi S, Newman TB, Zupancic J, Lieberman E, et al. Estimating the probability of neonatal early-onset infection on the basis of maternal risk factors. Pediatrics. 2011;128(5):e1155–63.
34 Escobar GJ, Puopolo KM, Wi S, Turk BJ, Kuzniewicz MW, Walsh EM, et al. Stratification of risk of early-onset sepsis in newborns ≥34 weeks’ gestation. Pediatrics. 2014;133(1):30–6.
35 Ohno-Machado L, Resnic FS, Matheny ME. Prognosis in critical care. Annu Rev Biomed Eng. 2006;8:567–99.
36 Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13(10):
37 Zimmerman JE, Kramer AA. Outcome prediction in critical care: the Acute Physiology and Chronic Health Evaluation models. Curr Opin Crit Care. 2008;14(5):491–7.
38 Metnitz PG, Moreno RP, Almeida E, Jordan B, Bauer P, Campos RA, et al. SAPS 3—from evaluation of the patient to evaluation of the intensive care unit. Part 1: objectives, methods and cohort description. Intensive Care Med. 2005;31(10):1336–44.
39 Escobar GJ, Gardner MN, Greene JD, Draper D, Kipnis P. Risk-adjusting hospital mortality using a comprehensive electronic record in an integrated health care delivery system. Med Care. 2013;51(5):
40 Kodali BS. Capnography outside the operating rooms. Anesthesiology.
41 Rothman MJ, Rothman SI, Beals J 4th. Development and validation of a continuous measure of patient condition using the electronic medical record. J Biomed Inform. 2013;46(5):837–48.
42 Finlay GD, Rothman MJ, Smith RA. Measuring the Modified Early Warning Score and the Rothman Index: advantages of utilizing the electronic medical record in an early warning system. J Hosp Med. 2014;
43 Rothman SI, Rothman MJ, Solinger AB. Placing clinical variables on a common linear scale of empirically based risk as a step towards construction of a general patient acuity score from the electronic health record: a modelling study. BMJ Open. 2013;3(5).
44 Donald MJ, Paterson B. End tidal carbon dioxide monitoring in prehospital and retrieval medicine: a review. Emerg Med J. 2006;23(9):
45 Brown H, Terrence J, Vasquez P, Bates DW, Zimlichman E. Continuous monitoring in an inpatient medical-surgical unit: a controlled clinical trial. Am J Med. 2014;127(3):226–32.
46 Jha AK, Chan DC, Ridgway AB, Franz C, Bates DW. Improving safety and eliminating redundant tests: cutting costs in U.S. hospitals. Health Aff (Millwood). 2009;28(5):1475–84.
47 Bates DW, Su L, Yu DT, Chertow GM, Seger DL, Gomes DR, et al. Mortality and costs of acute renal failure associated with amphotericin B therapy. Clin Infect Dis. 2001;32(5):
48 Moorman JR, Carlo WA, Kattwinkel J, Schelonka RL, Porcelli PJ, Navarrete CT, et al. Mortality reduction by heart rate characteristic monitoring in very low birth weight neonates: a randomized trial. J Pediatr. 2011;159(6):900–6.
49 Quinn JA, Williams CK, McIntosh N. Factorial switching linear dynamical systems applied to physiological condition monitoring. IEEE Trans Pattern Anal Mach Intell. 2009;31(9):1537–51.
50 Bates DW, Cullen DJ, Laird N, Petersen LA, Small SD, Servi D, et al. Incidence of adverse drug events and potential adverse drug events. Implications for prevention. ADE Prevention Study Group. JAMA. 1995;274(1):29–34.
51 Bates DW. Drugs and adverse drug reactions: how worried should we be? JAMA. 1998;279(15):1216–7.
52 Sakuma M, Bates DW, Morimoto T. Clinical prediction rule to identify high-risk inpatients for adverse drug events: the JADE Study. Pharmacoepidemiol Drug Saf. 2012;21(11):
53 Field TS, Gurwitz JH, Harrold LR, Rothschild J, DeBellis KR, Seger AC, et al. Risk factors for adverse drug events among older adults in the ambulatory setting. J Am Geriatr Soc. 2004;52(8):1349–54.
54 Fortin M, Bravo G, Hudon C, Vanasse A, Lapointe L. Prevalence of multimorbidity among adults seen in family practice. Ann Fam Med.
55 Wolff JL, Starfield B, Anderson G. Prevalence, expenditures, and complications of multiple chronic conditions in the elderly. Arch Intern Med. 2002;162(20):2269–76.
56 Petri M. Systemic lupus erythematosus: 2006 update. J Clin Rheumatol. 2006;12(1):37–40.
57 Hummers LK, Wigley FM. Scleroderma. New York (NY): Lange Medical Books/McGraw Hill; 2013.
58 Leeper NJ, Bauer-Mehren A, Iyer SV, Lependu P, Olson C, Shah NH. Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes. PLoS One.
59 Frankovich J, Longhurst CA, Sutherland SM. Evidence-based medicine in the EMR era. N Engl J Med. 2011;365(19):1758–9.
60 Natter MD, Quan J, Ortiz DM, Bousvaros A, Ilowite NT, Inman CJ, et al. An i2b2-based, generalizable, open source, self-scaling chronic disease registry. J Am Med Inform Assoc. 2013;20(1):172–9.
61 National Patient-Centered Clinical Research Network [home page on the Internet]. Boston (MA): PCORnet; [cited 2014 Apr 24]. Available from: http://pcornet.org/
62 Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2008:128–44.
63 Saria S, McElvain G, Rajani AK, Penn AA, Koller DL. Combining structured and free-text data for automatic coding of patient outcomes. AMIA Annu Symp Proc. 2010;
64 Sun W, Rumshisky A, Uzuner O. Temporal reasoning over clinical text: the state of the art. J Am Med Inform Assoc. 2013;20(5):814–9.
65 Uzuner O, Bodnari A, Shen S, Forbush T, Pestian J, South BR. Evaluating the state of the art in coreference resolution for electronic medical records. J Am Med Inform Assoc. 2012;19(5):786–91.
66 D’Avolio LW, Nguyen TM, Goryachev S, Fiore LD. Automated concept-level information extraction to reduce the need for custom software and rules development. J Am Med Inform Assoc. 2011;18(5):607–13.
67 LePendu P, Iyer SV, Bauer-Mehren A, Harpaz R, Mortensen JM, Podchiyska T, et al. Pharmacovigilance using clinical notes. Clin Pharmacol Ther. 2013;93(6):547–55.
68 Bates DW. Draft FDASIA Committee report [Internet]. Silver Spring (MD): Food and Drug Administration; [cited 2014 Apr 25]. Available from: http://www.healthit.gov/facas/sites/faca/files/FDASIA
69 Food and Drug Administration. FDASIA health IT report: proposed strategy and recommendations for a risk-based framework [Internet]. Silver Spring (MD): FDA; 2014 Apr [cited 2014 May 5]. Available from: http://www.fda.gov/downloads/AboutFDA/CentersOffices/Officeof
70 Noble DJ, Casalino LP. Can accountable care organizations improve population health?: should they try? JAMA. 2013;309(11):
71 Faden RR, Kass NE, Goodman SN, Pronovost P, Tunis S, Beauchamp TL. An ethics framework for a learning health care system: a departure from traditional research ethics and clinical ethics. Hastings Cent Rep. 2013;(S1):S6–27.