Author information
- †University of Utah, Cardiovascular Medicine, Salt Lake City, Utah
- ‡Barbra Streisand Women’s Heart Center, Cedars-Sinai Heart Institute, Los Angeles, California
- ∗Reprint requests and correspondence: Dr. Rashmee Shah, Cardiovascular Medicine, University of Utah, 30 North 1900 East, Room 4A100, Salt Lake City, Utah 84132.
Health care delivery is changing rapidly. The Norman Rockwell era of the single town doctor with his black bag and home visits is over. This image is being replaced by large health care systems with integrated provider networks and, more importantly, integrated information networks. Each patient generates a personal narrative of his or her disease in the form of data, ranging from vital signs and lab results to the granular detail in clinical notes. The amount of data collected and stored as part of routine clinical care is increasing exponentially and represents an unprecedented opportunity to improve health care delivery and outcomes by learning from every patient treated.
The pace of data collection, however, has far outstripped our ability to interpret these data in ways that improve health care outcomes. To realize the potential of this resource, we need people with the methodological and analytical skills to translate the data into usable information. One solution is to create open-source data platforms that are broadly available for public access and analysis; in other words, to “crowd source” medical research. Other industries have successfully used crowd sourcing for rapid development. The Android operating system, for example, is open to any developer; the number of Android applications increased by >50%, to almost 1.5 million, in 2014 alone (1).
The crowd sourcing concept is slowly taking hold in health care. In this issue of the Journal, Khera et al. (2) used the Nationwide Inpatient Sample (NIS), a publicly available database from the Healthcare Cost and Utilization Project (HCUP), which is supported by the Agency for Healthcare Research and Quality (3). HCUP data are advantageous because they are inexpensive, easily available to anyone, and organized to be “analysis ready.” As a result, reports based on HCUP data are appearing more frequently in the medical literature. A Google Scholar search for “healthcare utilization project,” “nationwide inpatient sample,” or “state inpatient database” yielded 2,460 citations in 2014 alone. This represents a 7.5-fold increase from just 10 years earlier, evidence that publicly available data and crowd sourcing speed the pace of medical research.
Still, the study by Khera et al. (2) and other similar studies (4) highlight methodological challenges unique to open-source data and analyses. In their study, the authors evaluate sex-specific differences in ST-segment elevation myocardial infarction (STEMI) treatment and outcomes. They report that women with STEMI were less likely than men to receive coronary angiography and revascularization and were more likely to die during hospitalization, regardless of revascularization status. These findings, although troubling, are not surprising. More women than men have been dying of cardiovascular disease since the 1980s (5); a pooled analysis of randomized, controlled trials demonstrated an increased risk of 30-day mortality among women with STEMI, although the effect was attenuated after accounting for angiographic findings (6); and a meta-analysis of STEMI patients treated with primary percutaneous coronary intervention demonstrated that women had a 50% higher risk of in-hospital mortality (7).
The surprising finding from this study was an increase in risk-adjusted mortality among younger men and women with STEMI from 2004 to 2011, on a background of increasing revascularization and decreasing length of hospital stay. No discernible trend is apparent in the unadjusted results (see Table 4 in the study by Khera et al.); the trend emerges only after comorbid conditions are included in the model (see Online Table 4 in the supplemental material for the study by Khera et al.). This observation is counterintuitive and differs from other published data. The ACTION Registry-GWTG (Acute Coronary Treatment and Intervention Outcomes Network Registry-Get With The Guidelines) demonstrated a decline in in-hospital, risk-adjusted STEMI mortality from 2007 to 2008 (8), and data from France demonstrated improved survival between 1995 and 2010 (9), despite an increase in comorbid conditions over time in the latter study. Several hypotheses could explain the discrepant finding of increasing mortality in the current report, such as improvement in mortality among older patients but not among younger patients, who are more often female (10), or the inclusion of sicker patients in nationally representative samples such as the NIS than in registries (11).
Although increasing STEMI mortality among younger patients may be real, methodological issues must also be considered as an explanation for the results. A key limitation of administrative data such as the NIS is the inability to characterize patients accurately for risk adjustment. In this case, the authors use the “kitchen sink” approach, including all 29 variables in the Elixhauser risk-adjustment model plus additional factors. This approach relies on diagnosis codes to characterize patients, which is fraught with inaccuracy. The prevalence of almost all comorbid conditions increased over time (see Online Tables 2 and 3 in the supplemental material for the study by Khera et al.). This trend could reflect changes in billing-related coding practices rather than true changes in patient risk profiles; moreover, the methods make no distinction between complications and comorbid conditions. These inaccuracies alone could account for the small incremental increase in risk over time. A recently published model using the ACTION Registry-GWTG merged with Medicare data identified age, heart rate, systolic blood pressure, troponin, and creatinine as the variables most predictive of 30-day mortality (12). Aside from age, the NIS does not include any of these variables, so they could not be included in the model. The net result is that the conclusion that risk-adjusted mortality is increasing in young adults with STEMI could simply be a statistical artifact.
Data quality, and the potential for erroneous conclusions that poor quality creates, thus remains a concern with publicly available datasets. Currently, administrative (or billing) data are the most easily shared because their format is uniform and structured. As this study demonstrates, the trade-off is a lack of clinical detail, which limits the research scope to broad trends and hypothesis generation. This limitation could be addressed by more comprehensive data sharing (e.g., vital signs, medication orders, clinical notes) across electronic health records (EHRs). Indeed, several groups are working toward increased interoperability and data sharing (13,14), and the federal government recently announced financial support for interoperability as part of the Precision Medicine Initiative (15). Ideally, these efforts will result in large, EHR-based patient datasets that include the clinical details (phenotype, genotype, socioeconomic variables, and others that we have not yet identified) needed for robust analyses. For example, in the study by Khera et al. (2), the authors cite a higher frequency of stress cardiomyopathy, dissection, and vasospasm among women as possible reasons for their lower revascularization rates. If discharge summaries were shared across platforms, researchers would have the option to use text and information extraction methods to identify these specific patient phenotypes (16) and test this female-specific hypothesis.
As publicly available data and crowd sourcing for research increase, readers will have to be more savvy (and skeptical) about the methods used in secondary analyses of observational data. Still, this type of research fills an important niche. Data have historically been restricted to specific institutions, hospitals, or companies, and research aims were myopic, limited to the interests of these few entities. If those entities were unaware of or uninterested in sex-, ethnicity-, or socioeconomic-based disparities in cardiovascular disease, the subject was not studied. Publicly available data and a crowd sourcing approach theoretically allow anyone to engage in medical research, so previously neglected groups, such as women (17), are more likely to receive attention in the published literature.
Publicly available data and other forms of data sharing are increasing. In 2014, the Centers for Medicare & Medicaid Services (CMS) launched a virtual research data center that will make research access easier, albeit at a cost (18). The National Institutes of Health announced its intention “to make public access to digital scientific data the standard for all NIH-funded research” (19), and the Institute of Medicine advocates data sharing in a manner that protects patient privacy (20). The pharmaceutical and medical device industries have also moved toward data sharing (21,22). These initiatives should open the door to a broader pool of researchers, which, coupled with large sample sizes and advanced methods for data extraction, could increase opportunities for research focused on women and other neglected groups.
Data collection in health care is increasing exponentially, in step with a movement toward data sharing as the default practice. The future of health care undoubtedly involves large datasets spanning multiple health care systems, including clinical details that can be extracted with emerging data-mining techniques. This represents an opportunity to study previously neglected groups. We need an ecosystem of researchers with specific expertise to standardize and validate the methods and uses of crowd-sourced data, thereby speeding the pace of medical research and maximizing the potential of these growing resources.
∗Editorials published in the Journal of the American College of Cardiology reflect the views of the authors and do not necessarily represent the views of JACC or the American College of Cardiology.
Dr. Shah owns stock in Gilead Sciences. Dr. Merz is supported by contracts from the National Heart, Lung, and Blood Institute, nos. N01-HV-68161, N01-HV-68162, N01-HV-68163, and N01-HV-68164; grants U0164829, U01 HL649141, U01 HL649241, K23HL105787, F31NR015725, R01 HL090957, and 1R03AG032631 from the National Institute on Aging; GCRC grant MO1-RR00425 from the U.S. National Center for Research Resources; and the National Center for Advancing Translational Sciences, grant UL1TR000124; has received honoraria from Northwestern University, the Radcliffe Institute, and the University of California San Francisco; and has received consulting fees from Research Triangle Institute.
© 2015 American College of Cardiology Foundation
- Bell K. Google Play now has more apps than Apple's App Store, report says. Available at: http://mashable.com/2015/01/15/google-play-more-apps-than-ios/. Accessed August 10, 2015.
- Khera S., Kolte D., Gupta T., et al.
- Agency for Healthcare Research and Quality. Introduction to the HCUP Nationwide Inpatient Sample (NIS), 2009. Available at: http://www.hcup-us.ahrq.gov/db/nation/nis/NIS_Introduction_2009.jsp. Accessed January 20, 2013.
- Stretch R., Sauer C.M., Yuh D.D., Bonde P.
- Wenger N.K.
- Pancholy S., Shantha G., Patel T., Cheskin L.J.
- Roe M.T., Messenger J.C., Weintraub W.S., et al.
- Ford E.S., Capewell S.
- Precision Medicine Initiative. Available at: https://www.whitehouse.gov/the-press-office/2015/01/30/fact-sheet-president-obama-s-precision-medicine-initiative. Accessed August 10, 2015.
- CMS Virtual Research Data Center. Available at: http://www.resdac.org/cms-data/request/cms-virtual-research-data-center. Accessed August 11, 2015.
- National Institutes of Health Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research. Available at: http://grants.nih.gov/grants/NIH-Public-Access-Plan.pdf. Accessed August 12, 2015.
- Institute of Medicine. Sharing clinical trial data: maximizing benefits, minimizing risk. 2015. Available at: http://iom.nationalacademies.org/Reports/2015/Sharing-Clinical-Trial-Data.aspx. Accessed August 12, 2015.
- Krumholz H.M.