Author information
- Received April 14, 2008
- Revision received September 16, 2008
- Accepted October 7, 2008
- Published online January 27, 2009.
- Andreas P. Kalogeropoulos, MD⁎,
- Vasiliki V. Georgiopoulou, MD⁎,
- Grigorios Giamouzis, MD, PhD⁎,
- Andrew L. Smith, MD⁎,
- Syed A. Agha, MD⁎,
- Sana Waheed, MD⁎,
- Sonjoy Laskar, MD⁎,
- John Puskas, MD, MSc⁎,
- Sandra Dunbar, RN, DSN⁎,
- David Vega, MD⁎,
- Wayne C. Levy, MD† and
- Javed Butler, MD, MPH, FACC⁎,⁎
- ⁎Reprint requests and correspondence: Dr. Javed Butler, Emory University Hospital, 1365 Clifton Road, Suite AT430, Atlanta, Georgia 30322
Objectives The aim of this study was to validate the Seattle Heart Failure Model (SHFM) in patients with advanced heart failure (HF).
Background The SHFM was developed primarily from clinical trial databases and extrapolated the benefit of interventions from published data.
Methods We evaluated the discrimination and calibration of SHFM in 445 advanced HF patients (age 52 ± 12 years, 68.5% male, 52.4% white, ejection fraction 18 ± 8%) referred for cardiac transplantation. The primary end point was death (n = 92), urgent transplantation (n = 14), or left ventricular assist device (LVAD) implantation (n = 3); a secondary analysis was performed on mortality alone.
Results Patients were receiving optimal therapy (angiotensin-II modulation 92.8%, beta-blockers 91.5%, aldosterone antagonists 46.3%), and 71.0% had an implantable device (defibrillator 30.4%, biventricular pacemaker 3.4%, combined 37.3%). During a median follow-up of 21 months, 109 patients (24.5%) had an event. Although discrimination was adequate (c-statistic >0.7), the SHFM overall underestimated absolute risk (observed vs. predicted event rate: 11.0% vs. 9.2%, 21.0% vs. 16.6%, and 27.9% vs. 22.8% at 1, 2, and 3 years, respectively). Risk underprediction was more prominent in patients with an implantable device. The SHFM had different calibration properties in white versus black patients, leading to net underestimation of absolute risk in blacks. Race-specific recalibration improved the accuracy of predictions. When analysis was restricted to mortality, the SHFM exhibited better performance.
Conclusions In patients with advanced HF, the SHFM offers adequate discrimination, but absolute risk is underestimated, especially in blacks and in patients with devices. This is more prominent when including transplantation and LVAD implantation as an end point.
The incidence and prevalence of heart failure (HF) are rising (1,2), and these patients continue to experience poor outcomes (3,4). Considering the high mortality rate and the availability of life-saving therapies like transplantation (5) and left ventricular assist devices (LVAD) (6,7), accurate prognosis determination in HF is clinically important. This is especially true because a critical mismatch between the recipient pool and donor organ availability persists (8), and LVAD therapy is costly with a high risk for complications (9). Although peak exercise oxygen consumption remains an important prognostic tool (10,11), recent data suggest an altered risk relationship between exercise capacity and outcomes in the current era of HF therapy (10,12–14). Other multimarker risk prediction strategies (15,16) were developed in the pre–beta-blocker and defibrillator era and do not include the impact of medical therapy.
The recently developed Seattle Heart Failure Model (SHFM) uses widely available clinical variables to predict HF prognosis (17) and also incorporates the impact of therapy on outcomes. Although the model was validated on several cohorts, its derivation and validation were carried out in datasets derived primarily from clinical trials that enrolled mostly white subjects and were largely conducted in an era when beta-blockers and defibrillators were not the standard of care. Patient populations from clinical trials might not reflect those with advanced HF, the group in which prognosis determination is arguably most important. Moreover, the impact of contemporary therapeutic interventions, including devices like defibrillators and/or biventricular pacemakers, was incorporated in the SHFM by extrapolation (i.e., by using coefficients from “external” trials). Finally, recent studies suggest differential effects of medical therapies in white and black patients (18–21). In this study, we sought to assess the performance of the SHFM in patients with advanced HF referred for transplant evaluation, with emphasis on the impact of device therapy and race on model performance.
Data on all consecutive patients between January 2000 and December 2006 referred for transplant evaluation were retrospectively abstracted to identify eligible patients on the basis of the following criteria: 1) adults 18 to 70 years old; 2) ejection fraction ≤30% documented within 6 months of evaluation; 3) receiving maximum tolerated medical therapy; 4) New York Heart Association functional class II to IV symptoms; and 5) availability of at least 12 of 14 variables comprising the SHFM within 4 weeks of evaluation. Patients with HF secondary to congenital heart disease and those scheduled to undergo planned cardiac surgery within 6 months were excluded. A total of 445 patients met these criteria. The institutional review board approved the study.
Demographic and clinical information during the index visit was abstracted. If multiple laboratory data were available, values from the date closest to the date of evaluation were used. Race was self-identified by patients, and race-based analyses only compared whites versus blacks.
The primary outcome was death, urgent cardiac transplantation (United Network for Organ Sharing status 1A), or LVAD support. In both the derivation and validation cohorts of the original SHFM study (17), only approximately 2% of events were LVAD or urgent transplantation, as opposed to 15.6% in the current investigation. Therefore, we also assessed the performance of the SHFM for mortality alone, with patients undergoing urgent transplantation or LVAD implantation censored as alive at the time of the procedure.
The Seattle Heart Failure Score (SHFS) was derived for all patients on the basis of the original risk factor coefficients as described by Levy et al. (17). Missing covariates were replaced with the cohort mean for score calculation. The online module, which integrates data from life tables for patients with <30% annual mortality, was used for mean life expectancy calculations (22). As noted by the SHFM investigators (17), the exponential SHFM equation is unsuitable for mean life expectancy calculations for populations with <30% annual mortality, because it overestimates survival.
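Observed event rates were calculated with the Kaplan-Meier method. Predicted event-free survival rates were obtained with the original SHFM equation (17):
\[ S(t) = \exp\left(-\lambda \, t \, e^{\mathrm{SHFS}}\right) \qquad (1) \]
where t is time in years, λ = 0.0405 (as estimated by the SHFM investigators), and SHFS is the SHFM score for each patient. The corresponding predicted event rates become:
\[ 1 - S(t) = 1 - \exp\left(-\lambda \, t \, e^{\mathrm{SHFS}}\right) \qquad (2) \]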
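As a minimal sketch of how such predictions can be computed, the following assumes the exponential form S(t) = exp(−λ · t · e^SHFS) with λ = 0.0405 per year. The function names and the example score are hypothetical and are not taken from the SHFM web module.

```python
import math

LAMBDA_SHFM = 0.0405  # baseline hazard per year, as reported for the original SHFM


def predicted_survival(shfs, years, lam=LAMBDA_SHFM):
    """Predicted event-free survival, S(t) = exp(-lam * t * exp(SHFS))."""
    return math.exp(-lam * years * math.exp(shfs))


def predicted_event_rate(shfs, years, lam=LAMBDA_SHFM):
    """Predicted cumulative event rate, 1 - S(t)."""
    return 1.0 - predicted_survival(shfs, years, lam)


# Hypothetical patient with SHFS = 1.0, evaluated at 1, 2, and 3 years.
for t in (1, 2, 3):
    print(t, round(predicted_event_rate(1.0, t), 3))
```

Because the score enters through e^SHFS, each 1-point increase in SHFS multiplies the assumed hazard by e (about 2.7), which is why small score differences translate into large differences in predicted risk.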
Discrimination was assessed by: 1) the c-statistic, which is equivalent to the area under the receiver-operating characteristic (ROC) curve; and 2) the Royston-Sauerbrei D statistic. The latter is based on the variance of the linear predictor (i.e., the score) and quantifies the prognostic separation that a model can provide (23). Higher values of D indicate better separation; values >1 indicate adequate separation (24). In addition, we calculated the false positive, false negative, and combined classification error rates (logistic estimates) for years 1 through 5.
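The c-statistic can be illustrated with a short pairwise computation. This is a simplified sketch that assumes a binary event status at a fixed time point and ignores censoring. The text does not describe how censoring was handled, so this is not necessarily the procedure used in the study, and the data in the usage line are invented.

```python
def c_statistic(scores, events):
    """Share of (event, non-event) patient pairs in which the patient with the
    event has the higher score; tied scores count as one half.
    For a binary outcome this equals the area under the ROC curve."""
    cases = [s for s, e in zip(scores, events) if e]
    controls = [s for s, e in zip(scores, events) if not e]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical scores and 1-year event indicators for four patients.
print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The statistic depends only on the ordering of scores, so it says nothing about whether absolute risks are well calibrated.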
Calibration was assessed by: 1) the Hosmer-Lemeshow goodness-of-fit test and graph (25); and 2) fitting the linear predictor (i.e., the score) in an exponential survival model; a detailed background for the latter approach is provided in Online Appendix A. Briefly, if SHFM predictions were strictly valid, fitting the SHFS in the validation cohort with an exponential model of the same form as the original SHFM equation would result in a λ equal to the original 0.0405 and a coefficient for the SHFS equal to 1 (26,27). If the resulting λ parameter is higher than the original, survival declines faster than predicted, and thus the original equation leads to systematic underestimation of risk (conversely, a lower λ would point to overestimation of risk). If the resulting coefficient for the score (i.e., the SHFS) is <1, the original model predicts too low a risk for low-risk patients and too high a risk for high-risk patients; the opposite is true when the coefficient is >1. In both cases, the model can be improved by recalibration. We fitted the SHFS: 1) in the total cohort; 2) in patients with implantable devices versus medically treated patients; and 3) in race-based subgroups. In each case, we obtained estimates and standard errors for the λ parameter and coefficient of the SHFS by bootstrapping (1,000 random samples) (28,29). In addition, we used a Cox-Snell-type graph to assess observed versus predicted cumulative hazard in the total cohort.
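The model-fitting step of this calibration check can be sketched as follows, assuming a hazard of the form λ · exp(β · SHFS) with right-censored follow-up. This is an illustrative maximum-likelihood fit written for clarity, not the Stata procedure used in the study. The function name, search bounds, and toy data are assumptions, and bootstrap standard errors are omitted.

```python
import math


def fit_exponential(times, events, scores, lo=-5.0, hi=5.0, iters=100):
    """Maximum-likelihood fit of hazard = lam * exp(beta * score).

    For a fixed beta, the MLE of lam is closed-form (number of events divided
    by score-weighted time at risk), so beta is found by ternary search on the
    profile log-likelihood, which is concave in beta.
    """
    n_events = sum(events)
    score_sum_events = sum(x for x, e in zip(scores, events) if e)

    def exposure(beta):
        return sum(t * math.exp(beta * x) for t, x in zip(times, scores))

    def profile_loglik(beta):
        return beta * score_sum_events - n_events * math.log(exposure(beta))

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if profile_loglik(m1) < profile_loglik(m2):
            lo = m1
        else:
            hi = m2
    beta = (lo + hi) / 2.0
    return n_events / exposure(beta), beta


# Toy data: follow-up in years, event indicator (1 = event, 0 = censored), score.
lam, beta = fit_exponential([10, 10, 5, 5], [1, 1, 1, 1], [0, 0, 1, 1])
print(lam, beta)
```

Under this parameterization, a fitted λ above 0.0405 indicates that events accrue faster than the original equation predicts, and a fitted β different from 1 indicates that the score separates low- and high-risk patients more or less steeply than assumed.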
Finally, because we detected both systematic deviation of observed versus predicted risk and different race-specific coefficients for the SHFS, we proceeded to race-specific recalibration of the model to provide possibly more accurate estimates (30). A detailed background for this process is provided in the Online Appendix A. In addition, the application of the Hosmer-Lemeshow goodness-of-fit test is further explained in the Online Appendix B. Analyses were performed with Stata 9.2 (StataCorp LP, College Station, Texas). The D-statistic was calculated with a Stata module written by Patrick Royston, Cancer Group, MRC Clinical Trials Unit, United Kingdom.
Baseline characteristics and outcomes
We studied 445 advanced HF patients receiving optimal therapy (Table 1). Total time at risk was 980 patient-years, and median follow-up was 21 months (25% to 75%: 10 to 37 months). Overall 92 of 445 (20.7%) patients died; annual mortality was 9.4%. In addition, 14 patients underwent urgent transplantation and 3 underwent LVAD implantation, resulting in a 24.5% cumulative and 11.1% annual event rate.
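The cumulative and annual rates quoted above can be reproduced from the reported counts. The sketch below assumes that the annual rates are events per patient-year of follow-up, which is consistent with the reported figures; the variable names are illustrative.

```python
deaths, transplants, lvads = 92, 14, 3
patients, patient_years = 445, 980

events = deaths + transplants + lvads          # 109 primary end point events
cumulative_mortality = deaths / patients       # about 20.7%
cumulative_event_rate = events / patients      # about 24.5%
annual_mortality = deaths / patient_years      # about 9.4% per patient-year
annual_event_rate = events / patient_years     # about 11.1% per patient-year

for label, value in [
    ("cumulative mortality", cumulative_mortality),
    ("cumulative event rate", cumulative_event_rate),
    ("annual mortality", annual_mortality),
    ("annual event rate", annual_event_rate),
]:
    print(f"{label}: {100 * value:.1f}%")
```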
The median time to LVAD implantation or transplantation was 10 months, and the median SHFS for these patients was 1.17 (interquartile range [IQR]: 0.55 to 1.54). This was comparable to that for patients who died (1.04, IQR: 0.39 to 1.72, p = 0.887) but higher than those without an event (0.31, IQR: 0.19 to 0.81, p < 0.001). Actual listing for transplantation by quintile of SHFS (from lowest to highest risk) was 6.7% (n = 6), 12.4% (n = 11), 14.6% (n = 13), 15.7% (n = 14), and 21.4% (n = 19), p = 0.004 for linear trend.
Performance of the SHFM
Table 2 presents the observed versus predicted event rates. Overall the SHFM equation underestimated risk; the goodness-of-fit for observed versus the SHFM-expected event rates is presented in Figure 1. Systematic underestimation of event rates was detected, and the lack-of-fit attained statistical significance after year 2. The Cox-Snell type graph in Figure 2 shows the discordance between observed versus predicted cumulative hazards.
The SHFS achieved a likelihood ratio chi-square of 76.9 (p < 0.001) in the cohort when fitted in an exponential survival model. The λ parameter, however, was higher compared with the original (λ = 0.0585 vs. λ = 0.0405, p = 0.007), indicating that the original equation underestimated risk throughout follow-up (actual decline in event-free survival was faster than predicted). In the defibrillator and/or biventricular pacemaker subgroup, λ was significantly higher (λ = 0.0619 vs. λ = 0.0405, p = 0.013), whereas in medically treated patients it was not different (λ = 0.0500 vs. λ = 0.0405, p = 0.360) compared with the original λ, indicating more prominent risk underestimation in patients with devices.
The λ parameter was similar in whites versus blacks (λ = 0.0601 vs. λ = 0.0597). However, there was a significant modification effect of race on the coefficient of the SHFS (0.77 in whites vs. 1.15 in blacks, p = 0.010), pointing to underestimation of high risk in blacks and low risk in whites by the original SHFM (Fig. 3); this results in a net underestimation of absolute risk in blacks (Table 2).
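The direction of these race-specific deviations can be illustrated by comparing the fitted hazard with the hazard implied by the original equation. The sketch below assumes an exponential hazard of the form λ · exp(β · SHFS) and uses the fitted values reported above. The SHFS values are hypothetical, and the ratios are illustrative only; they are not the recalibrated equations from the Online Appendix.

```python
import math

ORIGINAL = (0.0405, 1.00)           # (lambda, SHFS coefficient) of the original SHFM
FITTED = {
    "white": (0.0601, 0.77),        # fitted values reported for white patients
    "black": (0.0597, 1.15),        # fitted values reported for black patients
}


def hazard(shfs, lam, beta):
    """Annual hazard under an exponential model, lam * exp(beta * SHFS)."""
    return lam * math.exp(beta * shfs)


def fitted_to_original_ratio(shfs, group):
    """Ratio > 1 means the original equation underestimates the hazard."""
    lam, beta = FITTED[group]
    return hazard(shfs, lam, beta) / hazard(shfs, *ORIGINAL)


# Hypothetical low-risk (SHFS = 0) and high-risk (SHFS = 2) patients.
for group in FITTED:
    for shfs in (0.0, 2.0):
        print(group, shfs, round(fitted_to_original_ratio(shfs, group), 2))
```

Under these assumptions the ratio exceeds 1 for low-risk white patients and for high-risk black patients, and falls slightly below 1 for high-risk white patients, matching the pattern described in the text.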
The SHFM had adequate discrimination throughout the 5-year period (Table 2), although c-statistics were lower in patients with devices and in whites. Similarly, the D statistic was 1.376 overall, 1.350 in those with a defibrillator and/or biventricular pacemaker, 1.456 in those without devices, 1.171 in whites, and 1.605 in blacks. The false-positive classification error rate for years 1 through 5 ranged from 30.2% to 35.3%, the false negative classification error rate ranged from 27.8% to 30.5%, and the combined error rate ranged from 29.0% to 32.9%.
Performance of SHFM for mortality alone
When mortality alone was assessed, the SHFM exhibited better calibration. Table 2 summarizes observed versus predicted survival rates. The λ parameter for the SHFS was 0.0499 in the total cohort, 0.0514 in the defibrillator and/or biventricular pacemaker group, and 0.0460 in the medically treated group; none of these was significantly different from the original λ. The significant interaction with race, however, persisted; the coefficient of the SHFS was 0.80 in white versus 1.10 in black patients (p = 0.037). Discrimination was retained; the c-statistics for mortality prediction were 0.76, 0.71, 0.72, and 0.73 at 1, 2, 3, and 5 years, respectively.
Table 3 summarizes the observed versus SHFM-predicted mean event-free survival (primary end point) and mean survival.
The SHFM was recalibrated by: 1) adjusting predicted event rates with separate correction factors, as estimated in our cohort, for patients with a defibrillator and/or biventricular pacemaker versus medically-treated patients; and 2) using race-specific coefficients (0.77 for whites and 1.15 for blacks, as estimated in our cohort) for the SHFS. This resulted in adequate calibration for all groups (Fig. 4), and race-based discrepancies were resolved (Fig. 5). Adjusted predictions with the web-based SHFM module are presented in Table 4. The recalibrated equations and extended prediction tables are included in the Online Appendix.
In this study we assessed the performance of the SHFM in advanced HF patients referred for cardiac transplantation who were racially diverse and receiving optimal contemporary therapy—characteristics that set our study apart from the original study. We found that overall the SHFM provided good discrimination between low- versus high-risk patients. However, we detected that in terms of absolute risk the model systematically overestimated survival and underestimated risk, an effect more pronounced among patients with implanted devices. Moreover, the model had differential race-based properties. These deviations in absolute risk prediction are important when applying a model for clinical decision-making and suggest that recalibration might be necessary to improve SHFM applicability in transplant and LVAD eligible populations when the end point of interest is survival free of LVAD or urgent transplantation.
Several explanations can be provided for the observed higher-than-predicted event rate in our cohort. The expected benefits of medications and devices in the SHFM were extrapolated from clinical trials. It is well-known that, due to the strict enrollment criteria, subjects enrolled in trials might not represent the patients in “real-life,” and therefore the observed outcomes with these interventions might be different in the clinical setting. Also, we specifically focused on a sicker population of patients referred for transplantation who might have a higher relative but lower absolute benefit from these interventions. We did observe this in both medical- and device-treated patients, but it was more exaggerated in the subgroup with a defibrillator and/or biventricular pacemaker. However, the model was designed for prophylactic defibrillator use, and it is possible that a patient who has a therapeutic indication for a defibrillator might be at higher risk than predicted by the model. Finally, the deviation between observed versus expected survival became more pronounced with time. This raises the question of whether SHFM predictions should be calculated serially as both medical therapy and physiologic measures of risk change over time and whether the “baseline” measures are more accurate for only short- to intermediate-term outcomes.
The SHFM was designed to predict a death/LVAD/urgent transplantation combined end point, the same as in this analysis. However, 98% of the events in the original study were death. This fact raises important issues. A higher rate of LVAD implantation and/or urgent transplantation might lead to a higher overall event rate. Considering that our patient population was sicker as compared with the original SHFM cohort, it is not surprising that a larger proportion of patients underwent these procedures in the current study (16% vs. 2%). Thus, the miscalibration seen might not be due to SHFM performance but rather to the SHFM being more accurate for mortality prediction than a combined outcome. Indeed, when we assessed the model performance restricting the outcome to death alone, the model performance improved significantly. Unlike mortality, the timing for urgent transplantation or LVAD implantation can vary between institutions and physicians. In the current study, the model yielded systematic errors when applied to a composite end-point in which physician-determined components were more common. This calls for cautious use of models when predicting a composite end point.
Existing evidence suggests that therapies and prognostic factors might have a differential association with outcomes in whites versus blacks (31–33). We also observed different race-based prognostic properties of the SHFM score. Whether this has an environmental or a biologic basis is beyond the scope of this discussion, but it does underscore the fact that data generated in one group cannot simply be extrapolated to another. For a risk score to attain wide use for clinical decision-making, the transportability of absolute risk predictions to settings beyond those in which they were originally developed needs to be explicitly tested (34). In this respect, recalibration of a model is important (30,35). Indeed, we showed that race-specific recalibration significantly improves SHFM accuracy. It is important to note, however, that the recalibrated SHFM risk prediction functions have not been evaluated in an independent cohort. In this study, we also demonstrated differences based not only on race but also on whether patients were receiving device-based therapy. All of these results need validation in different cohorts to understand how they apply across patient groups. This can be achieved in a timely manner only through easier access to existing clinical trial and registry databases, rather than by creating a new cohort for each individual question.
The observed mean life expectancy was significantly lower than expected by the SHFM. These prediction models are derived to assess prognosis in populations and not individuals (36). However, the availability of mean life expectancy calculation on the basis of individual patient data makes it tempting to extrapolate results to individual patients. Our results underscore that caution needs to be exercised when extrapolating results from prediction models to individuals. There are currently no standards as to what deviation from expected mean life expectancy (e.g., 15% or 20% around the mean) is “acceptable.”
By definition, we limited our study sample to those with ≤2 missing variables for the SHFM. Whether the missing data in patients excluded for having more missing variables occurred at random or reflected specific patient characteristics that could bias our results is not known. We also imputed the cohort means for missing values. However, except for the lymphocyte count (70.6%), all other data were available on >90% of the cohort. Finally, only a minority of patients did not have a defibrillator or a biventricular pacemaker. Although we did observe a more exaggerated discrepancy in prediction for device versus medical therapy-alone patients, this might be related to the limited power to detect a difference in the medically treated group.
Our study shows that in patients with advanced HF, although the discrimination of the SHFM is comparable to the original investigation, the model overestimates survival especially in patients with implanted devices. Moreover, the SHFM leads to an underestimation of risk in high-risk black patients. Prediction models are derived for populations, and individual patient data should be reviewed with caution. Finally, recalibration might be needed if the event of interest is transplantation and/or LVAD implantation rather than death.
For supplementary tables and background data for the validation and recalibration of the Seattle Heart Failure Model and the Hosmer-Lemeshow goodness-of-fit test, please see the online version of this article.
Utility of the Seattle Heart Failure Model in Patients With Advanced Heart Failure
The University of Washington owns the copyright to the Seattle Heart Failure Model.
Support for this project was partially funded through an Emory University Heart and Vascular Board grant entitled “Novel Risk Markers and Prognosis Determination in Heart Failure.”
- Abbreviations and Acronyms
- HF = heart failure
- IQR = interquartile range
- LVAD = left ventricular assist device
- SHFM = Seattle Heart Failure Model
- SHFS = Seattle Heart Failure Score
- American College of Cardiology Foundation
- Rosamond W., Flegal K., Friday G., et al.
- Bleumink G.S., Knetsch A.M., Sturkenboom M.C., et al.
- Rogers J.G., Butler J., Lansman S.L., et al.
- Stevenson L.W., Miller L.W., Desvigne-Nickens P., et al.
- Zaroff J.G., Rosengard B.R., Armstrong W.F., et al.
- O'Neill J.O., Young J.B., Pothier C.E., Lauer M.S.
- Abraham W.T., Young J.B., Leon A.R., et al.
- Aaronson K.D., Schwartz J.S., Chen T.M., Wong K.L., Goin J.E., Mancini D.M.
- Levy W.C., Mozaffarian D., Linker D.T., et al.
- Russo A.M., Hafley G.E., Lee K.L., et al.
- Seattle Heart Failure Model. http://depts.washington.edu/shfm. Accessed April 12, 2008.
- Hosmer D.W. Jr., Lemeshow S.
- Smith G.L., Shlipak M.G., Havranek E.P., et al.
- Dunlap S.H., Sueta C.A., Tomasko L., Adams K.F. Jr.