Author information
- Received February 19, 2019
- Revision received March 21, 2019
- Accepted March 24, 2019
- Published online May 27, 2019.
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Address for correspondence: Dr. Stuart J. Pocock, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom.
• This review is a constructive critical appraisal of key recent cardiology trial reports.
• The trials are CABANA, ATTR-ACT, COAPT, DECLARE, REDUCE-IT, and AUGUSTUS.
• For each study, the main findings are documented and interpreted.
• Particular attention is paid to new findings, their research context, and study limitations.
• Topical examples provide methodological insights pertinent to future clinical trials research.
In the past 12 months, many important new clinical trials in cardiology have had their first conference presentation and publication. This paper presents a constructive critical appraisal of 6 key studies. In time order of first presentation, they are CABANA, ATTR-ACT, COAPT, DECLARE, REDUCE-IT, and AUGUSTUS. For each study, the aim herein is to document and interpret the main findings, paying attention to new findings, their research context, and study limitations. These topical examples also provide methodological insights pertinent to future clinical trials research.
A year ago, we published a critical appraisal of some late-breaking trials at the 2018 American College of Cardiology Scientific Sessions (1). Rather than just repeat that for American College of Cardiology 2019, we consider it more useful to take a broader perspective of key trials first presented and published in the past 12 months. Thus, in addition to American College of Cardiology 2019, we cover trials from the annual 2018 Scientific Sessions of the Heart Rhythm Society, the European Society of Cardiology, Transcatheter Cardiovascular Therapeutics, and the American Heart Association.
We have selected 6 key recent trials (Central Illustration) for our constructive critical appraisal. These trials were chosen for a mix of reasons. They: 1) are of major clinical importance; 2) are within our sphere of expertise; 3) raise challenges in their interpretation; and 4) contain valuable insights for the methodology of clinical trials research.
The CABANA Trial
The CABANA (Catheter Ablation vs. Antiarrhythmic Drug Therapy for Atrial Fibrillation; NCT00911508) trial (2) randomized 2,204 patients with atrial fibrillation to either catheter ablation or antiarrhythmic drug therapy. The primary composite endpoint was death, disabling stroke, serious bleeding, or cardiac arrest. Key secondary outcomes were all-cause death and the composite of death and cardiovascular hospitalization (CVH). Over a median 4 years of follow-up, findings for these outcomes using analysis by intention-to-treat (ITT) are summarized in the top part of Figure 1. Neither the primary endpoint nor all-cause death showed a significant treatment difference, though there were numerically fewer primary endpoints and deaths in the ablation arm. For the composite of death and CVH, ITT analysis did show a significant reduction in the ablation arm (p = 0.002).
The main problem in interpreting CABANA is the high rate of treatment crossover: 9.2% of patients randomized to ablation did not undergo it, whereas 27.5% of patients randomized to drug therapy did in fact undergo ablation. This means that ITT analysis will inevitably dilute away any true benefits of ablation. The widely accepted argument in favor of ITT analysis is that it achieves an unbiased comparison of the 2 therapeutic strategies as actually carried out. But the key flaw in the conduct of CABANA is the apparent lack of clinical equipoise: more than one-quarter of investigators and their patients in the drug therapy arm underwent ablation, even though the whole purpose of the trial is that there is insufficient prior evidence that ablation is effective. In this context, the CABANA investigators have presented some alternative analyses (Figure 1). The precise methods undertaken are not fully explained, but our interpretation is as follows. Analysis by treatment received compares patients who actually underwent ablation with those who did not, regardless of which group they were randomized to. This approach shows highly significant differences in favor of ablation for the primary endpoint (hazard ratio [HR]: 0.67; p = 0.006) and all-cause death (HR: 0.60; p = 0.005). Per protocol analysis keeps patients in their randomized groups, but it removes from analysis those who crossed over. Only results for the primary endpoint are presented (HR: 0.73; p = 0.046).
Of course, all these alternative analyses may carry a substantial bias. The patients who cross over are not a random sample of all the patients in each randomized group, and such crossovers may carry a higher (or lower) risk for fatal and nonfatal events not directly related to the actual intervention received. Thus, such exploratory analyses, although supportive of the hypothesis that ablation is an effective treatment, cannot be considered definitive evidence. Overall, we believe that the evidence from CABANA is inconclusive.
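To make the distinction between the three analysis sets concrete, the following is a minimal sketch in Python. The records are a hypothetical toy cohort, not CABANA data, and the names `Patient`, `event_risks`, and `make` are our own illustrative choices. It simply shows how the same patients are regrouped under ITT, treatment received, and per protocol, and how prognosis-related crossover can exaggerate the non-ITT contrasts.

```python
from collections import namedtuple

# Hypothetical toy records (not CABANA data): arm assigned at randomization,
# treatment actually received, and whether a primary endpoint event occurred.
Patient = namedtuple("Patient", ["assigned", "received", "event"])


def event_risks(patients, analysis):
    """Return the event proportion per group under the chosen analysis set."""
    groups = {"ablation": [], "drug": []}
    for p in patients:
        if analysis == "itt":
            groups[p.assigned].append(p.event)      # group as randomized
        elif analysis == "as_treated":
            groups[p.received].append(p.event)      # group by treatment received
        elif analysis == "per_protocol":
            if p.assigned == p.received:            # drop patients who crossed over
                groups[p.assigned].append(p.event)
        else:
            raise ValueError(f"unknown analysis: {analysis}")
    return {g: sum(events) / len(events) for g, events in groups.items()}


def make(assigned, received, n, n_events):
    """Build n patients of one type, the first n_events of whom have an event."""
    return [Patient(assigned, received, i < n_events) for i in range(n)]


cohort = (
    make("ablation", "ablation", 8, 1)
    + make("ablation", "drug", 2, 1)     # randomized to ablation, never ablated
    + make("drug", "drug", 7, 3)
    + make("drug", "ablation", 3, 0)     # assumed lower-risk patients who crossed over
)

for analysis in ("itt", "as_treated", "per_protocol"):
    print(analysis, event_risks(cohort, analysis))
```

In this toy cohort the ITT risks are 20% versus 30%, whereas analysis by treatment received gives roughly 9% versus 44%, a difference driven entirely by which patients were assumed to cross over.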
The underlying problem is that CABANA may well have been done too late in the development of catheter ablation for atrial fibrillation. In general, for any new procedure, there is first a learning curve, while the technical details and the interventionists’ (or surgeons’) skills are perfected, when it is too soon to undertake the pivotal randomized trial. There then follows a “window of opportunity” before the procedure is widely accepted (3). In this case, that opportunity for unbiased randomized evidence appears to have been missed. Perhaps this is an example of Buxton’s law: “it is always too early for rigorous evaluation, until unfortunately it’s suddenly too late” (4).
In general, there remains controversy regarding the value of such “as treated” and “per protocol” analyses (5,6). To dismiss them as irrelevant is too harsh a stance, whereas to accept them as pivotal evidence is inappropriate. The key is to define exactly what one wishes to estimate (called the estimand) (7) and then to define the appropriate analysis strategy and any assumptions that it relies on.
Given the CABANA trial’s problems, what additional evidence can be drawn on to clarify the merit (or not) of ablation therapy in atrial fibrillation? Of note is a large observational study using the OPTUM administrative database (8). The investigators identified 135,688 patients who would have been eligible for CABANA, of whom 6,907 patients underwent ablation. Mean follow-up duration after ablation was 1.5 years. Using a propensity score–weighted analysis to adjust for potential confounders, they found that patients undergoing ablation had a lower risk for all-cause death (HR: 0.60; p < 0.001) and the composite of death, stroke, major bleeding, and cardiac arrest (HR: 0.70; p < 0.001), the latter being the primary endpoint in CABANA (Figure 1). As in all such observational studies, one cannot rule out a selection bias whereby there are unmeasured confounders influencing who underwent ablation. Therefore, it is open to debate whether these apparent striking benefits of ablation are believable.
CASTLE-AF (Catheter Ablation vs. Standard Conventional Treatment in Patients With Left Ventricular Dysfunction and Atrial Fibrillation; NCT00643188) (9) is another trial of catheter ablation versus medical therapy in 363 patients who had both atrial fibrillation and heart failure (Figure 1). Over a median 3.15 years of follow-up, treatments were compared using a modified ITT analysis; this excluded events during the run-in period and also nonfatal events during a 12-week “blanking period” after the baseline visit. The ablation group had fewer deaths (HR: 0.53; p = 0.009) and fewer primary endpoints, a composite of death and heart failure hospitalization (HFH; HR: 0.62; p = 0.009). Again, the extent of treatment crossovers was substantial: 15.6% and 9.8%, respectively, in the ablation and medical therapy groups. But CASTLE-AF studied a different population (the subset of patients with atrial fibrillation who also had heart failure), and therefore is not directly relevant to the question tackled by CABANA.
The dilemma we face is that although no study provides definitive evidence that catheter ablation is an effective intervention in patients with atrial fibrillation, does the totality of evidence across a range of somewhat imperfect studies carry sufficient conviction? Recent guidelines (10) are cautious, providing catheter ablation with their weakest possible recommendation. The hope is that ongoing and future trials will resolve the issue. But will they, too, suffer from high crossover rates?
The ATTR-ACT Trial
The ATTR-ACT (Safety and Efficacy of Tafamidis in Patients With Transthyretin Cardiomyopathy; NCT01994889) trial (11) is a double-blind trial that randomized 441 patients with transthyretin amyloid cardiomyopathy in a 2:1:2 ratio to 80 mg tafamidis, 20 mg tafamidis, or matching placebo for 30 months. In the primary analysis, the investigators hierarchically assessed all-cause mortality, followed by frequency of CVH using Finkelstein-Schoenfeld (12) and win ratio (13) methods. Before we interpret this relatively novel methodology and its results, let us describe the essential results for the key components of the primary analysis (Table 1). Throughout, the results compare both tafamidis doses combined (n = 264) with placebo (n = 177).
For all-cause death over 30 months of follow-up, the HR was 0.70 (95% confidence interval [CI]: 0.51 to 0.96; p = 0.026). For CVH, the investigators concentrated on comparing their overall frequency including repeat CVH within the same patient, with rates per year of 0.48 and 0.70 in the tafamidis and placebo groups, respectively. One needs to allow for the nonindependence of repeat events (i.e., a few patients may contribute a lot of events), so negative binomial analysis is the most informative (14), with a rate ratio of 0.71 (95% CI: 0.54 to 0.93; p = 0.0068). The alternative Poisson regression analysis, though pre-specified, falsely assumes independence of events, which makes the consequent CI narrower and the p value smaller than is appropriate. Had the investigators chosen to analyze only time to first event (ignoring repeats), the results would have been less impressive (HR: 0.80; 95% CI: 0.62 to 1.03; p = 0.07).
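The point about nonindependence can be illustrated with a small sketch. This is not negative binomial regression and uses no ATTR-ACT data: it takes a hypothetical set of per-patient event counts with equal follow-up and compares the Poisson model-based standard error of the log event rate with a delta-method standard error based on the observed between-patient variance. The function name and the counts are assumptions made for illustration only.

```python
import math
import statistics


def log_rate_standard_errors(counts):
    """Return (poisson_se, robust_se) for the log of the mean events per patient.

    Assumes every patient has the same follow-up time.
    """
    n = len(counts)
    total = sum(counts)
    mean = total / n
    # Poisson: variance of the total equals the total, so SE(log rate) = 1/sqrt(total).
    poisson_se = 1 / math.sqrt(total)
    # Delta method using the empirical between-patient variance of the counts.
    robust_se = math.sqrt(statistics.variance(counts) / n) / mean
    return poisson_se, robust_se


# Hypothetical counts: most patients have no events, a few have many.
counts = [0] * 14 + [1] * 3 + [5, 6, 7]
poisson_se, robust_se = log_rate_standard_errors(counts)
print(f"Poisson SE: {poisson_se:.3f}, variance-based SE: {robust_se:.3f}")
```

When a few patients contribute many events, as here, the variance-based standard error is about twice the Poisson one, so a Poisson confidence interval would be correspondingly too narrow.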
The Kaplan-Meier plots of mortality and first CVH (not shown) both show a treatment effect, with the curves diverging after about 18 and 9 months, respectively. Thus, an extended follow-up period out to 30 months and inclusion of repeat CVHs in the primary analysis were crucial factors in demonstrating efficacy.
In addition, the tafamidis group showed highly significant steady improvements over time for both functional capacity (6-min walk distance) and quality of life (Kansas City Cardiomyopathy Questionnaire overall score).
Now let us interpret the primary analysis that hierarchically assessed all-cause mortality, followed by frequency of CVH using the Finkelstein-Schoenfeld test (12) and the win ratio method (13). The win ratio was 1.70 (95% CI: 1.26 to 2.29; p = 0.0006). First, note that this p value is smaller than those separately obtained for mortality and CVH. That is, combining the reductions in risk for death and CVH produces very strong evidence of a treatment effect and hence reinforces the value of this type of hierarchical analysis. But why not instead use a conventional composite endpoint of time to first CVH or death? Because that would have made inadequate use of the available data by: 1) ignoring repeat CVHs after the first one; and 2) ignoring any deaths that happen after a CVH.
The win ratio method is relatively new, so many people do not understand it. Hence, Figure 2 is our attempt to explain how it worked in ATTR-ACT. The win ratio analysis (15) was conducted according to 2 stratification factors: transthyretin status (variant or wild type) and baseline New York Heart Association functional class (I/II or III). Thus, there were 4 patient strata: variant type, functional class I/II (n = 58); variant type, functional class III (n = 48); wild-type, functional class I/II (n = 242); and wild-type, functional class III (n = 93). Then, within each stratum, every patient in the tafamidis group was compared with every patient in the placebo group: that is, there were 34 × 24 + 29 × 19 + 152 × 90 + 49 × 44 = 17,203 pairs of patients. For each pair, we determine whether the tafamidis patient “won” or “lost” compared with the placebo patient. There are 2 ways to decide this: 1) who died first (the “loser”); and then, 2) if neither died, who had more CVHs (again the “loser”), both being assessed over their shared follow-up time (Figure 2). Adding up across strata, we get a total of 8,595 winners and 5,071 losers. Hence the win ratio is 8,595/5,071 = 1.70. The consequent 95% CI is 1.26 to 2.29, with p = 0.0006 using the Finkelstein-Schoenfeld test.
In conclusion, there is overwhelming evidence that patients on tafamidis had better outcomes than those on placebo. One outstanding issue is that the evidence was derived from 2 different daily doses, 80 and 20 mg, studied in a 2:1 randomization ratio. At present, no results are available on how the 2 doses compare. We are informed that such results will be available soon, which is crucial because they will have an important bearing on future use of tafamidis.
The COAPT Trial
The COAPT (Cardiovascular Outcomes Assessment of the MitraClip Percutaneous Therapy for Heart Failure Patients With Functional Mitral Regurgitation; NCT01626079) trial (16) randomized 614 patients with heart failure and moderate to severe mitral regurgitation to either mitral valve repair plus medical therapy (device group) or medical therapy alone (control group). Compliance was good: only 3% of the device group (n = 302) had no procedure attempted, and 1% of the control group (n = 312) underwent transcatheter mitral valve repair. The primary effectiveness endpoint was all HFHs over 24 months of follow-up, and all-cause death was a key secondary endpoint (Figure 3).
The COAPT trial is among the first published trials in cardiology to use a repeat-events analysis for its primary endpoint; that is, all HFHs (including repeats) contribute to the treatment comparison, not just the conventional time to first event (HFH or cardiovascular [CV] death). This approach has certain advantages: the total burden of relevant disease is better captured, and also the statistical power of the trial is likely to be enhanced (17). However, the consequent statistical techniques are more complex, and as yet there is no overall expert consensus regarding a universally accepted optimal method.
But first let us look at the descriptive statistics. In the device and control groups, respectively, 92 and 151 patients had at least 1 HFH, and the total numbers of HFHs (including repeats) were 160 and 283, respectively. These cumulative numbers of HFHs over time are shown in Figure 3A. The numbers of deaths over 2 years of follow-up were 80 and 121 in the device and control groups, respectively (Figure 3B). These striking numeric differences require formal statistical analysis. For mortality, that is straightforward, with an HR of 0.62 (95% CI: 0.46 to 0.82; log-rank p < 0.001). For the primary endpoint, all HFHs, the issue is made more complicated by the fact that death is an informative censoring or competing risk. That is, patients who are more prone to have 1 or more HFHs are also at a higher risk for dying: the jargon term for this is joint frailty. Another issue is that some patients have multiple HFHs, and this lack of independence of such repeat events needs correcting for in the analysis. The precise methodology required (18) is too technical to describe here, but the end result is a joint frailty–adjusted HR for all HFHs of 0.53 (95% CI: 0.40 to 0.70; p < 0.001).
As a simpler plausibility check on such “black box” complexity, one can convert the total HFHs in the device and control groups, 160 and 283, respectively, into annualized rates of 35.8 and 67.9 per 100 patient-years. The consequent rate ratio is 0.53, a reassuringly identical result. It is useful to convert such estimates of relative treatment effect into absolute risk reductions. Accordingly, the investigators report the number needed to treat to prevent 1 HFH (3.1; 95% CI: 1.9 to 7.9) and also to prevent 1 death (5.9; 95% CI: 3.9 to 11.7), both over 2 years of follow-up.
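The arithmetic behind this plausibility check is easy to reproduce. The sketch below uses only figures quoted in this paper; the patient-years are back-calculated from the quoted rates rather than taken from the trial report, and the number needed to treat for death is computed, as an assumption about the method, from the 2-year mortality estimates of 29.1% and 46.1% quoted for COAPT.

```python
# Annualized HFH rates per 100 patient-years, as quoted in the text.
device_rate, control_rate = 35.8, 67.9
device_hfh, control_hfh = 160, 283

# Crude rate ratio for all HFHs (device versus control).
rate_ratio = device_rate / control_rate

# Patient-years of follow-up implied by the quoted counts and rates.
device_patient_years = device_hfh / device_rate * 100
control_patient_years = control_hfh / control_rate * 100

# Number needed to treat to prevent 1 death over 2 years, assuming it is the
# reciprocal of the absolute difference in 2-year mortality (46.1% vs 29.1%).
nnt_death = 1 / (0.461 - 0.291)

print(f"rate ratio: {rate_ratio:.2f}")
print(f"implied patient-years: {device_patient_years:.0f} vs {control_patient_years:.0f}")
print(f"NNT for death: {nnt_death:.1f}")
```

Under these assumptions the crude rate ratio rounds to 0.53 and the number needed to treat for death to 5.9, in line with the reported values.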
The MITRA-FR (Multicentre Study of Percutaneous Mitral Valve Repair MitraClip Device in Patients With Severe Secondary Mitral Regurgitation; NCT01920698) trial (19) also compared mitral valve repair with medical therapy alone and came to very different conclusions. The investigators randomized patients with severe secondary regurgitation to undergo percutaneous mitral valve repair plus medical therapy (device group, n = 152) or to receive medical therapy alone (control group, n = 152). The primary efficacy endpoint was a composite of death or unplanned HFH over 12 months of follow-up (i.e., a conventional time-to-first-event analysis). The key results for both this primary efficacy endpoint and for all-cause death are shown in Figure 4, with the corresponding results for COAPT shown alongside.
MITRA-FR showed no significant difference between the device and control groups for the composite of death and heart failure hospitalization or for all-cause death (p = 0.53 and p = 0.66, respectively). Furthermore, the estimated HRs are numerically in favor of control (HRs: 1.16 and 1.11, respectively). This is in sharp contrast to COAPT’s highly significant findings in favor of the device (HRs: 0.57 and 0.62, respectively; p < 0.001 for both). Interaction tests for between-trial heterogeneity of treatment effect for the composite endpoint and for all-cause death are both statistically significant (p = 0.007 and p = 0.039, respectively).
There is no obvious single answer as to why the findings of these 2 trials differ so markedly (20,21). Points to consider are the following. 1) Follow-up was only 1 year in MITRA-FR but 2 years in COAPT. COAPT showed a marked difference in HFH incidence at 1 year, whereas its difference in mortality at 1 year, 19.3% versus 24.7%, is less striking, with a greater divergence in mortality out to 2 years, 29.1% versus 46.1%. Perhaps once MITRA-FR also achieves 2 years of follow-up, the discrepancy between the trials may become less marked. 2) Did the trials differ in their use of medical therapy in both arms, both before patient entry and during follow-up? 3) Were there differences in procedural performance? In all such trials of novel devices, there is a learning curve among interventionists; was the level of operator experience adequately achieved in both trials? 4) There are between-trial differences in patient eligibility criteria and also in the actual patterns of patient recruitment. Specifically, patients in MITRA-FR tended to have marked left ventricular dilation but a more modest degree of mitral regurgitation.
The evident superiority of outcomes in the device arm of the COAPT trial is impressive. But like all such strategy trials of device versus control, the proportion of patients with the relevant condition (heart failure and secondary mitral regurgitation) who actually got randomized is relatively low. Thus, the key question for future therapeutic practice is whether we can successfully define an identifiable patient population for whom the use of transcatheter mitral valve repair is an optimal strategy.
The DECLARE Trial
The DECLARE-TIMI 58 (Multicenter Trial to Evaluate the Effect of Dapagliflozin on the Incidence of Cardiovascular Events-Thrombolysis In Myocardial Infarction 58; NCT01730534) trial (22) randomized 17,160 patients with type 2 diabetes to dapagliflozin or placebo. Of these patients, 41% had established atherosclerotic CV disease, with the remainder having multiple risk factors. The coprimary endpoints were major adverse CV events (MACE) (CV death, myocardial infarction [MI], stroke) and the composite of CV death or HFH. The pre-specified hierarchy of secondary endpoints was a renal composite and then all-cause death. MACE was also the primary safety outcome, to comply with the U.S. Food and Drug Administration requirement to undertake noninferiority testing of all new antidiabetic agents (23).
Table 2 shows results for the primary, secondary, and other key outcomes over a median 4.2 years of follow-up. To preserve type I error and to accommodate potential interim stopping of the trial, it was pre-declared that each coprimary endpoint had to achieve p < 0.0231 to claim statistical significance. Only if both passed this hurdle was the sequence of 2 secondary endpoints allowed to enter formal significance testing.
In fact, of the 2 coprimaries, CV death and HFH revealed strong evidence of efficacy (HR: 0.83; 95% CI: 0.73 to 0.95; p = 0.005), whereas MACE did not (p = 0.17). Thus, no formal statistical testing was then performed on secondary endpoints. But the renal composite (the first secondary endpoint) had an HR of 0.76 (95% CI: 0.67 to 0.87; p = 0.00004), an impressive result that is declared an exploratory finding, which formally does not constitute a claim of efficacy for renal benefits of dapagliflozin.
We think this interpretation lacks common sense, and it could characterize a more general failing in the way secondary endpoints are handled in major trials.
Following are a few suggestions. First, if a secondary endpoint exhibits overwhelming evidence of treatment superiority (as in this case), this needs to be cautiously accepted in a trial’s conclusions. The U.S. Food and Drug Administration approval of empagliflozin for reducing CV death in the EMPA-REG OUTCOME (BI 10773 [Empagliflozin] Cardiovascular Outcome Event Trial in Type 2 Diabetes Mellitus Patients; NCT01131676) trial (24) is an example.
Second, the formal use of p values to aid decision making is most important to regulators (e.g., the FDA and the European Medicines Agency), whose responsibility is to approve a drug (or not) for a specific indication. Medical journals have a different responsibility: the presentation of scientific evidence. Although type I error control is an important principle, it should not become an obsession suppressing important secondary findings.
Third, we do not agree with the current New England Journal of Medicine practice of not presenting p values for secondary endpoints if the primary endpoints are not statistically significant. For instance, in Table 1, only the first 2 p values are reported in the paper by Maurer et al. (11). This feeds into the misinterpretation of p < 0.05 as an “accept or reject” negative or positive philosophy of trial statistics, an issue one of us has critically appraised in 2 New England Journal of Medicine papers (25,26).
Fourth, another widespread but suspect practice is to pre-define secondary endpoints in a pre-defined hierarchy of statistical testing. Thus, trialists struggle as to what order to choose for their secondary endpoints, in effect having to “gaze into the crystal ball” as to which outcomes have the best chance of success while also bearing in mind their relative clinical importance.
Fifth, we recommend that this practice should stop. Instead we propose that in future trials we consider all pre-defined secondary endpoints on an equal footing while implementing a statistical correction for multiple testing (e.g., Bonferroni or Hochberg procedure). Thus, if one has 5 secondary endpoints and the primary endpoint is formally significant, then if 1 of the secondary endpoints achieves p < 0.01, it should also be considered a formally significant finding.
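As an illustration of this proposal, here is a minimal sketch of the Bonferroni and Hochberg procedures applied to 5 secondary endpoints. The p values are hypothetical and the function names are ours; the sketch uses the conventional "less than or equal to" threshold.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def hochberg(p_values, alpha=0.05):
    """Hochberg step-up procedure.

    Work from the largest p value downward; the k-th largest p value is
    compared with alpha / k. At the first success, reject that hypothesis
    and all hypotheses with smaller p values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    reject = [False] * m
    for rank, idx in enumerate(order):          # rank 0 is the largest p value
        if p_values[idx] <= alpha / (rank + 1):
            for j in order[rank:]:
                reject[j] = True
            break
    return reject


# Hypothetical p values for 5 pre-defined secondary endpoints.
secondary_p = [0.004, 0.030, 0.012, 0.200, 0.045]
bonferroni_result = bonferroni(secondary_p)
hochberg_result = hochberg(secondary_p)
print(bonferroni_result)
print(hochberg_result)
```

With these hypothetical values, Bonferroni (threshold 0.05/5 = 0.01) declares only the first endpoint significant, whereas Hochberg also accepts the third (p = 0.012), because 0.012 falls below 0.05/4 once the 3 larger p values have failed their thresholds.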
We now turn our attention to how the DECLARE trial findings should be integrated into the overall evidence regarding the CV benefits of sodium-glucose cotransporter-2 inhibitors. One way of accomplishing this is in a meta-analysis (27) combining findings from EMPA-REG (28), CANVAS (Canagliflozin Cardiovascular Assessment Study; NCT01032629) (29), and DECLARE. We summarize and also extend their findings in Figure 5.
For MACE (CV death, MI, and stroke) (Figure 5A), we see a consistency of benefit across trials in patients with pre-existing atherosclerotic disease: the overall HR was 0.86 (95% CI: 0.80 to 0.93; p = 0.0002). Even though this fell short of conventional significance in DECLARE, there is no evidence of heterogeneity across trials (interaction p = 0.63). However, in the 2 trials that did include patients with multiple risk factors only (CANVAS and DECLARE), there was no such signal of a benefit on MACE for such patients: the overall HR was 1.00 (95% CI: 0.87 to 1.16; p = 0.98). Note that the heterogeneity test comparing these 2 overall HRs is of marginal significance (interaction p = 0.07), so we need to exercise some restraint before presuming that the whole MACE benefit story is solely confined to patients with pre-existing disease.
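For readers wishing to check such interaction tests, the following sketch shows one common approximate approach: recover each log HR and its standard error from the published estimate and 95% CI, then compare the two on the log scale with a normal test. This is our assumption about a reasonable method, not necessarily the exact calculation used in the meta-analysis, and the function names are illustrative.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def log_hr_and_se(hr, lower, upper):
    """Log hazard ratio and its standard error recovered from a 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def interaction_p(hr_a, ci_a, hr_b, ci_b):
    """Two-sided p value for the difference between two independent log HRs."""
    log_a, se_a = log_hr_and_se(hr_a, *ci_a)
    log_b, se_b = log_hr_and_se(hr_b, *ci_b)
    z = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


# MACE: pre-existing atherosclerotic disease versus multiple risk factors only.
p_mace = interaction_p(0.86, (0.80, 0.93), 1.00, (0.87, 1.16))
print(f"interaction p = {p_mace:.2f}")
```

Applied to the 2 overall HRs for MACE, this approximation gives p of about 0.07, consistent with the value quoted above.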
For HFH and CV death, we see strong evidence of treatment efficacy across trials (Figure 5B), which applies equally well in patients with and without histories of heart failure: the HRs are 0.71 and 0.79, respectively (p < 0.0001 for both). This is the most consistently positive finding from these trials of sodium-glucose cotransporter-2 inhibitors. Whether these findings for patients with heart failure also extrapolate to such patients without diabetes is currently being investigated in the EMPEROR-HF (Empagliflozin Outcome Trial in Patients With Chronic Heart Failure; NCT03057951 and NCT03057977) and DAPA-HF (Study to Evaluate the Effect of Dapagliflozin on the Incidence of Worsening Heart Failure or Cardiovascular Death in Patients With Chronic Heart Failure; NCT03036124) trials.
For CV death and all-cause death (Figures 5C and 5D), the striking reductions due to empagliflozin in the EMPA-REG trial have not been found in CANVAS and DECLARE. The tests for heterogeneity of effects across trials are significant (interaction p = 0.007 and p = 0.018, respectively, for CV death and all-cause death). As far as we understand, there is no obvious biological explanation for these differing effects on mortality. Faced with no backup evidence, one is entitled to speculate that the surprisingly large magnitudes of observed effects with empagliflozin (HRs: 0.62 and 0.68 for CV death and all-cause death, respectively) may be by chance an exaggeration of the truth, but it is hard to dispute that the effects exist given their extreme significance (p < 0.0001 for both).
The REDUCE-IT Trial
The REDUCE-IT (Evaluation of the Effect of AMR101 on Cardiovascular Health and Mortality in Hypertriglyceridemic Patients With Cardiovascular Disease or at High Risk for Cardiovascular Disease; NCT01492361) trial (30) evaluated whether 2 g twice daily of icosapent ethyl, a purified ethyl ester of eicosapentaenoic acid (EPA), could reduce the risk for CV events. A total of 8,179 patients, 71% with established CV disease and 29% with diabetes and other risk factors, who had been on a statin were randomized to EPA or placebo and followed for a median 4.9 years. The primary composite endpoint comprised CV death, MI, stroke, coronary revascularization, or unstable angina.
Table 3 summarizes the key trial findings. There was a markedly reduced incidence of the primary endpoint (HR: 0.75; 95% CI: 0.68 to 0.83; p < 0.0001). Every component of the primary endpoint showed evidence of a significant effect, this being most marked for MI, coronary revascularization, and unstable angina. There were significantly fewer CV deaths on EPA (p = 0.03), but because non-CV deaths were almost identical in the 2 groups, the finding is more tenuous for all-cause death (p = 0.09). It is also worth noting that from Kaplan-Meier plots for the primary endpoint (and also for the key secondary endpoints of CV death, MI, and stroke), there is no evidence of a treatment effect in the first year of follow-up.
These positive results in REDUCE-IT need to be set against the previous disappointing findings from other trials of omega-3 fatty acids. For instance, a meta-analysis of 10 such trials in 77,917 patients (31) gave an HR for any major vascular event of 0.97 (95% CI: 0.95 to 1.01).
One statistical option when making sense of 1 positive trial in an otherwise negative field is to undertake Bayesian analyses. Thus, one expresses for the trial’s primary endpoint a plausible distribution of prior belief for the true HR and then uses Bayes’s theorem to combine such a prior belief distribution with evidence from the trial to produce a posterior belief distribution (32). Because no one prior suits everyone, we present 3 scenarios.
First, the pessimist’s prior: The pessimist does not perceive the specific omega-3 fatty acid used in REDUCE-IT as being any different from the rest. Hence, the pessimist uses the meta-analysis estimate as the prior distribution and integrates it with the new data, as shown in Figure 6A. Because the prior data and pessimist’s belief distribution are so strongly negative, the evidence from REDUCE-IT only slightly shifts the belief, with posterior belief having a median HR of 0.95 with a 95% credibility interval of 0.92 to 0.98. But one should note that the prior meta-analysis and REDUCE-IT are in sharp disagreement: an interaction test comparing their respective HRs, 0.97 and 0.75, has p < 0.001. Thus, one might argue that the pessimist’s prior reflects an overwhelming prejudice against all omega-3 fatty acids, which is clearly overruled by REDUCE-IT.
Second, the realist’s prior: The realist accepts the principle that the EPA in REDUCE-IT may be onto something better than previously studied omega-3 fatty acids but believes equally that it might conceivably do harm. Thus, the realist chooses a prior distribution centered on no true effect (HR: 1.00) with a wider spread of uncertainty (Figure 6B). To be precise, we use a normal distribution for the prior of the log (HR), which has a positive tail out to 20% reduction in hazard with prior probability 0.025. In this scenario, the realist’s caution leads to a posterior belief having a median HR of 0.79 with a 95% credibility interval of 0.72 to 0.86. That is, the realist now accepts the positive finding in REDUCE-IT but is inclined to believe that the true effect of EPA is somewhat less than that observed. Note that if REDUCE-IT had been a much smaller trial, then the interpretation would be more cautious. For instance, if REDUCE-IT had been only one-quarter as big with the same observed HR of 0.75, then the posterior median belief would have been 0.85, with a 95% credibility interval of 0.75 to 0.99. That is, one needs a very large trial to convince people when previous evidence has been so negative.
Third, the optimist’s prior: The optimist asserts that the EPA and its high dose used in REDUCE-IT bear no relation to previous trials of other omega-3 fatty acids. One therefore has no prior insight into what the treatment effect might truly be. In such circumstances, one chooses a vague prior, in which case the evidence is its own posterior distribution automatically. Such optimism is implicit in any conventional (frequentist) presentation of results. That is, the published estimate, 95% CI, and p value are compatible with a vague prior. This optimistic stance is encouraged by a Japanese trial of 1.8 g/day of icosapent ethyl (33), which showed a significant 19% reduction in coronary events, though that trial was not blinded.
By its very nature, the use of Bayesian techniques to see whether a trial’s evidence is strong enough to overcome prior skepticism has no single right answer. But Bayesian methods are a quantitative tool that may help clarify the arguments.
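For readers who wish to experiment with their own priors, the pessimist's and realist's scenarios above can be approximated with a simple normal-normal update on the log HR scale. The sketch below is our simplified reconstruction under stated assumptions: the trial evidence and the meta-analysis prior are each treated as normal on the log scale, with standard errors recovered from the published 95% CIs. Function and variable names are illustrative.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def normal_posterior(prior_mean, prior_sd, data_mean, data_sd):
    """Combine a normal prior and a normal likelihood by precision weighting."""
    w_prior, w_data = 1 / prior_sd ** 2, 1 / data_sd ** 2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, math.sqrt(post_var)


def hr_summary(mean, sd):
    """Posterior median HR and 95% credibility limits, rounded to 2 decimals."""
    return tuple(round(math.exp(x), 2) for x in (mean, mean - Z95 * sd, mean + Z95 * sd))


# REDUCE-IT primary endpoint: HR 0.75 (95% CI: 0.68 to 0.83).
data_mean = math.log(0.75)
data_sd = (math.log(0.83) - math.log(0.68)) / (2 * Z95)

# Realist: centered on HR 1.00, with prior probability 0.025 that HR < 0.80.
realist = hr_summary(*normal_posterior(0.0, -math.log(0.80) / Z95, data_mean, data_sd))

# Pessimist: prior taken from the meta-analysis, HR 0.97 (95% CI: 0.95 to 1.01).
meta_sd = (math.log(1.01) - math.log(0.95)) / (2 * Z95)
pessimist = hr_summary(*normal_posterior(math.log(0.97), meta_sd, data_mean, data_sd))

print("realist:", realist)
print("pessimist:", pessimist)
```

Under these assumptions the sketch returns posterior summaries of 0.79 (0.72 to 0.86) for the realist and 0.95 (0.92 to 0.98) for the pessimist, matching the values quoted above. Doubling `data_sd` mimics a trial one-quarter the size.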
Another way of interpreting such a highly significant result in a unique trial is to argue how many replicate trials would have to achieve 1-sided p < 0.025 to have the same strength of evidence as the observed significance for the primary endpoint, which for REDUCE-IT is p = 0.00000001. The answer is that it would take 4 replicate trials, each with 1-sided p = 0.025, to approach the same level of proof beyond reasonable doubt that efficacy is established.
Further insight into the efficacy of EPA is documented by the large reduction in total ischemic events in the REDUCE-IT trial (34). In addition to the 1,606 patients having first primary events, there were a further 1,303 repeat events. There is an estimated 30% risk reduction in this total burden of ischemic events (95% CI: 22% to 38% reduction).
But we do need to recognize the adverse event profile of EPA in the REDUCE-IT trial (Table 3). There are statistically significant increases in incidence of constipation (+1.8%), peripheral edema (+1.5%), and atrial fibrillation (+1.4%), the last being of particular concern. These absolute risks need setting against the observed absolute benefits, which are −4.8% for the primary endpoint and −3.6% for the composite of CV death, MI, and stroke. Furthermore, the increase in peripheral edema is not accompanied by an increase in heart failure, and the increase in atrial fibrillation is offset by a significantly lower incidence of stroke.
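One simple way to set these absolute differences side by side is to convert each into the number of patients treated per one event prevented or caused. The sketch below does only that arithmetic on the percentages quoted above. It is an illustration, not an analysis from the trial report; the function name is hypothetical, and the results apply over the trial's follow-up period and take no account of how serious each type of event is.

```python
def number_needed(abs_risk_diff_pct):
    """Patients treated per one event prevented (or caused).

    Input is an absolute risk difference in percentage points.
    """
    return 100 / abs(abs_risk_diff_pct)


# Absolute differences quoted in the text (EPA minus placebo, percentage points).
benefits = {"primary endpoint": -4.8, "CV death, MI, or stroke": -3.6}
harms = {"constipation": 1.8, "peripheral edema": 1.5, "atrial fibrillation": 1.4}

for name, diff in benefits.items():
    print(f"{name}: about 1 event prevented per {number_needed(diff):.0f} treated")
for name, diff in harms.items():
    print(f"{name}: about 1 extra case per {number_needed(diff):.0f} treated")
```

On these figures, roughly 21 patients are treated per primary event prevented, against roughly 71 treated per additional case of atrial fibrillation.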
The European Medicines Agency has recently reviewed the totality of evidence for omega-3 fatty acids (35) and concluded, “the benefit-risk balance of omega-3 ethyl esters in secondary prevention after myocardial infarction is not favorable.” The key question is, does the REDUCE-IT trial genuinely show that a high dose of icosapent ethyl has a much more favorable benefit-risk profile?
The AUGUSTUS Trial
AUGUSTUS (An Open-Label, 2 × 2 Factorial, Randomized Controlled, Clinical Trial to Evaluate the Safety of Apixaban vs. Vitamin K Antagonist and Aspirin vs. Aspirin Placebo in Patients With Atrial Fibrillation and Acute Coronary Syndrome or Percutaneous Coronary Intervention; NCT02415400) (36) was an open-label 2 × 2 factorial trial to evaluate the safety of apixaban versus a vitamin K antagonist (VKA) and aspirin versus placebo in 4,614 patients with atrial fibrillation and acute coronary syndrome and/or percutaneous coronary intervention. All patients also received a P2Y12 inhibitor for 6 months, as well as aspirin on the day of the acute coronary syndrome and/or percutaneous coronary intervention. The primary outcome through 6 months was major bleeding or clinically relevant nonmajor bleeding using the International Society on Thrombosis and Haemostasis definitions. Secondary outcomes included major bleeding, death, rehospitalization, and ischemic events (i.e., MI, stroke, stent thrombosis, urgent revascularization).
The nature of such a factorial trial (37) is that each patient is simultaneously randomized twice: to apixaban versus VKA and to aspirin versus placebo. Thus, there are 4 treatment groups: apixaban + aspirin, apixaban + placebo, VKA + aspirin, and VKA + placebo. However, the pre-defined analysis plan is to concentrate on results for each of the 2 treatment factors (apixaban vs. VKA and then aspirin vs. placebo) in all 4,614 randomized patients. This is accompanied by an interaction test examining whether any difference between apixaban and VKA depends on whether the patients received aspirin or placebo. Comparison of the 4 treatment groups is considered a secondary descriptive data display without formal statistical analysis.
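The logic of analyzing a 2 × 2 factorial trial "at the margins," with an accompanying interaction check, can be sketched as follows. The counts are hypothetical and chosen only for illustration; they are not the AUGUSTUS data. The sketch also uses simple risk ratios, whereas the trial itself reports hazard ratios from time-to-event analyses.

```python
# Hypothetical (events, patients) per randomized group; NOT the AUGUSTUS data.
groups = {
    ("apixaban", "aspirin"): (160, 1000),
    ("apixaban", "placebo"): (80, 1000),
    ("VKA", "aspirin"): (220, 1000),
    ("VKA", "placebo"): (110, 1000),
}


def risk(cells):
    """Pooled event risk across a list of (events, patients) cells."""
    events = sum(e for e, _ in cells)
    patients = sum(n for _, n in cells)
    return events / patients


def margin(factor_index, level):
    """Risk for one level of a factor, pooling over the other factor."""
    return risk([v for k, v in groups.items() if k[factor_index] == level])


# Main effects: each factor is compared using all randomized patients.
rr_anticoagulant = margin(0, "apixaban") / margin(0, "VKA")
rr_antiplatelet = margin(1, "aspirin") / margin(1, "placebo")

# Interaction: does the apixaban-vs-VKA ratio differ by aspirin vs. placebo?
rr_within_aspirin = (risk([groups[("apixaban", "aspirin")]])
                     / risk([groups[("VKA", "aspirin")]]))
rr_within_placebo = (risk([groups[("apixaban", "placebo")]])
                     / risk([groups[("VKA", "placebo")]]))
interaction_ratio = rr_within_aspirin / rr_within_placebo

print(f"apixaban vs. VKA:    RR = {rr_anticoagulant:.2f}")
print(f"aspirin vs. placebo: RR = {rr_antiplatelet:.2f}")
print(f"ratio of RRs (interaction): {interaction_ratio:.2f}")
```

In this made-up example the ratio of risk ratios is 1, meaning the effect of one factor is the same regardless of the other, which is the situation in which reporting the two main effects separately is most straightforward.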
Results for the primary and key secondary endpoints are shown in Table 4. For the primary endpoint, major and clinically relevant nonmajor bleeding, there is strong evidence that apixaban is superior to VKA (HR: 0.69; 95% CI: 0.58 to 0.81; p < 0.0001). Also, aspirin is clearly inferior to placebo (HR: 1.90; 95% CI: 1.60 to 2.25; p < 0.0001). If one confines attention to major bleeds only, the same conclusions are evident.
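As a rough consistency check, a hazard ratio and its 95% CI can be converted into an approximate z statistic and 2-sided p value. The sketch below uses a Wald-type normal approximation on the log(HR) scale; it is not the method used in the trial report, and the function name is hypothetical.

```python
from math import erfc, log, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_from_ci(hr, lower, upper):
    """Approximate z statistic and 2-sided p value from an HR and its 95% CI."""
    se = (log(upper) - log(lower)) / (2 * Z)  # standard error of log(HR)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2))  # 2-sided tail area of the standard normal
    return z, p


# Primary endpoint results quoted in the text.
z_apixaban, p_apixaban = wald_from_ci(0.69, 0.58, 0.81)  # apixaban vs. VKA
z_aspirin, p_aspirin = wald_from_ci(1.90, 1.60, 2.25)    # aspirin vs. placebo

print(f"apixaban vs. VKA:    z = {z_apixaban:.2f}, p = {p_apixaban:.1e}")
print(f"aspirin vs. placebo: z = {z_aspirin:.2f}, p = {p_aspirin:.1e}")
```

Both approximate p values fall well below 0.0001, in line with the reported results.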
Results for hospitalization over 6 months reveal a significant reduction on apixaban compared with VKA (p = 0.002). There were also slightly more hospitalizations on aspirin compared with placebo (p = 0.12). There was an overall 3.3% mortality over 6 months with no evidence of treatment differences (data not shown). The composite of death and ischemic events had an overall 6.35% incidence over 6 months, again with no evidence of treatment differences (Table 4). For all these outcomes, the statistical tests of interaction were not statistically significant.
We now display in Figure 7 the same outcomes broken down by the 4 randomized treatment groups: apixaban + aspirin, apixaban + placebo, VKA + aspirin, and VKA + placebo. This provides reassuring evidence that the lowest incidence of both bleeding and hospitalizations is for patients on apixaban + placebo. A legitimate concern is that giving apixaban without aspirin might have increased the risk for ischemic events, but in fact, the incidence of the composite of death and ischemic events for apixaban + placebo is comparable with that for apixaban + aspirin.
Given that the primary endpoint superiorities of apixaban over VKA and placebo over aspirin were very highly significant, one might question whether the trial should have been stopped earlier (38). One of us (S.J.P.) was on the trial’s data monitoring committee that decided not to recommend early stopping. As in all trials of antithrombotic drugs, there is a trade-off between treatment efficacy (prevention of ischemic events) and safety (an increased bleeding risk). Although AUGUSTUS was well powered to reach definitive findings on the latter, the incidence of ischemic events was substantially lower than that of bleeding events. Therefore, early stopping of the trial would have inhibited the ability to reach clear conclusions on treatment efficacy.
A network meta-analysis (R. Lopes, personal communication, March 19, 2019) has integrated the evidence from AUGUSTUS with 3 other relevant trials. Its authors concluded that a strategy based on non-VKA oral anticoagulants (such as apixaban) plus a P2Y12 inhibitor, but without aspirin, is the preferred regimen post-percutaneous coronary intervention for patients who also have atrial fibrillation.
Each of the 6 clinical trials we have critiqued has important implications for future patient care. We have aimed to provide a balanced account of their key findings, how they fit into the context of previous knowledge, and each study’s strengths and weaknesses.
Along the way, we have highlighted some methodological advances and challenges pertinent to trials research in general. These include: 1) the problems of treatment crossovers and the cautious use of alternatives to ITT analysis; 2) when to do a pivotal randomized trial of a new procedure requiring investigator skills; 3) the use of new hierarchical methods (e.g., win ratio) for combining evidence from fatal and nonfatal events; 4) the use of repeat events (not just time to first event) when comparing treatments in a chronic disease such as heart failure; 5) allowance for the competing risk for death when reporting the treatment effect on nonfatal events such as hospitalizations by use of joint-frailty models; 6) how to interpret the overall evidence, when 2 related trials reach contrasting conclusions; 7) the need for a better understanding of secondary endpoint findings, perhaps abandoning the current practice of hierarchical testing; 8) how to place a new trial in the context of prior related trials and appropriate use of meta-analyses; 9) how to interpret a very positive trial in an otherwise negative field and the potential value of Bayesian methods; and 10) how to present and interpret the findings of a 2 × 2 factorial trial.
Overall, we appreciate this opportunity to review some key recent randomized trials, and we hope our insights are of value, for both their clinical relevance and their implications for future clinical trials research and its methodology.
Dr. Pocock has served on Steering Committees or Data Monitoring Committees for trials sponsored by Abbott Vascular, Amarin, AstraZeneca, Bayer, Biosensors, Boehringer Ingelheim, Boston Scientific, Bristol-Myers Squibb, Edwards Lifesciences, Idorsia, Medtronic, Novartis, and Vifor. Dr. Collier has served on Data Monitoring Committees sponsored by AstraZeneca, Boston Scientific, Daiichi-Sankyo, Devax, Infraredx, Medtronic, Pfizer, and Zoll.
- Abbreviations and Acronyms
- CI = confidence interval
- CVH = cardiovascular hospitalization
- EPA = eicosapentaenoic acid
- HFH = heart failure hospitalization
- HR = hazard ratio
- MACE = major adverse cardiovascular event(s)
- MI = myocardial infarction
- VKA = vitamin K antagonist
- 2019 American College of Cardiology Foundation
- Pocock S.J., Collier T.J.
- Packer D.L., Mark D.B., Robb R.A., et al., for the CABANA Investigators
- Buxton M.J.
- Hernán M.A., Scharfstein D.
- Leuchs A.K., Zinserling J., Brandt A., Wirtz D., Benda N.
- Noseworthy P.A., Gersh B.J., Kent D.M., et al.
- Marrouche N.F., Brachmann J., Andresen D., et al.
- January C.T., Wann L.S., Calkins H., et al.
- Dong G., Qiu J., Wang D., Vandemeulebroecke M.
- Stone G.W., Lindenfeld J., Abraham W.T., et al., for the COAPT Investigators
- Claggett B., Pocock S., Wei L.J., Pfeffer M.A., McMurray J.J.V., Solomon S.D.
- Rogers J.K., Yaroshinsky A., Pocock S.J., Stokar D., Pogoda J.
- Obadia J.F., Messika-Zeitoun D., Leurent G., et al., for the MITRA-FR Investigators
- Nishimura R.A., Bonow R.O.
- Packer M.
- U.S. Food and Drug Administration, Center for Drug Evaluation and Research
- U.S. Food and Drug Administration, Center for Drug Evaluation and Research
- Pocock S.J., Stone G.W.
- Zelniker T.A., Wiviott S.D., Raz I., et al.
- Spiegelhalter D.J., Freedman L.S., Parmar M.K.B.
- Yokoyama M., Origasa H., Matsuzaki M., et al., for the Japan EPA Lipid Intervention Study (JELIS) Investigators
- Bhatt D.L., Steg P.G., Miller M., et al., for the REDUCE-IT Investigators
- European Medicines Agency
- Lopes R.D., Heizer G., Aronson R., et al., for the AUGUSTUS Investigators