## Journal of the American College of Cardiology

# Trials and Tribulations of Non-Inferiority: The Ximelagatran Experience

- Received May 21, 2005
- Revision received July 6, 2005
- Accepted July 11, 2005
- Published online December 6, 2005.

## Author Information

- Sanjay Kaul, MD⁎ (kaul{at}cshs.org)
- George A. Diamond, MD, FACC⁎
- William S. Weintraub, MD, FACC†

⁎ **Reprint requests and correspondence:** Dr. Sanjay Kaul, Division of Cardiology, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, California 90048.

A variety of subtle assumptions challenge the design, analysis, and interpretation of non-inferiority trials. Among these are the arbitrary marginal and fractional thresholds employed for characterization of non-inferiority and that the standard treatment performs as it did in previous placebo-controlled trials. These assumptions must be made explicit and their influence on the resultant conclusions must be assessed rigorously via sensitivity analyses. Thus, when these sensitivity analyses were applied to the key assumptions underlying the recently reported Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) trials, they materially undermined the authors’ conclusion regarding non-inferiority of ximelagatran relative to warfarin in the management of patients with nonvalvular atrial fibrillation.

## Abstract

Ximelagatran is a novel oral direct thrombin inhibitor that offers a number of advantages over the standard treatment, warfarin, in patients with atrial fibrillation. Two large clinical trials, one open-label (Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation [SPORTIF] III), one double-blind (SPORTIF V), have compared the efficacy and safety of fixed-dose ximelagatran without anticoagulation monitoring with dose-adjusted warfarin using a non-inferiority design. On the basis of the results, the investigators concluded that ximelagatran was just as effective as warfarin in preventing stroke or systemic embolism (the primary end point), because the pre-specified non-inferiority criterion was met. Reanalysis of the data with rather conservative interpretive criteria, however, revealed a number of deficiencies: 1) an unreasonably generous margin that was potentially biased toward non-inferiority, given the low baseline event rate of warfarin; 2) the inappropriateness of the analytical method used to estimate the non-inferiority margin; 3) a lack of confidence that ximelagatran retains at least 50% of warfarin’s effect (a prerequisite to the establishment of non-inferiority); 4) significant heterogeneity in the magnitude of efficacy observed in the two trials; and 5) safety concerns regarding increased liver toxicity with ximelagatran without a significant offsetting advantage in major bleeding. This imbalance in the benefit-risk profile materially undermines the investigators’ claim of non-inferiority of ximelagatran and led the Food and Drug Administration to reject the sponsor’s application for ximelagatran. Despite published conclusions to the contrary, we conclude that ximelagatran has not been shown to be non-inferior to warfarin. Such determinations of non-inferiority are highly dependent on the underlying assumptions, and graphical sensitivity analyses make this dependence explicit.

Nonvalvular atrial fibrillation (AF) is associated with an increased risk of ischemic stroke and systemic embolization (1). Anticoagulation therapy with warfarin reduces this risk by approximately two-thirds (1,2). Aspirin is generally less effective than warfarin, but evidence suggests that it is superior to placebo, especially in low-risk patients (1). Despite the compelling evidence that anticoagulation with warfarin reduces the risk of stroke in most patients with AF, this therapy continues to be underused, with fewer than one-half of eligible patients taking it (3).

Alternative approaches to anticoagulation have resulted in the development of an oral direct thrombin inhibitor, ximelagatran (4–6), which offers several practical advantages over warfarin. It has a stable and predictable pharmacokinetic profile that is independent of body weight and other patient variables, rapid onset and offset of action, and minimal interaction with diet and drugs, thereby eliminating the need for anticoagulation monitoring and dose adjustment. The safety and efficacy of ximelagatran in patients with AF at risk for ischemic stroke has been evaluated in two phase-III studies within the Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) program (4–6).

The SPORTIF III study was a randomized, open-label, parallel-group trial that sought to determine whether ximelagatran, administered in a 36-mg twice-daily fixed dose, was non-inferior to the current efficacy standard of warfarin (dose-adjusted to an international normalized ratio of 2.0 to 3.0) for prevention of strokes and systemic embolic events in 3,407 patients with nonvalvular AF and at least one stroke risk factor (4). The SPORTIF V trial was similar, but with double-blind treatment allocation involving 3,922 patients (6). The primary efficacy and secondary safety results are summarized in Table 1.

On the basis of these results, the SPORTIF investigators concluded that “ximelagatran treatment is at least as effective as well-controlled warfarin treatment for prevention of stroke and systemic embolism [and] might have a more favorable benefit/risk profile than warfarin for patients with atrial fibrillation” (4–6). As a result of these findings, the corporate sponsor submitted a new drug application to the Food and Drug Administration (FDA). Nevertheless, on October 8, 2004, the FDA rejected the application on the recommendation of its Cardiovascular and Renal Drugs Advisory Committee, owing to concerns over safety and, to a lesser degree, the method of measuring efficacy (7). An understanding of this decision requires a deeper appreciation of the methods and assumptions underlying the design and analysis of the SPORTIF trials, in particular, and of non-inferiority trials in general.

The non-inferiority design employed in the SPORTIF trial, unlike a placebo-controlled design, compares the new experimental treatment with the current standard treatment (“active control”) rather than placebo. This design is justified whenever treatment with placebo is considered “unethical,” as in the SPORTIF trial (4–6). The critical issues involved in non-inferiority trials are highlighted in Table 2.

There are two basic approaches to non-inferiority analysis (8–15). The first approach seeks to determine if the new treatment is inferior to the active control treatment by no more than some pre-defined margin (“marginal analysis”). The second approach seeks to determine if the new treatment preserves some pre-defined fraction of the standard treatment’s effect (“fractional analysis”). Both of these approaches introduce a number of statistical assumptions, not always specified or justified. Our goal is to describe the critical assumptions underlying the design and analysis of the SPORTIF trial and to explore the robustness of its conclusions with respect to these assumptions.

## Non-inferiority trial design

### Marginal analysis of non-inferiority

One of the most crucial steps in the non-inferiority trial design is the specification of a non-inferiority margin which quantifies the worst case loss in efficacy that is clinically acceptable, considering the potential safety, convenience, or cost advantages of the new treatment. No universally accepted “gold standard” criterion exists for estimation of this margin (8–15). The International Conference on Harmonization guidance (16) advises that the determination of the margin should be: 1) specified a priori, 2) on the basis of both clinical judgment and statistical reasoning, and 3) suitably conservative, reflecting the uncertainty in evidence.

A clinically accepted norm for non-inferiority margin is a proportional difference of 15% to 20% or less, smaller than the typical 20% to 25% “minimally clinically important difference” criterion employed in superiority trials; however, what constitutes a clinically acceptable difference is ultimately a matter of judgment and might vary widely for each patient, physician, investigator, regulator or payor, and the clinical circumstance. For example, any difference in hard outcomes like mortality or irreversible morbidity (myocardial infarctions or disabling stroke) might be argued to be clinically meaningful, thereby warranting the choice of narrow conservative margins. In contrast, a larger reduction in efficacy and, therefore, a more liberal margin might be tolerable if the outcome is less robust, as with any reversible morbidity (recurrent ischemia or transient ischemic attack), and if the new treatment offers significant improvements in administration, adverse effects, and cost.

Despite the emphasis on clinical judgment, the deciding factor for determination of the margin is often statistical, given the subjective and somewhat arbitrary nature of the former. From a statistical perspective, the margin should be, at the very least, no larger than the worst limit of 95% confidence interval (CI) of standard treatment effect relative to placebo (8,13,14), but it could be smaller so as to have assurance that the new treatment has greater than minimal efficacy (10). One proposal for selecting the margin is to take one-half of the magnitude of the worst limit of this CI—the so-called “50% rule” or “95-95 method” recommended by the FDA (13,14). This conservative margin, however, often results in a high “false-negative” rate (type II error; i.e., low power to demonstrate non-inferiority). In general, the objective should be to limit “false-positive” (type I) errors by avoiding too liberal a margin and “false-negative” (type II) errors by avoiding too conservative a margin with respect to a claim of efficacy. The margin is generally set in terms of an absolute or relative difference, the latter being favored over the former given its greater trial-to-trial stability (10). The active control effect is best determined from a random effects meta-analysis to account for trial-to-trial variability (8,10,13,14).

### Estimation of the non-inferiority margin for SPORTIF

Figure 1 summarizes the estimation of the non-inferiority margin. The warfarin effect has been assessed in six placebo-controlled studies (17–22), five of which enrolled patients without and one (European Atrial Fibrillation Trial [EAFT]) with recent transient ischemic attack or stroke. Because the EAFT study enrolled a higher risk population than did the SPORTIF trial, one can justify not including it in the estimation of the summary effect of warfarin. The SPORTIF investigators assumed a baseline warfarin event rate of 3.1%/year and a non-inferiority margin of 2% in terms of absolute difference, representing their estimate of the maximal difference considered to be clinically acceptable (4,6). This margin, however, is approximately equivalent to the 95% lower limit of the absolute risk difference derived from the five pooled trials. Taking 50% of this limit, as suggested by the “50% rule,” results in a margin of 1%, one-half of the margin used by the SPORTIF trial. With meta-analysis, the margin is estimated to be even smaller: 0.85% for a fixed-effects model and 0.68% for a random-effects model. The corresponding margins in terms of risk ratio vary from 1.22 for an absolute margin of 0.68% to 1.65 for an absolute margin of 2% at an expected warfarin rate of 3.1% (Fig. 1). Thus, the non-inferiority risk ratio margin of 1.65 in the SPORTIF trial was unreasonably generous, exceeding those typically encountered in contemporary non-inferiority cardiovascular clinical trials such as the Randomized Evaluation in PCI Linking Angiomax to Reduced Clinical Events (REPLACE II) (1.18), Valsartan in Acute Myocardial Infarction (VALIANT) (1.13), Pravastatin or Atorvastatin Evaluation and Infection Therapy (PROVE-IT) (1.17), Superior Yield of the New strategy of Enoxaparin, Revascularization, and GlYcoprotein IIb/IIIa inhibitors (SYNERGY) (1.10), and Aggrastat to Zocor (A-to-Z) (1.11). The risk ratio margin would have been even higher (nearly 2.0) had the investigators chosen an appropriate warfarin rate; their choice for the warfarin rate (3.1%) exceeded the rate supported by historical pooled data (1.9%) by more than 50%.
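The conversion between absolute and relative margins described above can be checked with a few lines of arithmetic. The sketch below assumes that the risk ratio margin is simply (control rate + absolute margin) / control rate; the function names are illustrative and do not come from the SPORTIF reports.

```python
def half_of_limit(worst_limit_pct: float) -> float:
    """'50% rule': take one-half of the worst CI limit of the control effect."""
    return worst_limit_pct / 2.0


def risk_ratio_margin(abs_margin_pct: float, control_rate_pct: float) -> float:
    """Risk ratio margin implied by an absolute margin at a given control rate.

    Assumption: margin_RR = (control rate + absolute margin) / control rate.
    """
    return (control_rate_pct + abs_margin_pct) / control_rate_pct


# Figures quoted in the text (percent per year)
assumed_rate = 3.1  # warfarin rate assumed by the SPORTIF investigators
pooled_rate = 1.9   # warfarin rate supported by pooled historical data

for margin in (0.68, 0.85, half_of_limit(2.0), 2.0):
    rr = risk_ratio_margin(margin, assumed_rate)
    print(f"absolute margin {margin:.2f}% -> risk ratio margin {rr:.2f}")

# The same 2% absolute margin at the lower, historically supported rate
print(f"2% margin at {pooled_rate}% rate -> {risk_ratio_margin(2.0, pooled_rate):.2f}")
```

At the assumed 3.1% rate this gives 1.22 and 1.65 for the 0.68% and 2% margins, and about 2.05 for the 2% margin at the 1.9% pooled rate, in line with the values cited in the text.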

### Marginal analysis of the SPORTIF trial

The primary non-inferiority analyses reported for the SPORTIF III and SPORTIF V trials are shown in Figure 2. Unlike superiority trials, in which intention-to-treat (ITT) is preferred to on-treatment (OT) analysis, the ITT analysis is biased toward non-inferiority. Hence, both analyses should be performed for a non-inferiority trial, similar results supporting the robustness of the conclusion (8,13,14). Non-inferiority is established for all margins in the SPORTIF III trial for both ITT and OT analyses. Superiority of ximelagatran over warfarin is established in the SPORTIF III trial for OT analysis. In contrast, non-inferiority is established only for a margin of 2% with respect to absolute difference and is not established for any margin with respect to risk ratio in the SPORTIF V trial. According to this analysis, an inference of non-inferiority is highly sensitive to one’s choice of the non-inferiority margin: the higher the margin, the easier it is to establish non-inferiority. A sensitivity analysis over a continuous range of values is shown in Figure 3. Non-inferiority is supported in the SPORTIF V trial only for an absolute risk difference ≥1.04% and a risk ratio ≥1.87. Such margins are arguably too liberal to be considered clinically relevant. In contrast, the marginal threshold for non-inferiority for the SPORTIF III trial is suitably conservative: an absolute risk difference of 0.17% or a risk ratio of 1.05.
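The margin dependence described above can be tabulated directly. This is a minimal sketch that treats the smallest margins supporting non-inferiority quoted in the text as the upper confidence bounds of the ximelagatran-versus-warfarin estimate (an interpretive assumption), and derives the risk ratio margins from the absolute margins at the assumed 3.1% warfarin rate. The names are illustrative.

```python
# Smallest margins at which non-inferiority is supported, as quoted in the text.
# Assumption: these act as the upper confidence bounds of the effect estimate.
UPPER_BOUNDS = {
    "SPORTIF III": {"abs_diff_pct": 0.17, "risk_ratio": 1.05},
    "SPORTIF V": {"abs_diff_pct": 1.04, "risk_ratio": 1.87},
}

ASSUMED_WARFARIN_RATE_PCT = 3.1
ABS_MARGINS_PCT = (0.68, 0.85, 1.0, 2.0)  # random-effects, fixed-effects, 50% rule, SPORTIF


def non_inferior(upper_bound: float, margin: float) -> bool:
    """Non-inferiority is declared when the upper bound lies below the margin."""
    return upper_bound < margin


def sensitivity(trial: str):
    """Return (absolute margin, verdict, risk ratio margin, verdict) per margin."""
    bounds = UPPER_BOUNDS[trial]
    rows = []
    for abs_margin in ABS_MARGINS_PCT:
        rr_margin = (ASSUMED_WARFARIN_RATE_PCT + abs_margin) / ASSUMED_WARFARIN_RATE_PCT
        rows.append((
            abs_margin,
            non_inferior(bounds["abs_diff_pct"], abs_margin),
            rr_margin,
            non_inferior(bounds["risk_ratio"], rr_margin),
        ))
    return rows


for trial in UPPER_BOUNDS:
    print(trial)
    for abs_margin, ok_abs, rr_margin, ok_rr in sensitivity(trial):
        print(f"  abs {abs_margin:.2f}%: {ok_abs}   RR {rr_margin:.2f}: {ok_rr}")
```

Under these assumptions, every margin passes for SPORTIF III, whereas SPORTIF V passes only at the 2% absolute margin and at none of the risk ratio margins.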

### Fractional analysis of non-inferiority

Unlike placebo-controlled superiority trials, non-inferiority trials do not provide a direct way to distinguish effective from ineffective therapies (i.e., the two therapies could be equally effective or equally ineffective). Many non-inferiority analyses are conducted on the tacit assumption that the new and standard treatments would have demonstrated effectiveness when compared directly with placebo had such a comparison been made, fulfilling the so-called requirement of “assay sensitivity” (8,10–15). One can perform such a comparison, albeit indirectly, via the “putative placebo” approach. The effect of the new treatment versus “putative” placebo is imputed from the observed effects in the current trial and the historical placebo-controlled trials of the standard treatment, as illustrated in Figure 4 (11–15). The putative placebo approach makes a critical assumption of “constancy,” to the effect that the standard treatment performs as it did in previous placebo-controlled trials. In practice, this assumption is not necessarily plausible because of differences in patient characteristics, concomitant medications, intensity of treatment, and other key design features (12–15). One way to “discount” for this limitation is by estimating the fraction of the standard treatment effect retained by the new treatment. This is determined as a ratio of the imputed effect of the new treatment versus putative placebo relative to the effect of the standard treatment versus placebo along with its estimated variance and CI (12–15). Non-inferiority is inferred if the CI of this fraction exceeds a pre-specified minimum threshold (arbitrarily, 0.5) that is considered to be “clinically important” (i.e., the 95% lower limit should exceed 0.5 fraction) (12–15). Thus, for a new treatment to be declared effective, it must not only be superior to the putative placebo but it must also preserve a pre-specified fraction of the standard treatment’s effect.
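The putative placebo imputation and the fraction retained can be sketched as follows. This is one common formulation, not necessarily the one used for the figures in this article: it works on the log risk ratio scale, assumes the current trial and the historical trials are independent, and uses a first-order (delta-method) interval for the fraction. All inputs are hypothetical and the function names are illustrative.

```python
import math
from statistics import NormalDist


def imputed_vs_placebo(log_rr_new_std, se_new_std, log_rr_std_pla, se_std_pla, level=0.95):
    """Indirect (putative placebo) comparison on the log risk ratio scale.

    Adds the new-vs-standard and standard-vs-placebo log risk ratios and,
    assuming independent sources, adds their variances.
    Returns (risk ratio, lower limit, upper limit).
    """
    log_rr = log_rr_new_std + log_rr_std_pla
    se = math.sqrt(se_new_std ** 2 + se_std_pla ** 2)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se)


def fraction_retained(log_rr_new_std, se_new_std, log_rr_std_pla, se_std_pla, level=0.95):
    """Fraction of the standard treatment's effect retained by the new treatment.

    Defined here as log RR(new vs placebo) / log RR(standard vs placebo),
    with a delta-method confidence interval (an approximation).
    Returns (fraction, lower limit, upper limit).
    """
    a, b = log_rr_new_std, log_rr_std_pla
    frac = 1.0 + a / b
    var = se_new_std ** 2 / b ** 2 + (a ** 2) * se_std_pla ** 2 / b ** 4
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) * math.sqrt(var)
    return frac, frac - half_width, frac + half_width


# Hypothetical inputs, NOT the SPORTIF or historical warfarin estimates
new_vs_std = (math.log(1.20), 0.20)  # log risk ratio and its standard error
std_vs_pla = (math.log(0.40), 0.25)

rr, rr_lo, rr_hi = imputed_vs_placebo(*new_vs_std, *std_vs_pla)
frac, f_lo, f_hi = fraction_retained(*new_vs_std, *std_vs_pla)
print(f"new vs putative placebo: RR {rr:.2f} ({rr_lo:.2f} to {rr_hi:.2f})")
print(f"fraction retained: {frac:.2f} ({f_lo:.2f} to {f_hi:.2f})")
print("lower limit exceeds 0.5:", f_lo > 0.5)
```

The last line applies the fractional criterion described above: non-inferiority is claimed only if the lower limit of the fraction exceeds the chosen threshold.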

### Fractional analysis of the SPORTIF trial

Comparison of ximelagatran versus putative placebo is shown in Figure 4. Superiority is established for all three analyses—ximelagatran exhibited a 73% relative risk reduction (95% CI, 56% to 85%) relative to placebo in the SPORTIF III trial, 48% relative risk reduction (95% CI, 14% to 69%) in the SPORTIF V trial, and a 63% relative risk reduction (95% CI, 45% to 76%) in the SPORTIF III+V trial. Non-inferiority is established in the SPORTIF III trial and in the combined SPORTIF III+V trial analyses but not in the SPORTIF V trial alone with the liberal “lower bound” criterion. With the more conservative “upper bound” criterion (15), non-inferiority is established only for the SPORTIF III trial.

Again, because non-inferiority depends on one’s arbitrary choice of the threshold fraction (the lower the fraction, the easier it is to establish non-inferiority), a sensitivity analysis is best performed to define the robustness of this choice. On the basis of this analysis, shown in Figure 5 for the SPORTIF V trial, non-inferiority is not supported at the conventional fractional threshold of 0.5 but only for fractions ≤0.2. Although superiority of ximelagatran to placebo (assessed at 0 fraction) is established, there is some suggestion that warfarin is actually superior to ximelagatran (assessed at 1.0 fraction; one-sided p = 0.06).

The fraction of the warfarin effect retained by ximelagatran can also be estimated from Figure 5 and is equivalent to 0.67 (0.19 to 1.10), corresponding to the 50th (p = 0.5), 2.5th (p = 0.025) and 97.5th (p = 0.975) percentile of the distribution, respectively. With a different approach, the Hasselblad and Kong method (12), the fraction retained is estimated to be 0.67 (0.37 to 0.97). Because the CI of both of these estimates contains the pre-specified fractional threshold of 0.5, the data do not support a claim of non-inferiority. Thus, in contrast to the “official” conclusion regarding the SPORTIF V trial, these fractional analyses are inconsistent with a claim of non-inferiority relative to warfarin.

### Bayesian analysis of non-inferiority

The Bayesian approach to hypothesis testing can be employed for determination of non-inferiority. Briefly, normal posterior distributions are derived with the log mean risk ratio (μ) and its standard deviation (σ) according to Bayes’ theorem, which states that the probability for the hypothesis (non-inferiority), given the evidence (the “posterior”), is proportional to the probability for the evidence, given the hypothesis (the “likelihood”) times the probability for the hypothesis independent of the evidence (the “prior”) (23–25).

Essential to Bayesian analysis is the choice of prior and the weight assigned to that prior (23,25). Briefly, priors range from the uninformative, which impart no information (expressed mathematically as μ = 0, σ >>1), to the skeptical, which express cautiously reasonable skepticism about efficacy of the new treatment, to the informative, which impart substantial information from previous clinical trials. The uninformative prior has the least influence on the analysis; inferences on the basis of it are equivalent to the conventional frequentist results. The informative prior is especially helpful if the previous clinical trial from which it is derived closely resembles the current non-inferiority trial. Any differences in patient characteristics, study protocols, or outcome assessment between the current and historical trial can be accounted for by discounting the latter relative to the former by varying the proportion or weight assigned to the prior information (23). The influence of these choices on the resultant probability of non-inferiority can be assessed through sensitivity analysis. An analysis that is insensitive to the choice of prior indicates a greater degree of stability in the resultant inferences.

The advantages of the Bayesian approach, and its applications to non-inferiority trials, are reviewed in greater detail elsewhere (23–25). In general, Bayesian analysis replaces a categorical (yes/no) non-inferiority judgment with a continuous probability statement relative to the non-inferiority hypothesis. Accordingly, the probability of non-inferiority relative to any assumed marginal or fractional threshold can be computed, and non-inferiority is thereby inferred at a posterior probability of ≥0.975 (corresponding to a conventional one-sided p ≤ 0.025).
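A minimal sketch of this calculation follows, assuming a normal likelihood and a normal prior on the log risk ratio. The prior weight is implemented here by scaling the prior precision, which is one possible way to discount prior information and not necessarily the method used in this article. All numbers are hypothetical, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def posterior(data_mean, data_sd, prior_mean, prior_sd, weight=1.0):
    """Normal-normal update for the log risk ratio.

    `weight` (0 to 1) scales the prior precision; weight=0 ignores the prior,
    which reproduces the data-only (uninformative prior) result.
    Returns (posterior mean, posterior standard deviation).
    """
    data_prec = 1.0 / data_sd ** 2
    prior_prec = weight / prior_sd ** 2
    post_prec = data_prec + prior_prec
    post_mean = (data_prec * data_mean + prior_prec * prior_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


def prob_non_inferior(post_mean, post_sd, rr_margin):
    """Posterior probability that the true risk ratio lies below the margin."""
    return NormalDist(post_mean, post_sd).cdf(math.log(rr_margin))


# Hypothetical inputs on the log risk ratio scale (not the SPORTIF estimates)
data = (math.log(1.30), 0.25)   # current trial: mean and standard deviation
prior = (math.log(0.70), 0.25)  # earlier trial used as an informative prior

for weight in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    mean, sd = posterior(*data, *prior, weight=weight)
    p = prob_non_inferior(mean, sd, rr_margin=1.65)
    print(f"prior weight {weight:.1f}: P(non-inferior) = {p:.3f}, meets 0.975: {p >= 0.975}")
```

Sweeping the weight from 0 to 1 shows how much of the conclusion rests on the prior rather than on the current trial.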

### Bayesian analysis of the SPORTIF trial

Figure 6 shows posterior probability distributions for the SPORTIF V trial with three different priors, as detailed in the figure legend. The probability of non-inferiority with the pre-defined risk ratio margin of 1.65 (equivalent to a log risk ratio of 0.5) is 0.804, 0.913, and 0.999 with an uninformative, skeptical, and informative prior, respectively. Thus, non-inferiority is established (posterior probability ≥0.975) only when prior information from the SPORTIF III trial is used. The probability of non-inferiority increases with the magnitude of the margin and the weight of prior information and exceeds the threshold of 0.975 for all weights (0 to 1) with the investigators’ absolute difference margin of 2% (1.65 risk ratio) but only for weights >0.4 for a conservative absolute difference margin of 0.68% (1.22 risk ratio; i.e., 40% or higher portion of data from the SPORTIF III trial is required to establish non-inferiority at a conservative marginal threshold in the SPORTIF V trial). Thus, the lower the prior weight required to reach this threshold, the lesser the dependence on prior studies and the stronger the evidence of non-inferiority.

## Discussion

Two large non-inferiority trials have concluded that “ximelagatran is at least as effective as warfarin” in preventing stroke and systemic embolism in patients with nonvalvular AF (4–6); however, our re-assessment reveals several limitations in the design and analysis of the data that refute the investigators’ interpretation and point to a contrary conclusion. On the basis of the pivotal double-blind SPORTIF V trial, there is, in fact, very little evidence that ximelagatran is non-inferior to warfarin, unless one uses a liberal non-inferiority margin that is not supported by historical studies. The best case scenario indicates a negligible benefit of ximelagatran over warfarin—a 0.13% absolute or 9% relative risk reduction and 110% retention of warfarin’s effect. In contrast, the worst case scenario reflects a 1% absolute loss of benefit or a >2-fold relative increase in risk and ≤20% preservation of the warfarin effect.

This loss in efficacy would have been tolerable if ximelagatran had been shown to have other noteworthy benefits (less toxic, less costly, easier to administer) that outweigh the seeming loss of efficacy; however, this is not the case, owing to major safety concerns over increased liver toxicity and increased withdrawal rate associated with ximelagatran in comparison with warfarin. Even though the incidence of major bleeding is numerically greater with warfarin, the difference is not statistically significant. Thus, the potential pharmacologic advantages and ease of administration of ximelagatran without the need for monitoring or dose titration are offset by reduced efficacy, increased safety concerns, and potentially increased cost. This imbalance in the benefit-risk and potentially benefit-cost profile challenges the investigators’ claim of non-inferiority of ximelagatran and led the FDA to reject the application for ximelagatran.

### Critical issues in the design and analysis of non-inferiority trials

The results of the SPORTIF trials highlight several fundamental issues in the design and analysis of non-inferiority trials, as summarized in Table 2. The choice of the non-inferiority margin is a key step. Several points are worthy of consideration with respect to the non-inferiority margin in the SPORTIF trials.

First, one might argue that the 2% margin was unreasonably generous and potentially biased toward non-inferiority, given the low baseline event rate in this study. If the investigators had chosen a smaller, more conservative margin, they would not have drawn the conclusion that ximelagatran was non-inferior to warfarin.

Second, there is uncertainty about the magnitude of the warfarin effect because of the variability between the five historical trials in terms of the design and the observed results. Only two of the five trials were double-blind (Fig. 1), and four were stopped prematurely because of significant benefits observed on interim analysis (18–21). Given this variability, a random-effects meta-analytical model, which allows for differences in treatment effects between studies, would have provided a more reliable estimate of the warfarin effect than the fixed-effects model or the pooled analysis.

Third, the decision to employ absolute or relative difference as the basis for judgments regarding non-inferiority is arbitrary. In general, relative differences provide more conservative thresholds than absolute differences when event rates are changing and/or unpredictable (as in the SPORTIF trial, where the observed event rates were lower than the assumed rate) owing to differences in patient populations or new modalities of treatment (8,11). Accordingly, an analysis of the SPORTIF V trial on the basis of risk ratio is inconsistent with the “official” analysis on the basis of absolute difference, the observed upper bound of 2.1 being greater than the non-inferiority risk ratio margin of 1.65. Non-inferiority becomes even more difficult to establish with more conservative relative margins. In such cases, a judgment of non-inferiority would be more confident if analyses on the basis of absolute and relative difference were concordant.

Finally, the impact of active control event rate and non-inferiority margin on sample size is quite substantial. For a relative risk margin of 1.65, the total sample size required to ensure 90% power increases from 3,156 in the SPORTIF V trial at an expected warfarin event rate of 3.1%/year to 4,875 at the pooled historical warfarin rate of 1.9%/year and to 8,190 at the actually observed warfarin rate of 1.2%/year. More conservative margins would also require greater sample sizes at any given warfarin rate, with the sample size increasing as the reciprocal of the square of the margin (13,14). Thus, both SPORTIF trials were arguably underpowered (resulting in a high “false-negative” type II error) to determine the relative efficacy of ximelagatran versus warfarin, given lower than expected warfarin event rates.
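The dependence of sample size on event rate and margin can be illustrated with a textbook normal-approximation formula. This sketch assumes equal true event rates in both arms, a one-sided alpha of 0.025, and no adjustment for follow-up time or dropout. It is not claimed to be the calculation behind the totals quoted above, and the function name is illustrative.

```python
from statistics import NormalDist


def total_sample_size(control_rate, rr_margin, alpha=0.025, power=0.90):
    """Approximate total sample size (both arms) for a non-inferiority trial.

    Assumes equal true event rates in both arms; the risk ratio margin is
    converted to an absolute margin: delta = (rr_margin - 1) * control_rate.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    delta = (rr_margin - 1.0) * control_rate
    variance = 2.0 * control_rate * (1.0 - control_rate)
    per_arm = (z_alpha + z_beta) ** 2 * variance / delta ** 2
    return 2.0 * per_arm


# Warfarin event rates discussed in the text, as proportions per year
for rate in (0.031, 0.019, 0.012):
    n = total_sample_size(rate, rr_margin=1.65)
    print(f"warfarin rate {rate:.1%}: total n of about {n:,.0f}")

# Halving the absolute margin at a fixed rate quadruples the requirement
ratio = total_sample_size(0.031, 1.325) / total_sample_size(0.031, 1.65)
print(f"sample size ratio after halving the margin: {ratio:.1f}")
```

In this formula the margin enters only through the squared term in the denominator, which is the inverse-square relationship noted in the text.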

Although the size of the margin is determined by trial logistics (the larger the margin, the smaller the trial), a potentially serious consequence of choosing liberal margins is “biocreep,” a well recognized phenomenon that can occur when a slightly inferior treatment becomes the active control for future non-inferiority trials and so on until the active control becomes no better than a placebo (13,14). Ideally, stringent margins on the basis of the best comparator should be used to enhance the strength and credibility of non-inferiority trials; however, such stringent margins often result in large sample sizes that render the trials impractical. Reconciling these two important considerations of feasibility and stringency poses a substantial challenge. In this article, we have shown how a sensitivity analysis across a range of margins, from liberal to clinically relevant to conservative (reflecting the core philosophies of the sponsor, practitioner, and regulator, respectively), might provide useful insights.

Another key aspect of the non-inferiority inference is its reliance on two critical assumptions: assay sensitivity, defined as the ability to detect differences between treatments if such differences exist; and constancy, which assumes that the historical difference between the active control and placebo would be unchanged in the setting of the current active control trial had a placebo control been used (8,10–15). The validity of these two key assumptions, however, cannot be verified directly. Assay sensitivity is compromised by poor trial design and conduct that fail to maximize compliance and to minimize protocol deviations and outcome misclassifications (Table 2). The constancy assumption cannot be plausibly demonstrated because of differences with respect to patient characteristics, concomitant medications, intensity of treatment, and other key design features (12–15). Given this limitation, it seems reasonable to raise the standard of evidence required for the establishment of non-inferiority.

Both marginal and fractional analyses are considered to be forms of “discounting” to raise the standard of evidence (15). The “50% rule” endorsed by the FDA represents a form of “double discounting,” in which preservation of a fraction of the active control effect is applied to the non-inferiority margin to make it “suitably conservative” (16). The fractional approach addresses the issue of constancy by discounting the historical data when the event rates are dissimilar in the current and historical trials (15). In the SPORTIF V trial, the observed warfarin rate of 1.2% was nearly 40% lower than the historical rate of 1.9%. Thus, a proper discounting via fractional analysis would have minimized the type I error and led the SPORTIF investigators away from an erroneous conclusion of non-inferiority.

### Implications

One’s choice of the statistical approach to inference has important implications in the interpretation of non-inferiority trials. In this context, the Bayesian approach offers a number of advantages over the conventional frequentist approach (23–25). Chief among them is the integration of prior information with the empirical data to upgrade the evidence. Any degree of heterogeneity between the current and prior trial can be corrected by varying the proportion or weight assigned to the prior information (23). For example, there is substantial heterogeneity in the primary outcome between the SPORTIF III and SPORTIF V trials (p = 0.03) (5) that might be related to bias due to lack of blinding in the SPORTIF III trial or to other confounding factors such as significantly greater degree of concomitant aspirin use in the ximelagatran group in the SPORTIF III trial (4,5). Bayesian analysis supports the appropriateness of lowering the marginal threshold for non-inferiority via incorporation of prior information, thereby strengthening the evidence in favor of non-inferiority. It is exactly in this way—by taking optimum advantage of the available prior information—that the Bayesian approach offers a major advantage over the frequentist approach. Ideally, robust conclusions regarding non-inferiority should be on the basis of concordant analyses with both approaches.

In conclusion, a variety of subtle assumptions challenge the design, analysis, and interpretation of non-inferiority trials. Among these are the arbitrary thresholds employed for the characterization of “non-inferiority” and the use of historical controls to derive the effect of the new treatment relative to a hypothetical putative placebo. In the extreme, this trial design might result in a “regression toward mediocrity” whereby any treatment becomes non-inferior to another by suitable choice of the underlying assumptions. In general, if such trials are to be applied to clinical and regulatory decisions regarding the marketing and use of new treatments, the underlying assumptions must be made explicit and their influence on the resultant conclusions must be assessed rigorously via sensitivity analyses. Thus, when these sensitivity analyses were applied to each of the key assumptions underlying the recently reported SPORTIF trials, they materially undermined the authors’ conclusion regarding the non-inferiority of ximelagatran relative to warfarin in the management of patients with AF.

## Abbreviations and Acronyms

- AF: atrial fibrillation
- CI: confidence interval
- FDA: Food and Drug Administration
- SPORTIF: Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation

- American College of Cardiology Foundation

## References

- ↵
- ↵
- ↵
- Halperin J.L.
- ↵
- Lawrence J, Hung J, Mahjoob K, reviewer. Statistical review and evaluation, clinical studies, NDA 21-686 (2004). FDA web site. Available at: http://www.fda.gov/ohrms/dockets/ac/04/briefing/2004-4069B1_07_FDA Backgrounder-C-R-stat%20Review.pdf. Accessed October 10, 2004.
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- International Conference on Harmonisation. Statistical principles for clinical trials (ICH E 9) (1998); International Conference on Harmonisation. Guidance on choice of control group and related design and conduct issues in clinical trials (ICH E 10) (2000). Food and Drug Administration, Department of Health and Human Services. Available at: http://www.fda.gov/cder/guidance/index.htm. Accessed October 17, 2005.
- Connolly S.J., Laupacis A., Gent M., Roberts R.S., Cairns J.A., Joyner C.
- Stroke Prevention in Atrial Fibrillation Investigators
- Petersen P., Boysen G., Godtfredsen J., Andersen E.D., Andersen B.
- Diamond G.A., Kaul S.