Author information
- Received May 21, 2004
- Accepted July 24, 2004
- Published online December 21, 2004.
- Jeffrey S. Borer, MD*
- *Reprint requests and correspondence:
Dr. Jeffrey S. Borer, New York-Presbyterian Hospital Weill Cornell Medical Center, 525 East 68th Street, New York, New York 10021
The Food and Drug Administration (FDA) is responsible for assuring that drugs, devices, and biologicals available in the U.S. are effective and acceptably safe for their intended uses. Both law and regulation define the procedures to be followed by the FDA in judging the effectiveness and safety of therapies. The FDA comprises a cadre of highly skilled public servants who receive and evaluate all data collected by the manufacturer during therapy, not just the portion that reaches publication. To assist in reaching final conclusions about approvability, the FDA can empanel legally constituted advisory committees and external consultants when the need is perceived for additional specific scientific/technical expertise and substantial experience in clinical practice. Evidentiary standards for marketing approval of drugs, biologicals, and devices generally require direct demonstration of clinical benefit, rather than inferences drawn from “surrogate” pharmacologic/device-mediated effects, sufficient exposure to enable a reasonable assessment of countervailing risk, consideration of specific design elements in the pivotal clinical trials (including prespecified hypotheses [implicitly incorporated in “primary end points”], rigorous plans for statistical analyses, and so on), and assessment of persistence of effectiveness and associated stability of safety over time. Finally, sufficient information must be available so that practitioners can receive written instructions for use (the label) adequate to support the likelihood that recipients of the therapy will receive the expected benefits within the envelope of the stated risks. This article will discuss and expand on these issues, with examples.
The Food and Drug Administration (FDA) is responsible for ensuring that drugs, devices, and biologicals available in the U.S. are effective and acceptably safe for their intended uses. At first glance, the bases for assessing effectiveness and safety might seem intuitively obvious; indeed, operational definitions for effectiveness exist and are discussed subsequently. However, both conceptually and in regulatory considerations, effectiveness and safety are inextricably intertwined: acceptable safety must be judged in relation to the importance of the benefit, a subjective assessment, and to the availability of alternative therapy.
The law requires that, before therapeutic availability by prescription or “over the counter” without prescription, the FDA must apply a benefit/risk standard to approve any new drug (in this context, a product [usually but not necessarily synthetic] claiming health benefits from diagnosis or treatment based on direct or indirect interaction with intermediary metabolism of the beneficiary), any device (a synthetic product claiming similar benefit from physical interaction with the beneficiary), or any biological (a derivative or product of a living organism other than its donor or, if autologous, undergoing substantial processing before administration). The FDA's approval is based on evaluation of a New Drug Application, Pre-Marketing Approval, or Biologics Licensing Application from the sponsor (producer) supporting its use. The application includes a proposed label describing the product's characteristics and directions for use. This must also be approved by the FDA. Before any new therapeutic can be studied in people, the preclinical pharmacology/biology/actions, the general outline of the proposed human studies, and the development plan must be described in an Investigational New Drug application for human studies (drugs, biologicals) or an Investigational Device Exemption for devices. The FDA can prevent any or all of the proposed human studies if it perceives safety concerns. During development and after approval, the FDA continually reviews and, if necessary, stops studies or withdraws approval on the basis of adverse events (AEs), focusing especially on AEs meeting an operational definition of seriousness (serious [S]AEs). Allowable AE reporting intervals are strictly defined and enforced.
Riding herd on this effort is a Herculean task, in part because of the rigorous scope of FDA reviews: all data collected by the manufacturer must be tabulated, submitted to, and evaluated by the FDA. Although it is not necessary for the FDA to review each source document (raw data), these must be available for audit/sampling. Evidence of complete and appropriate data collection must be provided from properly documented audits; “Good Clinical Practice,” “Good Laboratory Practice,” and “Good Manufacturing Practice” standards must be employed in all research upon which FDA approval is based. By federal regulation, approval requires evidence of efficacy from adequate and well-controlled clinical studies, as well as safety acceptable for the intended use defined from adequate exposure. Until 1997, the plural “studies” was generally interpreted as requiring at least two independent clinical trials. However, by that time, progressively larger clinical trials had emerged, some focused on major adverse outcomes (death, stroke) that rendered study repetition ethically tenuous and/or impractical. Although the FDA accepted single persuasive trials as a basis for approval in such cases, the 1997 law explicitly provides FDA authority in certain circumstances to approve a therapy based on a single trial if adequate “confirmatory evidence” is available; the nature of the confirmatory evidence is not legally specified. The law also requires performance of all tests of safety that are “reasonably applicable.”
This summary of approval requirements masks far more complex FDA judgments based on principles often very different from those that influence unfettered academic research or clinical practice. The scientific method underlies problem solving in each area, but the latitude permitted in interpreting the results of scientific inquiry differs widely in the three circumstances and is most stringent in the area of therapeutics regulation.
This article describes my perception of some of these limitations on interpretation. In so doing, it aims to elucidate the process and principles by which cardiovascular therapeutics and, most specifically, cardiovascular drugs are approved. The same overarching principles apply to approval of devices and biologicals that are increasingly important in cardiovascular therapy, although characteristics specific to these other modalities impose unique limitations, which I will briefly review.
Twenty-seven years ago, as a young Senior Investigator and Head of Nuclear Cardiology at the National Heart, Lung, and Blood Institute (National Institutes of Health [NIH], Bethesda, Maryland), I was offered the opportunity to join the Cardio-Renal Drugs Advisory Committee of the FDA when more senior NIH investigators were unavailable. As a result, for three decades, I have been continually involved in an extraordinarily gratifying professional activity, serving three separate terms on the Advisory Committee, each culminated by terms as Committee Chairman, with service as FDA Consultant during the intervening periods. In recent years, I have also acted as a guest member of the Circulatory System Devices Panel and the Biological Response Modifiers Committee. From these activities, I have learned about the unique discipline of therapeutics development as modulated by regulatory principle, about clinical pharmacology, and about evidence evaluation, all from full data sets and primary sources far more complete than those that reach publication. (As suggested by Peter Medawar [via Alvan Feinstein], “… the careful organization given to the published material does not reflect the way things happened… . After conquering [his/her] ignorance, the scientist … may be reluctant to discuss how much ignorance” needed to be overcome (1); to the FDA, information on the ignorance is also important.) Most importantly, though, I have had the unique satisfaction of participating in deliberations that have overwhelming implications for the health of the American people. This has been undertaken in conjunction with FDA officials who manifest extraordinary dedication, skill, and professionalism in assimilating, evaluating, and acting on an overwhelming mass of data in order to balance the need of guarding the public health with timely approval (or rejection) of new therapeutics, meet the requirements of scientific rigor, and provide fairness to commercial sponsors. 
To all of these men and women, we each owe incalculable appreciation and admiration.
Personnel: Officials, advisors, and consultants
The FDA comprises full-time experts in basic and preclinical pharmacology, clinical pharmacology, toxicology, statistics, and more. Most have previous clinical and/or research experience. After several years of reviewing Investigational New Drug/New Drug Applications, Pre-Marketing Approval, or Biologics Licensing Applications and the associated literature, these officials are highly knowledgeable regarding the therapeutic areas under their purview (including, for the Cardio-Renal Drugs Division, hypertension, ischemic heart disease, heart failure, arrhythmia, renal disease, and pulmonary hypertension). When appropriate, they can call on the expertise of other FDA Divisions and Centers. What, then, is the need for Advisors and Consultants? Primarily, to provide perspective gained from ongoing contemporary clinical and consumer experience, generally not available from full-time regulators because of their workload. Also, although FDA officials review and analyze massive amounts of data, they do not produce the data; value is gained from the perspective of Advisors involved in research like that under review.
The FDA and clinical practice
The FDA clearly plays a fundamental role in defining medical practice. However, though FDA decisions determine the pharmaceutical, device, and biological therapies (and many of the diagnostic tools) available for patient care, these decisions refer only to uses the sponsor proposes for marketing. In this sense, the FDA is a reactive body. Although the FDA is increasingly involved with the details of developing therapeutics and often influences decisions about specific components of trial design such as outcome measures, it does not initiate therapeutics development. More importantly, the FDA is not more broadly charged with definition of practice standards. Therapeutics approval and labeling define the benefit (the “indication”) that can generally be expected from therapy, the characteristics of patients who can generally be expected to benefit, the administration regimen(s) likely to provide the benefit with acceptable relation to risk, and the adverse experiences that may be associated with using the therapy. Approvals are generally conservative: they focus only on one or more very carefully defined indication(s) supported by well-designed clinical trials that provide statistically persuasive evidence. Thus, evidentiary standards for FDA approval imply a certain degree of scientific and statistical rigor. However, except when safety considerations result in the recommendation for use only if safer therapies are intolerable or inadequate to provide the desired benefit, or when study designs and results clearly support a sponsor-submitted claim of superiority of one therapy over another, the FDA does not prioritize among approved treatments for a given indication. The FDA does not recommend the use of any specific therapy, does not comment on the relative cost of different treatments, and does not sanction the use of treatments outside the labeled indications or doses, but it does sanction sponsors who advertise outside the approved label.
Recommendations for treatment and the prioritization of therapies are the purview of professional societies or “guidelines” panels, generally comprising small groups of experienced clinicians/researchers with expertise in a specific disease area, whose consensus recommendations are usually based on personal experience and published literature rather than on the far more complete data available to the FDA, and who are not held to any predetermined standard of statistical or scientific rigor. Because patients must receive the best possible care despite recognized deficiencies in supporting data, consensus panels function in part to provide reasonable recommendations when rigorously defined supporting data are not available. Thus, their function is fundamentally different from that of the FDA. It is entirely conceivable (and regularly true) that “guidelines” include recommendations for use of drugs in settings and for indications not considered and sometimes even specifically rejected by the FDA. Thus, the occasional argument by sponsors seeking new indications for already approved drugs that “the horse is out of the barn,” meaning that clinical practice has moved beyond the FDA, generally has little impact on approval decisions. Because of the FDA's legal mandate and the legal implications of its decisions, the FDA must require well-established evidentiary standards for its imprimatur. Sanctions on physicians are not the responsibility of either the FDA or consensus panels, but rather of the courts and, increasingly, of third-party payers; sometimes such sanctions are based on the use of drugs or other treatments for “off-label” indications (i.e., to achieve benefits not confirmed by the FDA).
For approval, drugs must provide one or more specifically defined benefits, benefits must outweigh associated risks, and instructions for use must adequately inform legally constituted prescribers. Although the potential cost reduction in drug development that could result from using “surrogate end points” is currently subject to considerable discussion among drug developers and academic researchers, experience in cardiovascular medicine has taught caution in relying on surrogates to predict clinical benefit. An illustration of the pitfalls associated with surrogates is provided by the Cardiac Arrhythmia Suppression Trial (CAST) (2), which demonstrated that anti-arrhythmic therapy applied after myocardial infarction to suppress premature ventricular contractions (believed to be pathophysiologically similar to and predictive of lethal arrhythmias) resulted in greater mortality than withholding such treatment. Thus, with the single exception of hypertension relief (a surrogate for vascular event reduction), the Cardio-Renal Division requires that clinical benefit, not an effect on a surrogate end point, must be demonstrated by direct measurement. (Low-density lipoprotein cholesterol lowering, a surrogate for atherosclerosis reduction and favorable effects on acute myocardial infarction and survival, is an accepted basis for approval by the FDA's Metabolic-Endocrine Division, which also recognizes decreases in glycosylated hemoglobin level and blood glucose control as measures of diabetes control that support approval.) Clinical benefits fall into two broad categories: 1) reduction in bothersome symptoms (“feeling better”); and 2) increased survival duration and/or reduction in generally recognized major morbid events. Clinical benefits must be distinguished from pharmacologic effects that may underlie, may be expected to lead to, or may vary with the benefits. 
For several reasons, pharmacologic effects are generally not acceptable bases for approval, even if there is a strongly held belief about the relation between a pharmacologic effect and a clinical benefit. These include, first, that almost all drugs have multiple pharmacologic effects; we may not understand the clinical impact of some and may not even be aware of others. Research and development tends to focus on pharmacologic effects believed likely to underlie clinical effects, not on those for which such a relation is not perceived. Of course, no evaluation can be undertaken of pharmacologic effects not recognized to exist; however, the clinical effect of a drug is the net effect of all its pharmacologic effects, whether understood, misunderstood, or unknown. Only a direct measurement of clinical effect integrates the impact of all pharmacologic effects (3,4). For example, angiotensin-converting enzyme (ACE) inhibitors are known to have at least 10 pharmacologic effects. For several ACE inhibitors, clinical trials show that the net effect is beneficial for most patients with hypertension or heart failure. However, experimental data suggest that ACE inhibition may be deleterious in the setting of primary mitral regurgitation, where effects on the specific form of extracellular matrix remodeling induced by the drug may potentiate loss of contractility (5). This concern can be resolved only by a clinical trial. Secondly, even when a group of drugs (“class”) shares many structural and pharmacologic features, intragroup molecular variations, albeit modest, can cause important variations in pharmacologic effects and in dose-response relations for different effects. For example, different clinical effects have been discerned among beta-blocking drugs, depending on whether they manifest intrinsic sympathetic agonist activity, on their relative selectivity for beta-receptor subtypes, etc.
Among I(f) current blockers now gaining interest as anti-anginal/anti-ischemic drugs (effect attributed to isolated heart rate slowing), an early prototype, zatebradine, caused reversible but unacceptable visual symptoms, presumably because similar ion channels subserve effects in the sino-atrial node and in the retina, whereas another, ivabradine, has prevented angina with far more modest visual symptoms (6). Thirdly, although it is inviting to ascribe clinical benefits to specific pharmacologic effects, the perceived basis of clinical effects changes as fundamental pathophysiology and pharmacology are increasingly clarified. For example, although all hydroxymethyl glutaryl coenzyme A reductase inhibitors (statins) tested to date have been effective in reducing ischemic events, and it has therefore been plausible to ascribe their effects solely to cholesterol lowering, recent data also tend to support the impact of drug-mediated reduction in plaque inflammatory activity (7,8).
Surrogate end points are laboratory tests or clinical measures, such as blood pressure, that are believed to relate invariably to clinical outcome but are not themselves clinical benefits (4). To be an acceptable basis for drug approval, the relation between the surrogate and clinical outcome must be constant. Surrogates and interventions must not interact (i.e., surrogate and clinical outcome both must change similarly irrespective of the form of intervention); improvement in the surrogate invariably must lead to clinical benefit. Historically, variations in blood pressure have been closely related to variations in the risk of stroke, with less clear effects on myocardial infarction and heart failure. Also, despite possible differences in some effects of interventions noted in the recent Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) (9), all interventions that reduce blood pressure, irrespective of pharmacologic class, have had directionally similar effects on events. Consequently, blood pressure reduction has been accepted as a surrogate for clinical benefit, and drugs are approved if blood pressure reduction is demonstrated. An antihypertensive drug could have additional benefits (e.g., reduction in the rate of deterioration of renal function in type II diabetics by losartan and irbesartan but not by an equi-effective antihypertensive dose of amlodipine, or improvement in some heart failure outcomes by certain antihypertensives), but these benefits must be demonstrated directly for approval to be granted for such an indication.
In contrast to blood pressure, left ventricular ejection fraction for heart failure outcomes, ST-segment depression on the exercise electrocardiogram (ECG) or on the 24-h ambulatory ECG, or radionuclide-based measures of ischemia for ischemic event risk, among other measures, have been proposed as surrogates, but have either failed to manifest an invariant relation between the effect of the intervention on the surrogate and on the outcome or have not been studied with sufficient interventions to allow confidence that such an invariant relation exists. The search for surrogates continues, driven by the economics of drug development and the realities of trial conduct. The success of multi-drug, multi-modality therapy in reducing the risks from major cardiovascular diseases requires conducting progressively larger (and costlier) clinical trials to demonstrate relatively small but potentially clinically important incremental benefits. Resulting costs increasingly threaten further therapeutics development.
Study design: Efficacy assessment
Evidence of efficacy requires a rigorous comparison between a new therapy and either no therapy, treatment with an agent already approved for a similar purpose, or treatment with different doses of the new agent. Regulations do not specify the form of such comparisons and allow for nonrandom and even “historically controlled” trials. However, because of the potential for unintentional bias and confounding, cardiovascular drug approvals are currently based solely on contemporaneous comparisons using randomized, usually double-blinded study designs. In theory, a clinical entity might have outcomes sufficiently predictable so that historical controls, alone, could be used (e.g., anuric renal failure, inevitably rapidly fatal without dialysis); FDA approval of cardiovascular therapies has not been sought for such entities. However, although “non-inferiority” trials (discussed subsequently) are based on contemporaneous comparison of a new treatment with an active control, historical information must be employed in defining the extent of non-inferiority that must be precluded (it cannot be larger than the effect of the control).
Several study designs may be appropriate for comparisons; selection is based on multiple factors, including the benefit expected. For example, if symptom reduction is sought, parallel-arm or cross-over studies might be acceptable; the parallel comparison might even come at the end of a period in which all patients received the new treatment (“randomized withdrawal”), a very attractive strategy in certain situations (10). If natural history alteration is anticipated, only parallel-arm studies may be appropriate.
Data interpretation is least ambiguous when a drug is compared with placebo without any background therapy. However, in the present era, to achieve requisite population sizes, it may not be possible to maintain placebo therapy alone for sufficient time to study the drug effects on certain end points with adequate statistical power to be likely to demonstrate a drug effect, even if it truly exists. Therefore, comparisons with placebo are often carried out with background therapy, sometimes involving several drugs and varying among subsets within the population. This may confound data interpretation but, given the proliferation of treatments for many conditions and the ethical and/or practical necessity of applying at least some of these, the problem is unavoidable. The potential impact of background therapy can be evaluated statistically, albeit imperfectly unless background treatment is uniform or study designs include complicated (and often impractical) stratification and balancing schemes.
Comparison can be undertaken with a labeled dose of an approved drug, aimed at demonstrating superiority of the new agent. Although sometimes successful (and providing unambiguous evidence of effectiveness when it is), this strategy minimizes the likelihood of demonstrating a benefit from the new therapy (11). Therefore, trials can be designed to demonstrate “non-inferiority” of one regimen to another (i.e., to show that a defined amount of the effect of the control agent is retained by the new therapy). The effect of the control agent in the non-inferiority trial is inferred from the results of earlier trials that compared the control agent with placebo.
The FDA has suggested that the acceptable difference between a new drug and active comparator should be less than the difference between the effect of the comparator and the upper boundary of the confidence interval of the placebo effect in the historical comparison; however, other regulatory authorities (and the FDA) may allow greater flexibility in defining “non-inferiority” standards. For optimal application, this approach requires an extensive comparison of the active control and placebo, so that the incremental effect of the active control is reasonably well defined. Unfortunately, for most approved treatments, the point estimate of the effect versus placebo has fairly broad confidence intervals. Thus, with few exceptions, assumptions underlying non-inferiority trials involve considerable uncertainty. In the U.S., an efficacy claim supported solely by non-inferiority trials would require additional support of some sort but, as in all other cases, decisions would be importantly influenced by situation-specific data, including confidence in the anticipated effect of the control versus placebo. Approval for clopidogrel was based on a single trial in which non-inferiority was persuasively demonstrated versus aspirin (for which the effect versus placebo was relatively well defined historically). In this trial, clopidogrel actually demonstrated nominal (p < 0.05) superiority to aspirin (12). Generally, this would be considered insufficient evidence to support approval from a single trial (see subsequent Statistics section), but because of the known effect of aspirin, clopidogrel was judged to be clearly effective (i.e., superior to placebo).
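The margin logic described above can be illustrated with a short sketch. It is not the FDA's procedure; it assumes a simple fixed-margin reading of the text in which (a) the historical control-versus-placebo effect is summarized by a confidence interval whose lower bound is taken as the conservative estimate of the control effect, (b) the sponsor must show that a chosen fraction of that effect is retained, and (c) effects are expressed so that larger values favor the control. The names `Interval`, `noninferiority_margin`, and `is_noninferior`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A confidence interval, e.g. in percentage points of event reduction."""
    low: float
    high: float


def noninferiority_margin(historical_control_effect: Interval,
                          fraction_retained: float) -> float:
    """Largest loss of effect (control minus new drug) that may be tolerated.

    Assumption: the conservative control effect is the CI bound closest to
    zero, and the new drug must retain `fraction_retained` of it.
    """
    conservative_effect = historical_control_effect.low
    if conservative_effect <= 0:
        # The historical data do not exclude "control no better than placebo",
        # so no meaningful margin can be defined.
        raise ValueError("historical data do not establish a control effect")
    return (1.0 - fraction_retained) * conservative_effect


def is_noninferior(control_minus_new: Interval, margin: float) -> bool:
    """Non-inferiority holds if even the worst plausible loss is below the margin."""
    return control_minus_new.high < margin


# Hypothetical historical trials: control beats placebo by 4 to 10 points.
margin = noninferiority_margin(Interval(4.0, 10.0), fraction_retained=0.5)

# Hypothetical new trials: CI for (control effect minus new-drug effect).
print(is_noninferior(Interval(-1.5, 1.2), margin))  # narrow CI inside the margin
print(is_noninferior(Interval(-0.5, 2.6), margin))  # upper bound exceeds the margin
```

The sketch also shows why broad historical confidence intervals matter: the closer the lower bound of the control effect lies to zero, the smaller the usable margin becomes, and when that bound reaches zero no margin can be defined at all.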
A comparison of the new agent with itself at different doses (i.e., demonstration of a dose-response relation) indicates drug efficacy if, within the proposed range of administration, higher doses cause progressively greater beneficial effects.
As inferable from the 1997 law, support for the efficacy of a new treatment might result from experience with other approved drugs of the same class or with some similar properties, perhaps allowing approval based on a single study without a particularly high level of statistical significance. During the past three years, the Cardio-Renal Advisory Committee has suggested on multiple occasions that, in most instances, such support is at best modest, primarily because of the previously noted arguments favoring direct testing of clinical efficacy rather than extrapolating clinical effects from pharmacologic effects. However, depending on the specific characteristics of the parallel treatments and of the relevant data, outcomes with similar therapies might affect the strength of evidence expected from the primary (pivotal) trials of a new therapy or might help justify acceptance of a large single trial demonstrating effects considered particularly beneficial.
Study design: Efficacy persistence
Usually it is assumed that drug regimens for long-term use (though not necessarily those intended for single, short-duration or repeated transient use) will maintain their efficacy indefinitely, although this presumption has never been tested. To provide evidence of reasonable effect persistence, drugs aimed at chronic symptom relief generally must demonstrate efficacy at least three to six months after treatment initiation (interval depending on the disease); this duration is based largely on empirical observations on the predictive value of effect persistence over this interval and is subject to change with new empirical data. The three- to six-month standard also is driven by the observation that drugs for long-term use sometimes do not achieve a maximal effect until many weeks after initial administration and by the need to allow reasonable exposure for detection of certain types of AEs.
Assessment of efficacy persistence for symptom reduction (and concomitant evaluation for “rebound phenomena,” discussed later) is best achieved with randomized withdrawal from treatment at the conclusion of an appropriate interval. This study design feature can be applied after prolonged open-label active therapy, thus minimizing the practical difficulty and costs potentially associated with randomized, double-blinded comparisons of similar duration, and can provide persuasive evidence of effectiveness as well. For example, for the I(f) current inhibitor ivabradine, efficacy in angina prevention was confirmed, and a lack of pharmacologic tolerance or rebound was demonstrated by randomized withdrawal at the conclusion of a trial that began with a two-week, placebo-controlled, parallel-arm comparison of multiple ivabradine doses, followed by two to three months of open-label treatment of all patients receiving the highest ivabradine dose (6).
For treatments intended to alter survival and/or to minimize major morbid events, no specific standard exists for effect persistence. The basis for approvability would depend on the expected natural history of the specific disease, whether the treatment is intended for application during acute events or for a chronic condition, as well as on the specific benefits and risks expected. In general, for most chronic conditions, evidence of survival improvement and/or major morbid event reduction for at least one year is presented, often including information on many patients studied for several years. Generally, there should be no substantial narrowing of the gap between treatment and comparator during the interval of observation, but this pattern probably reflects the specific nature of the claims that have been sought and the expectations of the scientific community and Advisory Committees, and is not mandated by the FDA.
A related concern is that of rebound phenomena. Unfortunate experience 30 years ago revealed that sudden cessation of short-acting beta-blockers in patients with ischemic heart disease is associated with a modest risk of myocardial infarction or sudden death. Now, some assessment of the effect of stopping a drug is expected, specifically to detect rebound.
Unless special circumstances suggest the prudence of more intensive scrutiny (concerns based on fundamental pharmacologic properties, adverse findings from animal studies, the likelihood of co-administration with other drugs that might interact adversely with the new agent, all of which also might lead to requests for studies in special populations), in accordance with recent international regulatory harmonization agreements, the Cardio-Renal Division expects exposure of at least 1,500 to 2,000 patients to a new drug product before approval, with 300 to 600 exposed for six months or more and at least 100 patients exposed for one year or more. Past experience with certain types of treatments may suggest the need for treatment-specific safety data for certain outcomes; sometimes these requirements encompass all drug classes. On the basis of extensive empirical evidence linking ECG QT prolongation with sudden death and torsade de pointes arrhythmias, the FDA now requires evaluation of potential QT effects of all new drugs.
After approval, voluntary physician reports of AEs are regularly evaluated by the FDA. It is clear that relatively rare AEs, even if catastrophic, could easily be missed with the required pre-marketing exposure. However, the exposure standard is justified by the perceived balance between the risk of harm to individuals from missing a serious AE and the harm to the public from the likelihood that requiring substantially larger and longer pre-marketing exposure might suppress development of generally beneficial new therapies. Empirically, this system is acceptable: relatively few drugs have required post-approval withdrawal because of AEs first recognized after approval.
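How easily a rare AE can escape the pre-marketing exposure described above follows from simple probability. The sketch below is illustrative only: it assumes each exposed patient independently has the same small chance of the event, and the event rate used (1 in 10,000) is a hypothetical figure, not one taken from the article. The function names are likewise made up for this example.

```python
def prob_at_least_one(event_rate: float, n_exposed: int) -> float:
    """Chance of seeing one or more events among n_exposed patients,
    assuming independent patients with a constant per-patient event rate."""
    return 1.0 - (1.0 - event_rate) ** n_exposed


def upper_bound_if_none_seen(n_exposed: int, confidence: float = 0.95) -> float:
    """Largest true event rate still compatible (at the given confidence)
    with observing zero events in n_exposed patients. For 95% confidence
    this is close to the familiar "rule of three", 3 / n_exposed."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_exposed)


# Hypothetical catastrophic AE occurring in 1 of every 10,000 patients,
# evaluated at the exposure sizes mentioned in the text.
for n in (100, 600, 2000):
    print(n, round(prob_at_least_one(1e-4, n), 3), round(upper_bound_if_none_seen(n), 5))
```

Under these assumptions, even 2,000 exposed patients give only roughly an 18% chance of observing a single case of a 1-in-10,000 event, which is why post-approval surveillance carries so much of the burden for rare AEs.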
Statistical evidence of a drug-mediated benefit is expected to be strong, in accordance with the law. The usual standard for each study is statistical confidence that, if repeated 100 times, efficacy would be found in >95 studies (p < 0.05). On the basis of this standard, in the usual case, if two studies are positive, the likelihood is very low that an ineffective drug will be approved. The FDA has described evidence that may be considered in judging the persuasiveness of a single trial but, ultimately, such decisions are largely based on intuitive judgments. The Cardio-Renal Division has suggested that equivalence to two trials might be inferable if a single trial achieves a p value approximating that of p < 0.05 in two trials, roughly equivalent to p < (0.05)²/2 = 0.00125, but several drugs have been approved on the basis of single trials without achieving quite such a stringent standard when Advisors and the FDA found the trials and supporting evidence otherwise compelling. Safety needs to be demonstrated by the sponsor; however, no rigorous statistical standard exists to define a lack of safety. Alarming clustering of SAEs, without statistical significance, can result in withholding approval.
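One way to arrive at the single-trial threshold of 0.00125 is sketched below. The text gives only the formula. The reasoning in the code is an assumed reading of it: the two trials are independent, each is judged at two-sided p < 0.05, and only results favoring the drug count, so each trial contributes a one-sided error rate of 0.025. The function name is hypothetical.

```python
def single_trial_threshold(alpha_two_sided: float = 0.05, n_trials: int = 2) -> float:
    """Two-sided p value a single trial would need to match n independent
    positive trials, under the assumptions stated in the lead-in."""
    one_sided = alpha_two_sided / 2           # chance of a false win in the right direction
    joint_one_sided = one_sided ** n_trials   # all n independent trials falsely positive
    return 2 * joint_one_sided                # express again as a two-sided p value


threshold = single_trial_threshold()
print(f"single-trial threshold: {threshold:.5f}")
print(f"(0.05)^2 / 2          : {0.05 ** 2 / 2:.5f}")
```

For two trials this reduces algebraically to (0.05)²/2, matching the figure quoted above.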
About 20 years ago, a labeling amendment was requested for short-acting nifedipine that included an indication for treatment of “hypertensive urgencies.” Three years earlier, the drug had been approved for prevention of angina due to vasospasm, as well as for typical effort-induced angina. Among other reasons, the Advisory Committee recommended denial of the amendment (the FDA concurred) because directions for use were inadequate: neither the proposed label nor the supporting evidence defined “hypertensive urgency,” the condition for which the drug would be applied. More recently, a labeling amendment was sought for aspirin, already approved for prevention of ischemic events in patients with established ischemic heart disease (secondary prevention); the sponsor proposed the additional indication of event prevention in patients without clinically evident ischemic disease but with a clinical profile suggesting ≥20% event risk over the succeeding 10 years (primary prevention). Supporting data included multiple placebo-controlled trials (different risk profiles) involving more than 55,000 patients. No individual trial prespecified the risk profile envisioned in the new label, and not all achieved statistical significance for event reduction. However, the data suggested that: 1) aspirin reduces ischemic events across populations of various risk profiles, probably including persons who would form the newly targeted population; and 2) major bleeding events, including transfusion-requiring hemorrhage and hemorrhagic stroke, occur with sufficient frequency that a benefit might be judged to exceed the risk only if the pre-therapy ischemic event risk is ≥2%/year. Advisory Committee concerns included the difficulty of creating an equation that weighs disparate benefits and risks appropriately. 
More importantly, the target population was defined by a Framingham Study event risk algorithm that, although intrinsically reasonable, had not been applied, even retrospectively, to the populations that produced the benefit/risk data presented to the Committee. The aspirin labeling amendment proposal exemplifies a fundamental regulatory concern that arises whenever treatments are considered for prevention of events in asymptomatic patients. In part, such claims are based on the assumption that, on average, the long-term benefit/risk relation will be better if treatment is begun before evidence of disease develops than if applied immediately after non-lethal evidence emerges, because with the latter strategy, some patients will die before treatment application. This assumption may be correct but was not tested in any of the trials in which aspirin was studied and seldom is formally assessed in other parallel situations. (Conversely, it is also possible that treatment primarily prevents early events, in which case a therapeutic benefit could be missed if treatment began only after such events had occurred.) The FDA's approval for a preventive indication has potential legal as well as public health ramifications based specifically on the population targeted for drug use in the label. Therefore, considerable rigor should be applied in designing the supporting studies justifying the targeting of a population for labeling.
Devices and biologicals
The general principles underlying drug approval apply to devices and biologicals. However, important specific modifications can affect interpretation of data. For example, randomization to device versus no device may not be practical in certain settings, particularly when benefits are immediate and obvious, as in angina prevention with coronary angioplasty or prosthetic valve implantation for heart failure. Active control studies might detect major differences between devices; however, for more subtle differences, the number of device insertions performed in the U.S. each year may not provide study populations sufficiently large to adequately power either superiority trials of new devices versus approved comparators or rigorously designed non-inferiority trials during the relatively short interval available before engineering upgrades alter the device (see subsequent text). Comparative data may not be needed for certain important safety data (e.g., device failure rate), which could be acceptably defined from observational studies (e.g., registries). Parenthetically, if a strong consensus supports use of already approved devices, like prosthetic valves or coronary stents, for reduction in mortality or major morbid event rates (indications for which randomized trials have never been performed), supporting these claims with well-controlled trials also may be impossible: randomization may be ethical, but recruitment may be impeded by patient and physician bias. Blinding may also be impossible for some evaluations. When device (or biological) insertion/administration involves substantial risk (e.g., invasive procedures), sham administration to enable blinding may not be ethically acceptable. If the device provides signals clearly perceptible to the senses (e.g., localized paresthesias from nerve stimulation), blinding of patients, evaluators, or both may be impossible.
In these situations, objective outcome measures and other study design features must be carefully selected to minimize the impact of non-blinding. Device and biological safety concerns differ somewhat from those of drugs. Most (but not all) nonfatal drug-induced AEs are reversible with drug cessation. Removal or deactivation of devices and biologicals like gene and somatic cell therapies may be dangerous or impossible. Therefore, formal multi-year/multi-decade follow-up is a condition of approval, and reporting of major AEs is legally required (if not always complete). The FDA can withdraw approval and mandate recall if harm is found. For drugs, unless the form and bioavailability of active principles are identical, different formulations of the same drug may (though often do not) require separate efficacy/safety studies for approval. In contrast, device (and biological) designs are often altered during development; a paucity of study populations and the expectation of such improvements render long-term pre-approval follow-up impractical. As a consequence, statistical evidence supporting approval of devices is often less than that expected for drugs, making experienced judgments particularly crucial.
The foregoing has summarized some of the principles underlying regulatory evaluation of therapeutics. Many important issues have not been discussed, including considerations relating to drug or drug-device combinations, selection of end points (particularly if composite), statistical concerns relating to “splitting alpha” for multiple primary end points, the potential for combining superiority and non-inferiority analyses in the same trial, the role of FDA guidance documents, the importance of defining a dose-response relation from minimally effective to maximally tolerated doses, the degree of processing that renders autologous biologicals subject to FDA approval, and others. However, most importantly, I hope this account has indicated the intensive effort to achieve scientific rigor, logic, and fairness that is central to the process of identifying effective and safe therapeutics to guard the public health. For this, all of us owe the FDA an extraordinary debt.
Dr. Borer is the Gladys and Roland Harriman Professor of Cardiovascular Medicine at Weill Medical College and is supported in part by an endowment from the Gladys and Roland Harriman Foundation, New York, New York.
- Abbreviations and acronyms
- ACE = angiotensin-converting enzyme
- AE = adverse event
- FDA = Food and Drug Administration
- American College of Cardiology Foundation