Author information
- Received December 20, 1999
- Revision received March 27, 2000
- Accepted June 1, 2000
- Published online October 1, 2000.
- *Reprint requests and correspondence: Dr. Ralph Stewart, CCU Research Suite, Green Lane Hospital, Private Bag 92 189, Auckland 1030, New Zealand.
The study systematically compared different measures of ST segment depression from the treadmill exercise test.
The value of the treadmill exercise test for objectively measuring treatment effects is limited by random error in the measurement of ST depression and may be biased by regression to the mean or by the decision to terminate the test.
Treadmill exercise was performed in 21 subjects with ischemic heart disease 1 h after isosorbide dinitrate 10 mg or placebo in a double-blind randomized crossover study. A 12-lead electrocardiogram (ECG) was recorded every 30 s during and at peak exercise. The relative sample size needed to detect the nitrate effect was compared for different summary measures of ST depression.
The ST depression measured from a single unmatched lead at the longest equivalent sub-maximal exercise time required the smallest sample size to detect the nitrate effect in paired comparisons (p = 0.000006). Averaging over multiple leads or times did not improve detection of the nitrate effect. The rate of increase in ST depression (in mm/min) calculated by linear regression required a similar sample size (×1.32, 95% confidence interval [CI] 0.62 to 2.58). A larger sample size was needed for ST depression at peak exercise (×2.9, CI 1.3 to 11.1) and for exercise duration (×4.5, CI 1.5 to 38). Time to 1-mm ST depression was the least efficient measurement (relative sample size ×15.5, CI 1.6 to >1000). Comparison of matched leads resulted in >2-fold differences in estimates of the nitrate effect because of bias from regression to the mean.
Maximal ST depression at longest equivalent sub-maximal exercise and the maximal rate of increase in ST depression had less bias and random variation than did other commonly used measures. The rate of increase in ST depression is preferred because it can be calculated in either paired or unpaired studies.
The treadmill or bicycle exercise test is widely used for the diagnosis and management of coronary artery disease. The diagnosis of myocardial ischemia is based on an assessment of the pretest probability and the presence or absence of >1 mm flat or downsloping ST segment depression during exercise or recovery (1,2). A quantitative assessment of the level of ST segment depression during exercise is of additional value to assess prognosis (3,4) and to measure the efficacy of antianginal medication, coronary angioplasty, or coronary artery bypass surgery (1,2). An important limitation of exercise testing is random variability in the measured level of ST segment depression, resulting from noise in the measurement and recording process. This random error reduces the sensitivity and specificity of the test for the diagnosis of myocardial ischemia and its ability to measure treatment effects reliably. Few studies have systematically assessed the effects of random variation in the level of ST segment depression or described methods to reduce its impact. Guidelines on exercise testing from the American Heart Association/American College of Cardiology (AHA/ACC) (1) and the European Society of Cardiology (2) refer to standard and widely used measurements such as “time to 1.0 mm (or 0.1 mV) ST depression” and “total exercise time,” but limited consideration has been given to the variability of these measures.
The purpose of this study was to evaluate several methods for measuring ST segment depression during exercise testing. The goal was to identify measures that give an unbiased estimate of ST segment changes and have low variability. Different measures of ST segment depression were compared in a crossover trial of isosorbide dinitrate versus placebo in subjects with ischemic heart disease. This study design was chosen because the effects of nitrates on myocardial ischemia during exercise are clearly established (5).
The study population was 21 subjects with known ischemic heart disease. All participants had a prior treadmill exercise test that was positive for ischemia with at least 1 mm horizontal or downsloping ST segment depression during exercise. Exclusion criteria were a history of acute myocardial infarction or unstable angina during the preceding two months, known aortic stenosis (gradient >20 mm Hg), left main stem disease (diameter stenosis >50%), hypertrophic cardiomyopathy, severe hypertension (blood pressure >180/120 mm Hg at rest), congestive heart failure, bundle branch block, treatment with digoxin, or resting ST segment changes on the electrocardiogram (ECG). The mean age was 62 years (range 48 to 72 years). All subjects were men. The study protocol was approved by the institutional ethics committee, and all participants gave written informed consent.
Treadmill exercise procedure
Subjects were familiarized with exercise testing before starting the study protocol. Antianginal medication was withheld for 48 h, and studies were performed at least 3 h after a light meal. Caffeine and smoking were avoided on the day of exercise. Exercise testing was performed 1 h after oral isosorbide dinitrate 10 mg or matching placebo given double blind and in random order on different days, but at the same time of day. A modification of the Bruce protocol was used. The workload at 3, 6, 9 and 12 min was the same as in the Bruce protocol, but both the speed and the gradient were increased in equal increments every minute rather than every 3 min, thus allowing a more gradual increase in work so that exercise time more accurately reflected the work performed. Subjects were asked to exercise until they would normally stop to rest, until 2.5 mm ST segment depression was reached, or until the test was terminated at the discretion of the supervising physician. A 12-lead ECG was obtained every 30 s during exercise and at peak exercise. The ST segment levels were measured 0.06 s after the J point by an ECG computer with signal averaging capacity (Cardiovit AT60, Schiller AG, Baar, Switzerland). The level of ST segment depression in three leads (III, aVF, V5) was monitored continuously using a video monitor, and heart rate was measured automatically from the ECG.
Analysis of ECGs
The ECGs were reviewed to ensure that the level of ST segment depression measured by the ECG computer was not influenced by a wandering baseline, by development of bundle branch block, or by an ectopic beat. The level of ST segment depression for all 12 leads at all times during exercise (Fig. 1A) was entered into a computer spreadsheet. For missing values the average of the measurements made 30 s before and after was used. The measurements used to compare the two exercise tests were 1) total exercise time; 2) ST depression at peak exercise; 3) ST depression during equivalent stages of sub-maximal exercise; 4) the rate of increase in ST depression with time (measured in mm/min) estimated for each lead with the slope coefficient from a linear regression of ST depression on time (Fig. 1B); and 5) time to 1 mm ST depression (or peak exercise time if ST depression was always less than 1 mm) calculated as the time at which ST segment depression exceeded 1 mm in any lead for two consecutive measurements. Because the data are from a crossover trial, the analysis compared the differences between the nitrate and placebo measurements within an individual. Unless otherwise stated, the comparison used the leads with the greatest amount of ST depression, as defined by the measurement of interest. Additional leads were included in the analysis in rank order by the amount of ST depression. The ST segment depression at longest equivalent sub-maximal exercise was defined as the ST depression measured at the longest sub-maximal exercise time completed in both tests.
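The summary measures described above can be sketched for a single lead as follows. This is a minimal illustration, not the study's actual software: the function names are hypothetical, ST depression is assumed to be recorded as positive millimetres, and reporting the first of the two consecutive readings above 1 mm as the "time to 1 mm" is an assumption, since the text does not say which of the two readings defines the time.

```python
from typing import Optional, Sequence


def fill_missing(values: Sequence[Optional[float]]) -> list:
    """Replace an interior missing reading (None) with the mean of the
    readings taken 30 s before and 30 s after it."""
    filled = list(values)
    for i in range(1, len(filled) - 1):
        if filled[i] is None and values[i - 1] is not None and values[i + 1] is not None:
            filled[i] = (values[i - 1] + values[i + 1]) / 2
    return filled


def st_slope(times_min: Sequence[float], st_mm: Sequence[float]) -> float:
    """Least-squares slope (mm/min) from regressing ST depression on time."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_s = sum(st_mm) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_min, st_mm))
    return sxy / sxx


def time_to_threshold(times_min: Sequence[float], st_mm: Sequence[float],
                      threshold: float = 1.0) -> float:
    """Time of the first reading that starts a run of two consecutive readings
    above the threshold; peak exercise time is substituted if no run occurs."""
    for i in range(len(st_mm) - 1):
        if st_mm[i] > threshold and st_mm[i + 1] > threshold:
            return times_min[i]
    return times_min[-1]


# Hypothetical readings every 30 s for one lead (mm of ST depression).
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
lead = fill_missing([0.0, 0.2, None, 0.6, 1.1, 0.9, 1.2, 1.4])
print(st_slope(times, lead), time_to_threshold(times, lead))
```

In this made-up series the isolated reading of 1.1 mm at 2.5 min is not counted, because the following reading falls back below 1 mm.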
The effect of nitrate was compared to placebo using a paired t-test. The evaluation of the different measures of ST segment depression considered the potential bias, the magnitude and statistical significance of the estimated treatment effect, and the sample size that would be required in a trial using that measure. Sample size requirements were calculated as the square of the inverse of the observed effect size (ratio of the average to the standard deviation [SD]), and standardized against the sample size required if treatment effects are measured by ST depression at longest equivalent sub-maximal exercise. Confidence intervals for the relative sample size requirements were calculated from the percentiles of 10,000 bootstrapped samples from the study data set (6). The preferred measure would be unbiased, would require the smallest sample size, and, because nitrates are known to reduce ischemia, would have the smallest p value.
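The relative sample size calculation can be illustrated with a short sketch. It assumes that each measure is summarized by one nitrate-minus-placebo difference per subject and that the bootstrap resamples subjects with replacement; the text specifies only that percentiles of 10,000 bootstrapped samples were used, so the resampling unit, the function names, and the handling of degenerate resamples are assumptions.

```python
import random
import statistics


def effect_size(diffs):
    """Mean of the paired differences divided by their standard deviation."""
    return statistics.mean(diffs) / statistics.stdev(diffs)


def relative_sample_size(diffs, reference_diffs):
    """Sample size for one measure relative to a reference measure.

    Required n is proportional to 1 / effect_size**2, so the ratio is
    (reference effect size / effect size) squared.
    """
    return (effect_size(reference_diffs) / effect_size(diffs)) ** 2


def bootstrap_ci(diffs, reference_diffs, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for the relative sample size."""
    rng = random.Random(seed)
    n = len(diffs)
    ratios = []
    for _ in range(n_boot):
        # Resample subjects, keeping each subject's two measures paired.
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            ratios.append(relative_sample_size([diffs[i] for i in idx],
                                               [reference_diffs[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with zero spread or zero mean
    ratios.sort()
    lower = ratios[int((1 - level) / 2 * len(ratios))]
    upper = ratios[int((1 + level) / 2 * len(ratios)) - 1]
    return lower, upper
```

A measure with the same mean difference as the reference but twice the SD has half the effect size and therefore needs four times the sample size.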
Comparison of statistics
Nitrate treatment was found to reduce all measures of ST segment depression by statistically significant amounts (p < 0.05) except “the time to 1 mm ST segment depression” (Table 1). A clinical trial in which treatment effects were measured by “the time to 1 mm ST depression” would require approximately 15 times the number of subjects when compared to a trial in which effects were measured from ST depression at equivalent sub-maximal exercise. Both total exercise time and ST depression at peak exercise detected the nitrate effect. However, their sensitivity was not as great as ST depression at equivalent sub-maximal exercise, and this is reflected in a substantially larger sample size. The rates of increase in ST depression and ST depression at sub-maximal exercise were similar in their abilities to detect the nitrate effect and in their sample size requirements.
Measurements from multiple leads and times
The comparisons described above and in Table 1 are based on measurements from a single lead with the greatest ST depression. Detection of the nitrate effect was greatest during the last 30 s of exercise and was progressively less at earlier times (Table 2). To assess the value of averaging over several leads or times, the relative sample size required for a clinical trial based on each measurement of ST depression was compared (Fig. 2). Averaging over more time points reduced variability, but this benefit was outweighed by the smaller absolute difference in ST segment depression. Averaging over two or three leads resulted in a small improvement in detection of the nitrate effect for all statistics except ST segment depression at the longest equivalent stage of sub-maximal exercise. Because the longest sub-maximal time is within 30 s of terminating exercise on one of the tests, its level could influence the decision to terminate exercise. This may introduce bias and in part explain its apparent efficiency.
The above evaluations were made by comparing the lead with the greatest “ST depression” on the nitrate test with the lead with the greatest “ST depression” on the placebo test, even if they were two different leads. Although from a biological perspective it seems reasonable to compare the same lead, this introduced bias due to regression to the mean, or the tendency of an extreme value to be less on repeat measurement. This bias was present for all the summary measures assessed and is illustrated for “time to 1 mm ST depression” and “the rate of increase in ST depression” in Table 3. The “nitrate effect” was biased upward when measured with matching to the lead with the greatest ST segment depression in the placebo test. It was biased downward when matching used the nitrate lead. In the standard analysis for crossover trials, lead matching based on the first test (sometimes nitrate, sometimes placebo) eliminated the bias in the treatment difference but inflated the standard deviation so that p values were too large. Bias was eliminated and the SD was smallest when treatment differences were measured using the leads with the greatest ST depression in each test, that is, without lead matching (Table 3).
Treadmill or bicycle exercise testing is widely used for the diagnosis and management of ischemic heart disease and for assessment of antianginal therapy. In almost all studies that assess myocardial ischemia the end points are “time to 1 mm or 1.5 mm ST depression,” “time to onset of angina” and “total exercise time.” In the majority of studies the method used to determine “time to 1 mm ST depression” is not described in detail. The current study systematically assessed limitations of these end points: bias due to regression to the mean, bias related to the termination decision, and poor precision due to random error. The bias and efficiency properties identified in the current study are likely to apply to other studies where treatment effect or disease progression is measured by exercise testing. The following properties are of particular concern and are discussed separately.
Bias due to lead matching
Random error introduces bias if leads are matched because of regression to the mean, or the tendency of an extreme measure to become less extreme upon repeat measurement. In the current study there was a more than twofold difference in the apparent effect of nitrate treatment depending on whether matching was to the nitrate or placebo lead with the greatest rate of increase in ST depression. Regression to the mean will give the impression of improvement if the lead with the greatest ST depression on first exercise is chosen for a matched comparison, and important treatment effects may be overestimated or go undetected. In a crossover design, bias is balanced if matching is based on treatment order, but variance is increased because for each subject there is some regression to the mean. The problem can be overcome by comparing the unmatched leads with the greatest ST segment depression on each test.
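The direction of this bias can be reproduced in a small simulation. The sketch below is illustrative only: the number of subjects, the per-lead ST values, the noise level, and the size of the treatment effect are all hypothetical and are not taken from the study data.

```python
import random
import statistics


def simulate(n_subjects=2000, n_leads=12, true_effect=0.3, noise_sd=0.3, seed=1):
    """Return the mean estimated nitrate effect (placebo minus nitrate, mm)
    under three lead-selection rules."""
    rng = random.Random(seed)
    matched_to_placebo, matched_to_nitrate, unmatched = [], [], []
    for _ in range(n_subjects):
        # Hypothetical true ST depression per lead, plus independent
        # measurement noise on each test.
        true = [rng.uniform(0.5, 1.5) for _ in range(n_leads)]
        placebo = [t + rng.gauss(0, noise_sd) for t in true]
        nitrate = [t - true_effect + rng.gauss(0, noise_sd) for t in true]
        p_lead = max(range(n_leads), key=placebo.__getitem__)
        n_lead = max(range(n_leads), key=nitrate.__getitem__)
        # Same lead on both tests, chosen as the most extreme placebo lead.
        matched_to_placebo.append(placebo[p_lead] - nitrate[p_lead])
        # Same lead on both tests, chosen as the most extreme nitrate lead.
        matched_to_nitrate.append(placebo[n_lead] - nitrate[n_lead])
        # Most extreme lead on each test, without matching.
        unmatched.append(placebo[p_lead] - nitrate[n_lead])
    return (statistics.mean(matched_to_placebo),
            statistics.mean(matched_to_nitrate),
            statistics.mean(unmatched))


up, down, free = simulate()
print(f"matched to placebo lead: {up:.2f}")
print(f"matched to nitrate lead: {down:.2f}")
print(f"unmatched leads:         {free:.2f}")
```

Under these assumptions, matching to the placebo lead overstates the simulated effect, matching to the nitrate lead understates it, and the unmatched comparison stays close to the simulated true effect of 0.3 mm.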
Bias related to the termination decision
The ability to measure a treatment effect increases with the duration of exercise. It is therefore important to continue exercise until limited by symptoms or safety. Standard indications for terminating exercise are subjective. They include fatigue, shortness of breath, increasing chest pain, excessive ST segment changes, fall in systolic blood pressure, and physician discretion. The alternative of using objective measures such as predicted maximum heart rate or maximum oxygen consumption (7) is more likely to stop exercise before maximum information on ST segment change is obtained, reducing the ability to detect a treatment effect. However, exercise time and ST segment depression measured at peak exercise are influenced by the subjective criteria used to terminate exercise. By comparison, “time to 1 mm ST depression,” if not measured at the end of exercise, and the rate of increase in ST depression calculated by linear regression are less influenced by a single measurement made at the end of exercise.
Limitations of “time to 1 mm ST depression”
The “time to 1 mm ST segment depression” is widely used because it is easy to understand, it describes both ST depression and exercise time in one measurement, and it is usually not influenced by the termination decision. However, in the current study “time to 1 mm ST depression” was a poor measure of the “nitrate effect.” There are several reasons for this. First, the accurate timing of 1 mm ST depression is limited when measured from ECGs recorded at intervals. Some investigators have overcome this limitation by using interpolation or by recording the ECG continuously. Second, there is significant random variation in the measured level of ST depression. In clinical practice, crude averaging is used to reduce the effects of random variation, but this is usually subjective, likely to be inaccurate and prone to bias. Third, the time to 1 mm ST depression may not be optimal for detection of the treatment effect, which is greatest during the last 30 s of exercise. Information on the level of ST segment depression during this period is needed to increase statistical power. Fourth, random variation can cause ST segment depression to exceed 1 mm for a single isolated measure, something that happened in 33% (14/42) of exercise tests in the current study. Finally, some individuals do not reach 1 mm ST depression during exercise (5/42 tests in the current study). In clinical studies these subjects may be excluded or, as above, peak exercise time may be substituted. Both of these approaches introduce bias. More sophisticated methods such as survival analysis for repeated measurements (8) are required to remove this bias.
Use of linear regression to describe increase in ST depression
The limitations of “time to 1 mm ST depression” are reduced by using linear regression to describe the rate of increase in ST depression with time. Including information from multiple measurements reduces random error, and less weight is given to a single measurement that might bias the result. With regression estimates the recording interval for the ECG is less important, and information beyond 1 mm ST depression is used, thereby increasing sensitivity to treatment effects. Also, regression estimates can be calculated regardless of whether 1 mm ST depression is reached. To aid interpretation, the rate of increase in ST depression can be expressed as the “time to 1 mm ST depression.” This produces a familiar statistic, but it can require extrapolation outside of the data set and produce unreasonable durations. Regression methods used in this study fit a straight line to the relationship between ST segment depression and time, which was a reasonable assumption in this data set (e.g., Fig. 1B). Even if the relationship is somewhat curved, the slope from the linear regression model is sensitive to increased levels of ST depression at the end of the test, and, as in this study, is likely to provide a reliable measure of treatment effects.
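Expressing the fitted line as a "time to 1 mm" amounts to solving intercept + slope × t = 1 for t. A minimal sketch follows, with hypothetical function names and made-up readings; the flag marking extrapolation beyond the last recording is an addition for illustration, not part of the study's method.

```python
def fit_line(times_min, st_mm):
    """Least-squares intercept (mm) and slope (mm/min) of ST depression on time."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_s = sum(st_mm) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_min, st_mm))
    slope = sxy / sxx
    return mean_s - slope * mean_t, slope


def regression_time_to_threshold(times_min, st_mm, threshold=1.0):
    """Time at which the fitted line reaches the threshold, and whether that
    time lies beyond the last recording (an extrapolation).

    Returns (None, False) when the slope is not positive, because the line
    then never rises to the threshold.
    """
    intercept, slope = fit_line(times_min, st_mm)
    if slope <= 0:
        return None, False
    t = (threshold - intercept) / slope
    return t, t > max(times_min)


# Hypothetical test that stops at 3 min with only 0.6 mm of ST depression.
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
st = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(regression_time_to_threshold(times, st))
```

Here the slope is about 0.2 mm/min, so the line reaches 1 mm at roughly 5 min, two minutes after exercise stopped. A very shallow slope would push this value to an implausibly long duration, which is the extrapolation problem noted above.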
The ST segment depression at equivalent sub-maximal exercise
For a crossover comparison, this measure required the smallest sample size to detect the “nitrate effect.” However, it relies on comparison of ST segment depression measured at the same exercise time on two tests; thus, it cannot be calculated for a single exercise test such as that obtained in an unpaired study. In contrast, the rate of increase in ST segment depression can be measured from any exercise test and can be used in either paired or unpaired analyses.
Averaging to reduce random error
Averaging over several leads or times has the potential to reduce random error. The trade-off is that, whereas some leads and time intervals show ischemic changes, others only add noise. In the current study, averaging ST segment changes over two or three leads improved sensitivity to the nitrate treatment effect for most statistics. Conversely, for comparison of equivalent stages of exercise, averaging ST segment changes for times earlier than the longest equivalent sub-maximal stage did not improve detection of the treatment effect. This was because the greatest ST segment changes and the largest treatment effect occurred during the last 30 s of exercise. This suggests that efforts to improve the accuracy of exercise testing should focus on increasing information from the two or three leads with the greatest ST segment depression during the last period of exercise. Previous studies have demonstrated improved efficiency of treadmill exercise testing by using a multiple-lead ECG system (9).
Further studies are needed to confirm that conclusions based on this data set can be extended to other situations, including assessment of other antianginal treatments, the diagnosis of ischemic heart disease, and objective measurement of disease progression. In a study on the “warm-up response” in subjects with ischemic heart disease (10), the rates of increase in ST depression and ST depression at equivalent sub-maximal exercise were also the most sensitive measures.
Signal-averaged measurements and visual analysis of ST depression have similar accuracy for diagnosis of coronary artery disease (11). Computer-averaged estimates of ST segment depression are, however, more useful for estimating the rate of increase in ST segment depression, which is calculated from linear regression of multiple measurements during exercise. The measures of ST depression used in the current study were not adjusted for heart rate or ST slope because the role of these adjustments is uncertain. In the QUEXTA study, a large systematic analysis of exercise testing in a population with reduced workup bias, neither ST/heart rate index nor ST integral, which includes information on ST slope, improved diagnostic accuracy (11).
Our analysis was based on ECGs printed every 30 s during exercise and at peak exercise. It is theoretically possible to increase the frequency of ECGs and therefore the precision of ST segment analysis. Review of a hard copy, which takes 10 to 15 s to print, was undertaken to confirm that ST segment shift was not due to inclusion of an ectopic beat or wandering baseline (12). The majority of exercise studies are based on analyses of ECGs recorded every 30 to 60 s or less frequently (13). An alternative is to use real time or continuous ECG recording to determine time to 1 mm ST depression. However, in most reports clear criteria are not given for choosing the time of 1 mm ST depression from a continuous recording. Because the measured level of ST depression varies over time, regression methods will also be appropriate for measurements from continuous ECG recordings.
The treadmill protocol was based on the standard Bruce protocol, which is the most widely used treadmill exercise protocol in both the United States (14) and Europe (2). The protocol was modified with smaller and more frequent increments in work so that ST segment changes increase more predictably in proportion to the duration of exercise. This type of protocol has been recommended for assessment of treatment effects (7,15,16), but an objective comparison of the protocol used in this study with other exercise protocols has not been undertaken. Estimation of the rate of increase in ST depression during exercise may be more variable for protocols such as the Bruce, which have larger increments in work.
Random variability and bias limit the value of the treadmill exercise test for the detection of coronary artery disease, for assessment of treatment effects, and for objectively measuring disease progression. In this study, simple methods for avoiding bias and reducing random variation are described. The rate of increase in ST segment depression calculated using linear regression and the amount of ST depression at equivalent sub-maximal exercise are the most sensitive measures. They should be calculated without lead matching to avoid bias due to regression to the mean. Application of these methods may be of particular value in clinical trials where a more precise estimate of ST segment changes allows reliable detection of treatment effects with a smaller study population.
☆ Dr. Kay was supported by an Emily Johnstone Research Grant from the Department of Medicine, University of Otago, Dunedin, New Zealand. The study was supported by the Health Research Council, Auckland, New Zealand.
- American College of Cardiology
- American Heart Association
- Ritchie J.L, Gibbons R.J, Cheitlin M.D, et al.
- ESC Working Group on Exercise Physiology PaE
- Efron B, Tibshirani R
- Myers J, Froelicher V.F
- Clayton D
- Chaitman B.R, Bourassa M.G, Wagniart P, Corbara F, Ferguson R.J
- Kay I.P, Kittleson J.M, Stewart R.A.H
- Benhorin J, Pinsker G, Moriel M, Gavish A, Tzivoni D, Stern S
- Redwood D.R, Rosing D.R, Goldstein R.E, et al.
- Patel D.J, Mulcahy D, Norrie J, et al.