- Received August 31, 2000
- Revision received January 22, 2001
- Accepted February 1, 2001
- Published online May 1, 2001.
- Neil J Weissman, MD, FACC∗,
- Julio A Panza, MD, FACC†,
- John F Tighe Jr., MD, FACC‡,
- Susan T Perras, RN, MSN§,
- Harvey Kushner, PhD§ and
- John S Gottdiener, MD, FACC∥
- ∗Reprint requests and correspondence:
Dr. Neil J. Weissman, Cardiovascular Research Institute, Suite 4B-1, Washington Hospital Center, 110 Irving Street, NW, Washington, DC 20010
We sought to determine the specificity of two different methods for assessing change in aortic (AR), mitral (MR) and tricuspid (TR) valvular regurgitation.
Echocardiographic imaging with Doppler is the standard noninvasive diagnostic tool for assessing valvular structure and function. Change can be assessed using either independent evaluations (serial) or using a side-by-side comparison.
Subjects were from the placebo arm of a randomized, double-blind, clinical trial. Three echocardiograms over 10 months were performed. An initial and three-month echocardiogram were read as independent groups, blinded to all parameters except sequence. The initial and 10-month echocardiograms were read side-by-side, blinded to all parameters including sequence.
Two hundred nineteen predominantly healthy, obese, white, middle-aged women had initial and three-month echocardiograms (acquisition interval 105 ± 28 days) evaluated by the serial method (mean 167 ± 61 days between interpretations). The same subjects had the initial and 10-month studies (acquisition interval 303 ± 27 days) compared side-by-side. The specificity of the serial versus side-by-side method for determining change in MR grade was 55.8% versus 93.2% (p < 0.001); TR: 63.8% versus 97.6% (p < 0.001) and AR: 93.7% versus 97.6% (p = 0.08). Notably, most of the change occurred in a range (none versus physiologic/mild) that has limited clinical significance. Furthermore, the percentage of echocardiograms interpreted as nonevaluable was lower with the side-by-side method for MR (5.0% vs. 10.0%, p = 0.06), TR (4.6% vs. 15.5%, p < 0.001) and AR (4.1% vs. 12.3%, p = 0.002).
The side-by-side method of assessing change in valvular regurgitation appears to be the more reliable method with a higher specificity and minimal data loss.
Echocardiographic imaging with Doppler is the standard noninvasive diagnostic tool to assess valvular structure and function (1–4). Because of the safety and wide availability of echocardiography, it is routine to evaluate change in echocardiographic parameters of valvular function over time. However, the interpretation method used to determine “change” varies in both clinical practice and research trials.
There are two predominant methods for assessing change in valvular function. In the first method, echocardiograms are read independently, and the results from the two independent reads are used to determine and quantify the degree of change. This method of performing serial reads is analogous to the clinical practice of reading an echocardiogram and then comparing it with a prior report. In the second method, pairs of tapes are examined simultaneously and the magnitude and direction of any change are identified. This side-by-side method measures change directly. In echocardiographic research trials and clinical practice, it is not clear which method is more accurate and reliable.
One way to evaluate the reliability of a diagnostic test is to assess its specificity, that is, how often the test result is negative when no condition exists. We, therefore, sought to determine the specificity of serial versus side-by-side methods of echocardiographic interpretation for change in valvular regurgitation. As part of a large-scale, multicenter, clinical trial, we had the opportunity to evaluate these paradigms over a 10-month period in a group of patients in whom no change in valvular regurgitation was expected.
Patients in this study were selected from the placebo arm of a randomized, double-blind, parallel group, placebo-controlled clinical trial comparing anorexigens to placebo.
All patients were healthy, obese men and women, 18 years of age or older, who underwent echocardiography using a standardized acquisition protocol at three time points over a 10-month period. The two follow-up echocardiograms were performed approximately 3 and 10 months after the initial echocardiogram. The study design and demographics of the original trial have been reported previously (5). Echocardiograms from both treated and placebo patients were read without knowledge of treatment assignment or other patient identifiers. All studies were conducted in full compliance with all federal, state and local regulations pertaining to human research and Good Clinical Practice Guidelines, had prior Institutional Review Board approval and written informed consent from each subject.
The purpose of this study was to assess healthy subjects in whom no change in valvular regurgitation was expected over a 10-month period. Therefore, only patients in the placebo arm without any substantial change in weight (>10 kg) or blood pressure (>25 mm Hg systolic or >20 mm Hg diastolic) during the 10-month period were selected. In addition, any patient with a new cardiovascular event, cardiovascular symptom, abnormal electrocardiogram or new murmur was excluded. Of the 404 patients in the placebo arm of the original study, 219 patients had all three echocardiograms and met all these entry criteria and constitute the subjects used in the analysis for this study.
Echocardiographic assessment and evaluation of change
The primary end points of this study were change in aortic (AR), mitral (MR) and tricuspid (TR) valvular regurgitation assessed by the serial and side-by-side echocardiographic interpretations.
Each subject had three clinical visits that included a detailed medical history and a brief physical examination. All echocardiograms were performed locally using a standardized imaging protocol close to the time of these clinical visits. Views included the parasternal long-axis, parasternal short-axis at the aortic, mitral and midpapillary levels and apical four-chamber, two-chamber and long-axis views. The degree of regurgitation was assessed by color Doppler in multiple views using real-time videotapes. Technical inability to evaluate one valve did not preclude examination of the remaining valves. Blinding was accomplished by masking all identifying information, including date of acquisition, before videotape arrival at the central echocardiogram reading laboratory. All echocardiographic interpretations were performed at a central laboratory by four Level III (American Society of Echocardiography) cardiologists (N.J.W., J.A.P., J.F.T., J.S.G.) (5,6).
For the serial reads, the echocardiograms were analyzed for degree of regurgitation. Aortic regurgitation, MR and TR were visually rated as none, physiologic/trace, mild, moderate, severe or nonevaluable using standard criteria (7,8). The initial evaluation was performed for the entire study population before the evaluation of the subsequent three-month echocardiogram. The second echocardiogram was evaluated independently with awareness of sequence but without knowledge of individual patient results from the first echocardiogram. After reading all of the serial echocardiograms, change in regurgitation was calculated as the difference in grade between the initial and three-month echocardiogram.
For the side-by-side reads, the echocardiograms were viewed in a paired fashion and interpreted specifically for change in regurgitation. The initial and 10-month echocardiographic pairs for each patient had a unique study number, and each echocardiogram was labeled as right (R) or left (L), using a randomization schedule. These pairs were viewed with side-by-side VCRs and monitors (9). Each echocardiogram was first evaluated for comparability and then for change. Comparability of each valve took into account both instrumentation (machine type and probe) and technical/acquisition (probe placement, gain settings, time and extent of interrogation) parameters. The presence or absence of change in regurgitation of at least one grade for each valve (aortic, mitral and tricuspid) was then determined. If there was a change, the magnitude of change was recorded with respect to the right and left videotape as a one, two or three grade change.
Training before study reading
We standardized the echocardiography review criteria among all cardiology reviewers by performing training and testing sessions for both methodologies. For the serial read training sessions, all cardiologists together reviewed twenty tapes and discussed the standardized criteria for grading each echocardiographic parameter. Twenty tape pairs were reviewed for the side-by-side training sessions, and criteria for change were discussed.
Reader agreement of valvular regurgitation
Readers were randomly assigned study tapes. For both serial and side-by-side interpretation methods, a random sample of echocardiograms was read for valvular regurgitation by two readers to evaluate inter-reader agreement, and each reader reread a random sample of echocardiograms to evaluate intra-reader agreement (5). For the serial readings, additional echocardiograms were evaluated by a second randomly selected independent reviewer when specific conditions were met, which included AR ≥ mild, MR ≥ moderate or TR ≥ moderate (5). For the side-by-side readings, a second randomly selected independent reviewer evaluated echocardiograms if a change in grade of regurgitation was identified or if the paired echocardiograms were assessed as noncomparable. For both methods, all discrepant readings were resolved by consensus. In the consensus process, the two cardiologist readers met together, reviewed the valve and came to agreement on the interpretation. This consensus process was ongoing throughout the study. For the initial and three-month independent serial reads there were 47/438 (10.7%) consensus reads, and for the side-by-side there were 30/219 (13.7%) consensus reads (p = 0.3).
For the serial reads, change in regurgitation was calculated as the difference in grade between the initial and subsequent study. For the side-by-side reads, the change in regurgitation was directly assessed. The nonparametric sign test was used to test for bias in change of regurgitation within each method. The McNemar test was used to compare the paired proportion of “no change” between the serial and the side-by-side method and also to compare the paired proportion of valves classified as “evaluable” between the two methods. Exact binomial p values were calculated for all tests. Inter- and intra-reader agreement was measured using percentage exact agreement and Cohen’s unweighted kappa statistic with all categories, including the category of nonevaluable/noncomparable. The asymptotic standard errors and 95% confidence intervals are also provided for the kappa coefficients. All statistical analyses were performed using SAS version 6.12 and StatXact, and all p values are two-sided.
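The exact tests described above can be sketched in a few lines. This is a minimal illustration, not the SAS or StatXact procedures used in the study; the function names are hypothetical. It assumes the usual exact formulation in which both the sign test and the McNemar test reduce to a two-sided binomial test with probability 0.5, applied to the counts of increases versus decreases (sign test) or to the two discordant cells of the paired table (McNemar test).

```python
from math import comb


def exact_two_sided_p(k: int, n: int) -> float:
    """Two-sided exact binomial p value for k of n events under p = 0.5.

    Because the null distribution is symmetric, the two-sided p value is
    twice the smaller tail, capped at 1.0.
    """
    if n < 0 or not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if n == 0:
        return 1.0
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def sign_test(increases: int, decreases: int) -> float:
    """Exact sign test for bias in the direction of change (ties ignored)."""
    return exact_two_sided_p(increases, increases + decreases)


def mcnemar_exact(b: int, c: int) -> float:
    """Exact McNemar test from the two discordant cell counts b and c."""
    return exact_two_sided_p(b, b + c)


# AR, serial reads: 8 increases versus 4 decreases
print(round(sign_test(8, 4), 2))  # 0.39
```

Only the discordant pairs enter the McNemar calculation; those counts are not listed in the text, so the sketch leaves them as inputs.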
The 219 patients were predominately healthy, obese, white middle-aged women (Table 1). There was a mean of 105 ± 28 days between acquisition of the initial and subsequent three-month echocardiograms used for the serial method and a mean of 167 ± 61 days between the interpretation of these studies. During this period, there were no statistically significant changes in either systolic (1 ± 11 mm Hg, p = 0.2) or diastolic blood pressure (0.5 ± 8 mm Hg, p = 0.4); however, there was an increase in body weight (0.8 ± 4 kg, p < 0.01).
There was a mean of 303 ± 27 days between acquisition of the initial and 10-month echocardiograms used for the side-by-side method. Similarly, there were no significant changes in either systemic systolic (1 ± 11 mm Hg, p = 0.2) or diastolic blood pressure (0.4 ± 7 mm Hg, p = 0.4), but there was an increase in body weight during this period (1.0 ± 6 kg, p = 0.02).
We found no association between the change in blood pressure and the change in regurgitation for either the serial or side-by-side comparisons. This may be due, in part, to the study design in which subjects with significant blood pressure change between echocardiographic visits were excluded from the analyses.
The serial comparison of the grade of regurgitation identified on the initial echocardiogram with that identified on the subsequent echocardiogram is presented in Table 2. In a serial comparison for change, the valve had to be evaluable on both echocardiograms in order to determine increase, decrease or no change. For AR, there were 192/219 (87.7%) valves that could be evaluated and compared. Change could not be calculated for 27/219 (12.3%) patients because AR was interpreted as nonevaluable in at least one of the two reads. There were 12/192 (6.3%) patients in whom there was a change in the degree of AR. Of these 12 cases, there were eight with an increase in grade of AR and four cases with a decrease in grade (sign test, p = 0.39). We note that there were only two cases with a two-grade change, and these occurred where the initial reading was “none.”
For the side-by-side method, there were 210/219 (95.9%) pairs of echocardiograms that could be evaluated and compared for AR (Table 3). Studies were less frequently recorded as nonevaluable for change when using the side-by-side method than when using the serial method (4.1% vs. 12.3%, McNemar test, p = 0.002) (Fig. 1A). Of the evaluable pairs read side-by-side, 205/210 (97.6%) were interpreted as no change, three as a one-grade increase and two as a one-grade decrease (sign test, p = 1.0). There were no two-grade changes observed for AR with the side-by-side method. Although the proportion of AR interpreted as changed did not differ statistically between the serial and side-by-side methods, there was a trend for less change with the side-by-side method (6.3% vs. 2.4%, McNemar test, p = 0.08) (Fig. 1B).
For MR, change could not be calculated using the serial method for 22/219 (10.0%) because MR was interpreted as nonevaluable in at least one of the two reads. There were 87/197 (44.2%) patients for which there was a change in the degree of MR, and, of these, 5/87 (5.7%) were two grades in magnitude. Of the 87 cases, 59 had an increase and 28 had a decrease (sign test: p < 0.001). The increase was predominantly seen in those initially read as “none” (51 of 59 or 86.4%), and the decrease was predominantly seen in those initially read as trace or mild (26 of 28 or 92.9%). These one-grade changes are an effect of a regression to the mean, where the mean of the data set is between “none” and “trace” MR.
Studies were less frequently recorded as nonevaluable or noncomparable for change when using the side-by-side method than when using the serial method (5.0% vs. 10.0%, McNemar test, p = 0.06). Of the evaluable pairs of echocardiograms read side-by-side for change, 194/208 (93.2%) were interpreted as no change, seven cases as an increase and seven as a decrease (sign test, p = 1.0) (Table 3). There was a statistically significant difference in the proportion of MR interpreted as changed between the serial and side-by-side methods (44.2% vs. 6.8%, McNemar test, p < 0.001).
For TR, change using the serial method could not be calculated for 34/219 (15.5%) patients. There were 67/185 (36.2%) patients in whom there was a change in the degree of TR. Of the 67 cases, there were 41 with an increase in grade of TR and 26 with a decrease (sign test, p = 0.09). Again, the increases were predominantly in those initially read as none, and the decreases were in those initially read as trace/mild.
Similar to AR and MR, TR was less frequently recorded as nonevaluable or noncomparable for change when using the side-by-side method than when using the serial method (4.6% vs. 15.5%, McNemar test, p < 0.001). Of the evaluable pairs read for change, 204/209 (97.6%) were interpreted as no change, three as an increase and two as a decrease (sign test, p = 1.0). There was a statistically significant difference in the proportion of TR “change” between the serial and side-by-side methods (36.2% vs. 2.4%, McNemar test, p < 0.001).
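The specificity figures follow directly from the counts given above. The short sketch below recomputes them; the function and variable names are hypothetical. It rests on the study's premise that no true change occurred, so every evaluable comparison read as “no change” counts as a true negative.

```python
def specificity(changed: int, evaluable: int) -> float:
    """Percentage of evaluable comparisons interpreted as 'no change'."""
    if evaluable <= 0 or not 0 <= changed <= evaluable:
        raise ValueError("need 0 <= changed <= evaluable and evaluable > 0")
    return 100.0 * (evaluable - changed) / evaluable


# (comparisons read as changed, evaluable comparisons), from the text above
serial = {"AR": (12, 192), "MR": (87, 197), "TR": (67, 185)}
side_by_side = {"AR": (5, 210), "MR": (14, 208), "TR": (5, 209)}

for valve in ("AR", "MR", "TR"):
    s = specificity(*serial[valve])
    p = specificity(*side_by_side[valve])
    print(f"{valve}: serial {s:.1f}% vs. side-by-side {p:.1f}%")
```

The printed values agree with the reported percentages to within rounding of the last digit.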
Inter- and intra-reader agreement was measured for the serial reads and for the side-by-side reads. Percentage exact agreement and kappa coefficients are listed in Table 4. The agreement coefficients for serial reads are based on the interpretation of grade for individual tapes. It is not possible to calculate agreement coefficients for calculated change from the serial reads. In contrast, for side-by-side reads, the coefficients are based on reading change between pairs of tapes. Therefore, the percent exact agreement and kappa values from serial reads cannot be compared with those from the side-by-side method.
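For readers unfamiliar with the agreement measures, the following sketch shows how percentage exact agreement and Cohen's unweighted kappa are obtained from two sets of categorical reads. The function name and the six example readings are hypothetical and are not study data; the nonevaluable category is treated as a category like any other, as described in the statistical methods.

```python
from collections import Counter


def agreement(reads_a, reads_b):
    """Return (percent exact agreement, Cohen's unweighted kappa)."""
    if len(reads_a) != len(reads_b) or not reads_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(reads_a)
    observed = sum(a == b for a, b in zip(reads_a, reads_b)) / n
    # Chance agreement: product of the two readers' marginal proportions,
    # summed over categories.
    freq_a, freq_b = Counter(reads_a), Counter(reads_b)
    expected = sum(freq_a[cat] * freq_b[cat] for cat in freq_a) / n ** 2
    if expected == 1.0:
        return 100.0 * observed, 1.0  # both readers used one category only
    return 100.0 * observed, (observed - expected) / (1.0 - expected)


# Hypothetical grades assigned by two readers to the same six tapes
reader_1 = ["none", "none", "trace", "mild", "nonevaluable", "none"]
reader_2 = ["none", "trace", "trace", "mild", "nonevaluable", "none"]
percent_exact, kappa = agreement(reader_1, reader_2)
print(f"{percent_exact:.1f}% exact agreement, kappa = {kappa:.2f}")
```

In a side-by-side analysis the same calculation would be applied to change categories (for example increase, decrease, no change, noncomparable) rather than to grades, which is why the two sets of coefficients describe different things.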
This study demonstrates that side-by-side echocardiographic interpretation has greater specificity for determining change in valvular regurgitation than serial interpretations. In addition, side-by-side assessment results in reduction of lost data. The determination of no change in the amount of valvular regurgitation (specificity) is affected by several sources of variability. In this study, the side-by-side method had a lower variability than the serial reading method, which may be due, in part, to reduction of variability due to “regression to the mean” with multiple serial reads. In addition, a higher percentage of echocardiograms could be evaluated for change using the side-by-side method as compared with the serial method. The loss of data was at least double for the serial reads because two sets of reads were necessary to evaluate change, rather than a single read specifically for change. However, we note that most of the change occurred in a range (none vs. physiologic/mild) that has limited clinical significance.
While echocardiography is commonly used to assess change in cardiac structure or valvular function in both clinical practice and clinical trials, previous studies have not focused on the comparison of serial and side-by-side echocardiographic interpretations. Szlachcic et al. (10) performed two M-mode echocardiograms and chest X-rays in 22 patients, three months apart, to determine the inter-test variability. Both the one-month and three-month M-mode echocardiograms were interpreted both independently (serial method) and simultaneously (side-by-side method). They found a much lower variability of the echocardiographic measurements when the studies were read side-by-side. The coefficient of variation was approximately 50% smaller for measurements of the left atrium, left ventricular cavity dimension and left ventricular wall thickness. The results of their study are similar to ours when using repeat echocardiograms to assess differences in the serial and side-by-side methods of interpretation in a population where no change is expected. However, their analysis was limited only to M-mode echocardiography, while our results include interpretation of regurgitation using color Doppler.
Other types of imaging technologies such as radiology, magnetic resonance imaging and mammography often evaluate sequential studies using the side-by-side method. This method of comparison is a highly sensitive tool for identifying small variations or trends in observed differences (11).
Utility of assessing specificity
Specificity has been previously used to assess the utility of evaluating sequential diagnostic assessments. With a similar study design, Bonilla et al. (12) examined prostate volume measurements in a group of placebo patients unlikely to show any change over 12 months. They found an increase in specificity when a single reader was blinded to time sequences compared with multiple readers not blinded to sequence. We also chose specificity as the only practical measure of determining the best method of interpreting change in valvular regurgitation. Several other useful measures of a diagnostic test that could have been used are sensitivity, positive predictive value and negative predictive value. However, in order to assess these other measures, a gold standard for regurgitation is necessary. While there are measures of valvular regurgitation that may be used as a gold standard, they are impractical for a large-scale, multicenter clinical trial. Variability measures are also assessments of a diagnostic test. The standard measures of variability are intra- and inter-reader agreement. Unfortunately, simply looking at the percent exact agreement and kappa values for serial reads and comparing them with the percent exact agreement and kappa values for the side-by-side reads is flawed. The intra- and inter-reader agreement in the serial reads is an assessment of the reproducibility of reading a single echocardiogram for degree of regurgitation, while the intra- and inter-reader agreement for the side-by-side method is an assessment of the reproducibility of determining change. Therefore, it is not useful to compare percent exact agreement and kappa values for these two methods. Thus, specificity is both a practical and clinically important evaluation of a diagnostic test’s ability to determine change over time.
Implications for clinical practice
It is common practice in radiology to interpret individual patient studies side-by-side. However, echocardiograms are often interpreted independently of prior studies. This may be due to practical limitations of not having the initial echocardiogram readily available for comparison. As a concession, many will compare the interpretation of the current echocardiogram to the report of the prior echocardiogram. The data from this study suggest that comparisons between two echocardiograms for change in valvular regurgitation should be performed with the echocardiograms displayed side-by-side. If such a practice is adopted, the likelihood of reporting a change in regurgitation when none truly exists will be reduced. While side-by-side interpretation may be cumbersome and time-consuming with videotape storage, it should be commonplace and easy with digital display (13).
Implications for clinical research
Most clinical trials that use echocardiography to assess change in cardiovascular structure or function use the serial method of interpretation (10). There may be several reasons why serial reads are more common in clinical trials than side-by-side reads. Often there is a need to report the results of the initial echocardiogram before acquisition of the follow-up studies. In other cases, the results from the initial echocardiogram may prompt the collection of subsequent studies. Most commonly, the clinical trial is designed with serial, independent echocardiographic interpretations with the intent that independent interpretations would provide statistical independence and the minimization of bias. If the purpose of the clinical trial is to determine regurgitation grade and calculation of prevalence in one group versus another, then independent interpretation may be better. However, when the purpose of the trial is to determine if there is a change in regurgitation over time, then the side-by-side method is preferable. The side-by-side method will yield a higher number of interpretable echocardiographic pairs, a lower rate of false positive changes and elimination of regression to the mean.
Regression to the mean is a significant problem in epidemiologic and clinical studies, including clinical trials using echocardiography (14). Whether assessing regurgitation, left ventricular hypertrophy or left atrial size, it is more likely that independent serial reads will show a shift of the measurements to the mean value, regardless of whether true change in that value occurred (15). The data from this study demonstrated that, for a population with a low prevalence of regurgitation, there would be a high likelihood (up to 57%) of misclassification in regurgitation from “none” to “trace” using the serial method. Since the patients with no regurgitation on the initial interpretation can only increase and cannot decrease, there is an inherent potential bias in serial interpretations. This may result in an apparent increase in severity of regurgitation when none has actually occurred. The same would hold true for a population with a high proportion of severe regurgitation, which would have an inherent bias to show decrease when no true decrease occurred.
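The mechanism can be illustrated with a deliberately simple toy model. This is not an analysis of the study data: it assumes only two grades (“none” and “trace”), a true grade that never changes, and two independent reads that each report the wrong grade with the same probability. The prevalence and misread rate in the example are hypothetical values chosen for illustration.

```python
def serial_read_rates(prevalence_trace: float, misread_rate: float) -> dict:
    """Apparent change between two independent reads when nothing truly changes.

    prevalence_trace: hypothetical fraction of subjects whose true grade is 'trace'.
    misread_rate: hypothetical chance that a single read reports the other grade.
    """
    pi, e = prevalence_trace, misread_rate
    if not (0.0 <= pi <= 1.0 and 0.0 < e < 1.0):
        raise ValueError("need 0 <= prevalence <= 1 and 0 < misread_rate < 1")
    # Probability that the first read reports each grade
    first_read_none = (1 - pi) * (1 - e) + pi * e
    first_read_trace = 1 - first_read_none
    # Probability of one specific discordant pattern (none then trace, or
    # trace then none); it is e * (1 - e) whatever the true grade is.
    discordant_each_way = e * (1 - e)
    return {
        "any_apparent_change": 2 * discordant_each_way,
        "increase_given_first_read_none": discordant_each_way / first_read_none,
        "decrease_given_first_read_trace": discordant_each_way / first_read_trace,
    }


rates = serial_read_rates(prevalence_trace=0.3, misread_rate=0.2)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

In this sketch every apparent change from an initial read of “none” is an increase and every apparent change from “trace” is a decrease, even though no valve changed. Because the model is symmetric, the expected numbers of increases and decreases are equal overall, so it illustrates the dependence of apparent change on the initial read rather than any net excess of increases.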
In the Coronary Artery Risk Development in Young Adults (CARDIA) study, Gardin (16) found that variability in the measurements of left ventricular mass measured on the same studies five years apart was four times higher than measurements repeated on the same studies in temporal proximity. Therefore, the serial method is likely to degenerate in accuracy as the time interval between serial reads is increased. If true, the side-by-side method would become even more valuable with longer-term follow-up.
Since the goal of this study was to determine specificity of change in regurgitation, we purposely selected patients who were unlikely to have change during the study period. Because of the selection of patients without active cardiovascular disease, the preponderance of the cases in this study were categorized as none or trace/physiologic, and no cases were graded as “severe.”
This study involved the evaluation of three echocardiograms that were obtained over a 10-month period. The comparison for change using the serial method and the side-by-side method could not use the same follow-up echocardiogram because we took advantage of an ongoing clinical trial that provided the assurance that the readers were blinded to any expectation of change or no change in valvular regurgitation (i.e., also blind to active treatment or placebo). It is unknown if true change did occur in any of these patients. However, the serial method compared studies acquired over a shorter period of time and reported more change than in the side-by-side method where studies were obtained 10 months apart. Thus, if a real change were to have taken place in any patient between the initial and three-month echocardiograms, it should have been observed between the initial and the 10-month echocardiograms. Since there was less change over the longer interval, the use of two different follow-up echocardiograms should not affect the conclusion of the study.
The improved specificity of the side-by-side method could come at the expense of decreased sensitivity to detect true change, although there are no data in this study to address that issue. It is also possible that change would have been reported less frequently in the serial method if the independent interpretations were performed with a quantitative assessment of regurgitation. Although this study did not perform quantitative echocardiographic interpretations, several other serial echocardiographic studies using quantitation have reported problems with regression to the mean and high test-retest variability (15,16).
The authors thank Jan Kitzen, PhD, for his assistance in the preparation of this manuscript. Dr. Julio A. Panza’s work in this study, including the preparation of the manuscript, was completed in his private capacity.
☆ Supported by a research grant from the Wyeth-Ayerst Research Division of Wyeth Laboratories, Philadelphia, Pennsylvania.
- AR = aortic regurgitation
- MR = mitral regurgitation
- TR = tricuspid regurgitation
- American College of Cardiology
- Klein A.L,
- Davison M.B,
- Vonk G,
- Tajik A.J
- Monsuez J.J,
- Papon B.J,
- Hakim N,
- Schremmer B,
- Le Gall J.R
- Weissman N.J,
- Tighe J.F,
- Gottdiener J,
- Gwynne J.T
- Helmcke F,
- Nanda N.C,
- Hsiung M.C,
- et al.
- Perry G.J,
- Helmcke F,
- Nanda N.C,
- Byard C,
- Soto B
- Szlachcic J,
- Massie B.M,
- Greenberg B,
- Thomas D,
- Cheitlin M,
- Bristow J.D
- Davis C.E
- Herpin D,
- Demange J
- Gardin J.M