- Received June 20, 2012
- Revision received August 24, 2012
- Accepted September 4, 2012
- Published online January 8, 2013.
- Paaladinesh Thavendiranathan, MD, MSc,
- Andrew D. Grant, MD,
- Tomoko Negishi, MD,
- Juan Carlos Plana, MD,
- Zoran B. Popović, MD, PhD and
- Thomas H. Marwick, MD, PhD, MPH⁎
- ⁎Reprint requests and correspondence:
Dr. Thomas H. Marwick, Menzies Research Institute of Tasmania, 17 Liverpool Street, Hobart, TAS 7000, Australia
Objectives The aim of this study was to identify the best echocardiographic method for sequential quantification of left ventricular (LV) ejection fraction (EF) and volumes in patients undergoing cancer chemotherapy.
Background Decisions regarding cancer therapy are based on temporal changes of EF. However, the method for EF measurement with the lowest temporal variability is unknown.
Methods We selected patients in whom stable function in the face of chemotherapy for breast cancer was defined by stability of global longitudinal strain (GLS) at up to 5 time points (baseline, 3, 6, 9, and 12 months). In this way, changes in EF were considered to reflect temporal variability of measurements rather than cardiotoxicity. A comprehensive echocardiogram consisting of 2-dimensional (2D) and 3-dimensional (3D) acquisitions with and without contrast administration was performed at each time point. Stable LV function was defined as normal GLS (≤−16.0%) at each examination. The EF and volumes were measured with 2D-biplane Simpson's method, 2D-triplane, and 3-dimensional echocardiography (3DE) by 2 investigators blinded to any clinical data. Inter-, intra-, and test-retest variability were assessed in a subgroup. Variability was assessed by analysis of variance and compared with Levene's or t test.
Results Among 56 patients (all female, 54 ± 13 years of age), noncontrast 3D EF, end-diastolic volume, and end-systolic volume had significantly lower temporal variability than all other methods. Contrast only decreased the temporal variability of LV end-diastolic volume measurements by the 2D biplane method. Our data suggest that a temporal variability in EF of 0.06 might occur with noncontrast 3DE due to physiological differences and measurement variability, whereas this might be >0.10 with 2D methods. Overall, 3DE also had the best intra- and inter-observer as well as test-retest variability.
Conclusions Noncontrast 3DE was the most reproducible technique for LVEF and LV volume measurements over 1 year of follow-up.
- 3D echocardiography
- interobserver test re-test variability
- interobserver variability
- longitudinal variability
Sequential measurement of ejection fraction (EF) is used in a variety of conditions, perhaps most commonly in the assessment of potential cardiotoxicity from chemotherapy or immune therapy in patients with malignancies (1). Cardiotoxicity is most commonly defined as a reduction of the left ventricular (LV) EF of ≥5% to <55% with symptoms of heart failure or an asymptomatic reduction of the LVEF of ≥10% to <55% (2). Because decisions regarding cessation of lifesaving therapy (3) are based on changes in EF values (4,5), it is important that EF measurement should not only be accurate but also have the lowest temporal variability such that a change in EF truly represents cardiotoxicity.
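The definition of cardiotoxicity quoted above can be written as a small check. This is a minimal sketch, not a clinical tool: the function name and arguments are hypothetical, and it assumes that the "≥5%" and "≥10%" reductions are absolute EF percentage points and that "to <55%" refers to the follow-up EF.

```python
def meets_cardiotoxicity_definition(baseline_ef, followup_ef, symptomatic):
    """Apply the EF-based definition quoted in the text (reference 2).

    EF values are in percent units (for example 62.0).
    Assumption: reductions are absolute percentage points.
    """
    # The follow-up EF must have fallen below 55% in either case.
    if followup_ef >= 55.0:
        return False
    drop = baseline_ef - followup_ef
    # Symptomatic patients need a drop of at least 5 points, others at least 10.
    required_drop = 5.0 if symptomatic else 10.0
    return drop >= required_drop


# A 9-point drop to 53% qualifies only when heart failure symptoms are present.
print(meets_cardiotoxicity_definition(62.0, 53.0, symptomatic=False))
print(meets_cardiotoxicity_definition(62.0, 53.0, symptomatic=True))
```

Because the thresholds are only 5 to 10 points, measurement variability of a similar size can change the outcome of this check, which is the motivation for the comparison of methods that follows.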
Echocardiography remains the most common modality for EF measurements, but the method for EF measurement with the lowest temporal variability is unknown. Although 3-dimensional echocardiography (3DE) has been shown to be more accurate than 2-dimensional echocardiography (2DE) for both ventricular volume and EF measurements when compared with cardiac magnetic resonance imaging (6,7), no head-to-head comparison of 3DE with 2D methods for temporal variability is available. Also, because 3DE acquisition and post-processing are still not routine, 2D methods including the biplane (2DBi) and triplane (2DTri) Simpson's methods are more commonly used in routine echocardiography. We sought to identify which of the commonly used echocardiographic techniques had the lowest temporal variability for EF and ventricular volumes, on the basis of multiple echocardiograms done over 1 year in women with breast cancer receiving chemotherapy who were clinically stable and had normal global longitudinal strain at each visit. The quantification methods compared include: 1) 2DBi; 2) 2DTri; and 3) 3D full volume acquisition. All methods were compared with and without LV opacification with contrast agents.
All patients referred to our echocardiography laboratory for LV function assessment before or during chemotherapy were prospectively enrolled in a database approved by the Institutional Review Board between May 2010 and October 2011. We selected patients in whom stable function in the face of chemotherapy for breast cancer was defined by stability of global longitudinal strain (GLS) at up to 5 time points (baseline, 3, 6, 9, and 12 months). In this way, changes in EF were considered to reflect temporal variability of measurements rather than cardiotoxicity.
Inclusion criteria consisted of having had: 1) all echocardiography studies performed with Vivid 7 or E9 (GE Healthcare, Milwaukee, Wisconsin) ultrasound systems; 2) images for EF quantification along with high frame rate 2D acquisitions for speckle strain analysis at every follow-up; and 3) normal 2D systolic average global systolic longitudinal strain (GLS) at every follow-up study (defined as GLS ≤−16%). Exclusion criteria consisted of: 1) a GLS value of >−16% at any time point; 2) baseline or interval diagnosis of coronary artery disease, other nonchemotherapy-induced cardiomyopathy, or more than moderate mitral or aortic regurgitation; 3) clinical diagnosis of heart failure during follow-up; and 4) initiation of cardiac medications such as beta-blockers during the follow-up period.
LV function by 2DE and 3DE
As part of the chemotherapy protocol, all patients receive a complete echocardiogram with and without contrast administration and myocardial strain assessment. Apical 2- and 4-chamber and triplane acquisitions were obtained in each patient with optimization of image quality during acquisition. A 3D full volume dataset of the ventricle was obtained with gated (4 beats) acquisition with sector size, depth, and the number of heart beats optimized to obtain the highest possible volume rates. All acquisitions were then repeated after boluses of contrast administration (Definity, Bristol-Myers Squibb Medical Imaging, New York, New York, or Optison, GE Healthcare) with the mechanical index adjusted to between 0.15 and 0.3 and image settings optimized (8). All 2D post-contrast acquisitions were performed before 3D post-contrast acquisitions.
The LV volumes and EF were measured offline with EchoPAC (GE Healthcare) for all acquisitions except 3D with contrast where Tomtec (4D LV-Function 220.127.116.11, Unterschleissheim, Germany) analysis package was used. The 2DBi measurements were performed by manual contouring of the 4- and 2-chamber views, whereas 2DTri measurements required manual contouring of 3 long-axis views of the LV. The 3D noncontrast EF was measured with a semi-automated technique. Basal and apical guide points were placed in 1 long-axis image in the end-diastolic and end-systolic frames. Endocardial contours were then automatically generated and displayed on multiple long- and short-axis cine images for verification and manual adjustment as necessary. For the 3D EF with contrast, 3 long-axis images were manually contoured in end-diastole and end-systole. The contouring was then verified on long- and short-axis cine images and modified as necessary to ensure optimal endocardial tracking. The EF and volume measurements for each time point were measured blinded to the prior measurement.
LV systolic average GLS
Apical 3-, 2-, and 4-chamber high frame rate grayscale acquisitions (40 to 80 frames/s) were obtained with commercially available equipment (Vivid 7 or E9, GE Healthcare). Measurement of GLS was performed as part of the routine clinical study on the ultrasound system or offline with EchoPAC (GE Healthcare) and verified by a physician. GLS was measured for 3-, 2-, and 4-chamber views separately, and all 18 myocardial segments were averaged to obtain the GLS. Segments that were inadequately tracked were excluded from the analysis. Our strain cutoff −16% was based on calculation of approximately 2SD from the mean on the basis of known normal strain values from a previous publication (9).
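The GLS calculation described above (averaging the tracked segments and comparing with the −16% cutoff) can be sketched as follows. The function names are hypothetical, and representing an inadequately tracked segment as `None` is an assumption made for illustration.

```python
GLS_NORMAL_CUTOFF = -16.0  # percent; GLS at or below this value is treated as normal


def average_gls(segment_strains):
    """Average peak systolic strain over the adequately tracked segments.

    segment_strains: strain values in percent (up to 18 segments);
    None marks a segment excluded for poor tracking (an assumption).
    """
    tracked = [s for s in segment_strains if s is not None]
    if not tracked:
        raise ValueError("no adequately tracked segments")
    return sum(tracked) / len(tracked)


def is_normal_gls(gls):
    """Strain is negative, so 'normal' means more negative than the cutoff."""
    return gls <= GLS_NORMAL_CUTOFF


# 17 tracked segments plus 1 excluded segment.
segments = [-20.0] * 17 + [None]
gls = average_gls(segments)
print(gls, is_normal_gls(gls))
```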
Interobserver and intraobserver reproducibility and interobserver test-retest variability of the EF and volume measurements for all 6 methods were tested by 2 observers (P.T., A.G.) for 10 patients at 2 different time points (baseline and 3 months). Before performing these analyses the 2 observers first agreed on the method of contouring for all 6 methods (contrast and noncontrast studies) and practiced on 5 separate cases together. Each observer measured each study twice with each measurement performed at 2 separate time periods (2 to 4 weeks apart) in a random manner to avoid any memory of measurement between time points. For interobserver and intraobserver variability a total of 20 studies (10 from baseline, and 10 from follow-up) were used. For the interobserver test-retest variability, measurements 1 and 2 for each patient by observer 1 were compared with measurements 1 and 2 at time point 2 by the second observer. The interobserver test-retest variability provides an estimate of the expected variability in EF and volumes in an echocardiography lab where the measurements might be performed by 2 different sonographers/physicians at different time points during the follow-up period. All measurements were made by each observer blinded to previous measurements and to the measurements of the other observer.
Continuous data are expressed as mean ± SD, whereas categorical data are expressed as frequency or percentage. All absolute changes in EF values are represented as decimals (e.g., 10% absolute EF change is represented as 0.10), whereas any relative change in EF is represented as a percentage. Echocardiograms were categorized into 1 of 5 time periods: 1) before initiation of therapy or the first available study; 2) 2 to ≤3 months; 3) 3 to ≤6 months; 4) 6 to ≤9 months; and 5) 9 to ≤12 months after initiation of therapy or the first echocardiogram. All EF, volume, and strain measurements were normally distributed as determined by the Kolmogorov-Smirnov test. To assess whether there was a significant temporal variability in GLS in our population, we used Linear Mixed Model analysis with unstructured covariance for random effects (10) with strain as a dependent variable, patients as random effect, and time (in days) as a covariate. One-way analysis of variance (ANOVA) was performed to obtain the mean squared error for each technique studied. The square root provided the standard error of the measurement (SEM) in the EF and volume measurements for each method over the measurement period. The 95% confidence interval (CI) for each SEM was calculated with the formula:

$$\left[\ \mathrm{SEM}\sqrt{\frac{df}{\chi^2_U}},\ \ \mathrm{SEM}\sqrt{\frac{df}{\chi^2_L}}\ \right]$$

where $\chi^2_U$ is the upper-tail value of chi-square for df = n − 1 with area α/2 to its right and $\chi^2_L$ is the lower-tail value with area α/2 to its left. In addition to the SEM, the coefficient of variation (COV) and 95% CI (calculated as described in the preceding text) for the temporal variability for each method were also calculated. Levene's test was used to test for differences in the SEM and COV between the techniques. Because 5 comparisons of SEM and COV were made (3D noncontrast technique vs. others) Bonferroni correction was used with p < 0.01 considered to be statistically significant.
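The chi-square interval for the SEM described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the analysis run in SPSS: the interval is taken to be the usual chi-square interval for a standard deviation, the function names are hypothetical, and the chi-square quantiles are obtained with the Wilson-Hilferty approximation rather than exact tables.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_approx(p, df):
    """Approximate chi-square quantile with left-tail area p (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3


def sem_confidence_interval(sem, df, alpha=0.05):
    """Chi-square confidence interval for a standard error of measurement.

    chi2_upper has area alpha/2 to its right, chi2_lower has area alpha/2
    to its left, matching the definitions given in the text.
    """
    chi2_upper = chi2_quantile_approx(1.0 - alpha / 2.0, df)
    chi2_lower = chi2_quantile_approx(alpha / 2.0, df)
    lower = sem * sqrt(df / chi2_upper)
    upper = sem * sqrt(df / chi2_lower)
    return lower, upper


# Hypothetical input: an SEM of 0.03 (absolute EF) estimated with 100 degrees of freedom.
low, high = sem_confidence_interval(0.03, 100)
print(round(low, 4), round(high, 4))
```

The interval is asymmetric around the SEM because the chi-square distribution is skewed; with few degrees of freedom the upper limit moves away from the estimate much faster than the lower limit.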
The minimal change in EF or volumes over time beyond which 2 measurements could be considered to be different was calculated as 2× the SEM for any technique.
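A minimal sketch of the SEM and minimal-change calculation follows. It assumes that patients are the grouping factor in the one-way ANOVA, so that the mean squared error is the pooled within-patient variance; the function names and the EF values are hypothetical.

```python
from math import sqrt


def within_patient_sem(measurements_by_patient):
    """Square root of the one-way ANOVA mean squared error.

    measurements_by_patient maps a patient identifier to that patient's
    repeated measurements (for example EF as a decimal at each visit).
    """
    ss_within = 0.0
    n_total = 0
    for values in measurements_by_patient.values():
        mean = sum(values) / len(values)
        ss_within += sum((v - mean) ** 2 for v in values)
        n_total += len(values)
    df_within = n_total - len(measurements_by_patient)
    if df_within <= 0:
        raise ValueError("need repeated measurements in at least one patient")
    return sqrt(ss_within / df_within)


def minimal_detectable_change(sem):
    """Minimal change taken as 2 x SEM, as stated in the text."""
    return 2.0 * sem


# Hypothetical EF values for two patients over three visits.
ef = {"patient_1": [0.60, 0.62, 0.58], "patient_2": [0.55, 0.57, 0.59]}
sem = within_patient_sem(ef)
print(round(sem, 3), round(minimal_detectable_change(sem), 3))
```

With these made-up values, a later EF change smaller than twice the SEM would be treated as indistinguishable from measurement variability.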
Intraobserver and interobserver variability were determined with 2-way ANOVA approach described by Eliasziw et al. (11) with observers treated as random factors. With this method the interobserver variability includes both the variability among measurements of observers and the variability within measurements of observers (11). In addition, the interobserver test-retest variation was also calculated with 2-way ANOVA. This measure consists of variability within observers, among observers, and over time. For these 3 measurements, with the formula in the following text, the 95% CI was calculated as a measure of the minimal difference between 2 measurements beyond which the measures can be considered to be truly different (11–13).
The differences in the calculated observer variability were compared with the t test described by Tong et al. (13). Because 5 comparisons were made (3D noncontrast vs. others) Bonferroni correction was used with a p < 0.01 considered to be statistically significant. All statistical analysis was performed with SPSS (version 19.0.0, SPSS, Chicago, Illinois).
A total of 88 patients with at least 3 echocardiograms were screened; 32 were excluded due to at least 1 abnormal strain value during follow-up (>−16%) or development of other reasons for LV dysfunction (e.g., coronary artery disease and viral myocarditis) or initiation of beta-blockers during the follow-up period. Demographic data of the 56 included patients and cancer therapy received are summarized in Table 1. Patient follow-up ranged from a total of 3 visits 3 months apart to as many as 5 visits (baseline, 3, 6, 9, and 12 months).
The GLS values at each follow-up period are summarized in Table 1. A total of 39 of 4,248 segments (<1.0%) were not used in the strain calculation due to poor myocardial tracking. The average GLS for all patients and all time points was −19.6 ± 2.0% and ranged from −16.0% to −24.9%. There was no significant temporal variability in strain comparing all patients for all available time points (p = 0.79).
Temporal variability of EF
The EF measurements were performed on 236 echocardiographic studies. However, due to occasional refusal of contrast agent, missed acquisition of 1 or more methods of EF assessment in some studies, or poor image quality, we were able to perform the following number of measurements: 235 2DBi, 186 2DBi with contrast, 225 2DTri, 177 2DTri with contrast, 215 3D, and 174 3D with contrast. The median (95% CI) frame/volume rates for 2DBi, 2DBi with contrast, 2DTri, 2DTri with contrast, 3D, and 3D with contrast were 74 (71 to 74), 71 (64 to 74), 47 (46 to 47), 46 (45 to 46), 38 (35 to 41), and 30 (26 to 33), respectively. The EF by each technique for the entire follow-up period for all patients is shown in Figure 1, whereas the mean ± SD values for each follow-up time point are illustrated in Figure 2.
The temporal variability in EF measurements of each technique is illustrated in Figure 3, with numerical SEM and 95% CI values shown in the lower panel in Figure 3. Among the techniques, noncontrast 3DE had the lowest temporal variability compared with all the other EF methods. Contrast increased the temporal variability for 2DBi (p = 0.02), 2DTri (p = 0.17), and 3D (p < 0.001). However, among the 3 contrast-enhanced methods, 3D had the lowest temporal variability although only significantly different in comparison with the contrast 2DTri technique (p = 0.001) and a trend toward improvement in comparison with the 2DBi technique (p = 0.09). The temporal variability of EF measurements for each method is provided as COV in Online Table A.
Temporal variability of LV volume
The end-diastolic volume (EDV) and end-systolic volume (ESV) measurements by each technique for the entire follow-up period are illustrated in Figure 1, and the mean values at each time point are illustrated in Figure 2. The temporal variability in EDV and ESV for each technique is illustrated in Figure 4, with numerical SEM and 95% CI values shown in the lower panel in Figure 4. Noncontrast 3DE had the lowest temporal variability for both EDV and ESV and was significantly lower than all the other methods compared. Administration of contrast agents decreased the temporal variability in EDV measurements for the 2DBi method (p = 0.02) but increased it for the 3D method (p < 0.001). Contrast did not improve the ESV measurements for any of the techniques. Among the contrast-enhanced techniques when the 3D method was compared with the other 2 methods, there was no difference in the temporal variability for both EDV (p = 0.12 vs. 2DBi, p = 0.19 vs. 2DTri) and ESV (p = 0.20 vs. 2DBi, p = 0.31 vs. 2DTri). The temporal variability of volume measurements for each method is provided as COV in Online Table A.
Interobserver and intraobserver variability
The interobserver and intraobserver variability for all 6 methods expressed as SEM and 95% CI for both EF and volumes are summarized in Tables 2 and 3. For EF measurements, noncontrast 3DE had the lowest intraobserver and interobserver variability with minimal detectable change in EF of 0.048 and 0.075, respectively. Among the contrast-enhanced techniques 3DE had the lowest interobserver and intraobserver variability. For EDV and ESV the intraobserver variability was the lowest for the noncontrast 3D technique, with minimal detectable change of 14.3 and 8.1 ml, respectively. For interobserver variability, the 2DBi with contrast method showed marginally smaller variability compared with noncontrast 3D. For both the 2DBi and 2DTri techniques, contrast decreased the interobserver and intraobserver variability, but this was not seen with the 3D technique.
The overall interobserver test-retest variability for EF and volumes for each method is also summarized in Tables 2 and 3, respectively. For EF and ESV measurements, noncontrast 3DE had the lowest variability, with minimal detectable EF and ESV differences of 0.060 and 14 ml, respectively. For EDV measurement, 3D with contrast had the lowest variability followed by 3D without contrast. The minimal detectable changes for EDV with these methods were 30.1 and 34.8 ml, respectively.
This study provides a comparison of temporal and observer variability in EF and volume measurements with commonly used echocardiographic methods in patients with stable ventricular systolic function undergoing chemotherapy for breast cancer. In addition, this work also provides an estimate of the minimal change in EF and volumes above which 2 measurements should be considered to significantly differ in excess of temporal and observer variability. Noncontrast 3DE was most consistently superior for temporal and observer variability for both EF and volume measurements. Contrast administration did not improve variability in measurements of EF or volumes in this population. Our data indicate that a change in EF as high as 0.05 to 0.06 can be seen over time with noncontrast 3D EF, due to physiological changes, acquisition differences, and observer variability. In comparison, this variability can be close to 0.10 to 0.13 with 2D techniques.
Sequential follow-up by echocardiography
In patients with cancer, multiple studies are repeated during chemotherapy to monitor changes in EF as a marker of cardiotoxicity, and this and other indications for repeat testing are considered appropriate practice (14). However, the wide CIs with 2DE reported in previous studies raise concern about erroneously stopping chemotherapy due to changes in EF that occur only due to variability on repeat testing. Indeed, the utility of 2DE for sequential follow-up of patients for LV volumes and function has been disappointing, with 11% being the smallest change in EF that can be recognized with 95% confidence (15). A significant proportion of this variability has been attributed to test-retest variability (15).
The recommendation to perform sequential echocardiograms in these patients would be strengthened if measures of LV function were sufficiently robust to more faithfully reflect variations in physiological differences between the repeated studies. Cross-sectional studies have shown 3DE to have less variation between studies than 2DE in comparison with magnetic resonance imaging (6,7). Only 1 study has illustrated more reproducible LV measurements with 3DE on temporal follow-up on the basis of 2 time points in patients with coronary artery disease (16). To our knowledge, this is the first study to compare 2D and 3D methods with and without contrast administration in a longitudinal study with multiple repeated echocardiograms to determine the method with the lowest temporal variability.
In this study, controlling for nonphysiological changes in EF and volumes with strain and clinical parameters, noncontrast 3DE was superior to 2DE methods and contrast 3DE with respect to temporal variability in EF and volumes. The superiority compared with 2DE methods is a reflection that 3DE does not make any geometric assumptions for EF and volume measurements and is less affected by acquisition differences from 1 scan to the next, as often seen with 2DE (7,17). In addition noncontrast 3DE volumes and EF were measured with a semi-automated method where automatically generated contours are only modified if necessary. This is in contrast to manual endocardial contouring in all other methods where there can be significant differences in the interpretation of endocardial borders. The improved reproducibility of semi-automated versus manual contouring has been previously illustrated by others both with 2DE and 3DE (18,19). Specifically, this latter reason likely accounts for the higher variability of contrast 3DE compared with noncontrast 3DE in this study. Additional reasons for lack of improvement with contrast 3DE include lower volume rates of acquisitions (i.e., 30 vs. 38 VPS) as well as limitations of the software used for contrast 3DE analysis with respect to freedom in endocardial contour adjustment.
Contrast administration has been shown to improve accuracy and observer reproducibility with 2DE in patients with sub-optimal echocardiographic windows (20,21). However, other studies have shown the lack of additional benefit over harmonic imaging with only limited improvement or even worsening in reproducibility of EF measurements in some patient populations (22–24). The latter might be attributed to blooming and attenuation artifacts that hinder delineation of structures such as the mitral valve, resulting in variability in contouring (23). With 3DE, previous work suggested improved accuracy with contrast administration, even with good quality images (7). However, the lack of improvement in temporal variability in EF or volume measurements with contrast administration except for 2DBi EDV likely reflects the population studied. With harmonic imaging most of our patients had adequate acoustic windows and would not have met traditional criteria for contrast administration. This is also supported by the fact that we were able to analyze 235/236 2D images for 2DBi EF calculations. Unfortunately we were unable to identify a subgroup of patients in our study cohort who had at least 3 consecutive studies that would meet criteria for contrast administration to perform a subgroup analysis.
In our study, the 2DTri method did not perform better than the 2DBi method, for several reasons. The triplane method was more susceptible to off-axis views of the ventricle due to the difficulty in optimizing 3 separate views in a single acquisition. Second, the endocardial border delineation of the triplane images was inferior to the 2D images used for biplane calculations. Also, the frame rates for the triplane acquisition (47/s) were significantly lower than those of 2D images used for biplane measurements (74/s).
Interobserver and intraobserver variability
Previous studies have also shown 3DE to have lower interobserver and intraobserver variability, probably due to some of the factors discussed in the preceding text (6,16). Our findings are congruent with these studies. Another important variability is that seen between 2 studies at different time points with measurements performed by different individuals (interobserver test-retest variability). This is of direct clinical importance, because patients are likely imaged by different sonographers at different time points and the post-processing is performed by different individuals. In this study 3DE had the lowest interobserver test-retest variability with minimal detectable difference in EF of 0.060. This was significantly higher for the 2D techniques.
The average systolic GLS and clinical criteria were used as a marker of stability of ventricular function in these patients. Although 2D speckle tracking-based strain measurement has limitations, the usefulness of strain measurements as a marker of subclinical ventricular dysfunction has been illustrated in several studies (25), including in patients receiving chemotherapy (26,27). Alternative study designs, including using sequential reference measurements with cardiac magnetic resonance, were considered less feasible.
We were unable to obtain contrast enhanced images and all 6 EF measurement methods of all patients at all time points. This was due to the refusal of some patients to have IV access or the inability to obtain IV access during echocardiography or patient discomfort necessitating selective image acquisition. Likewise, although our data are based on a small patient population, the critical variable is the total number of echocardiographic studies.
All our EF and volume acquisitions and post-processing were performed with specific vendor equipment and software. Therefore our findings cannot be directly extrapolated to other 3D acquisition and post-processing methods. However, we do not feel that the longitudinal reproducibility data are reflective of a single vendor but rather the semi-automated 3D measurements. Finally, this study seeks to define reliability and reproducibility rather than to validate the agreement or the diagnostic accuracy of EF and volume measurements against an external reference standard. The accuracy of 2D and 3D methods has been well-described (7).
The results of this study confirm that the 95% CI for 2D EF (approximately 0.10 variability) is analogous to the >0.10 change in EF that defines cardiotoxicity in asymptomatic patients (2). The lower temporal variability of noncontrast 3D EF measurements suggests that this approach should be considered in patients with good acoustic windows. To achieve this degree of temporal variability with 3DE, an automated or semi-automated endocardial contouring method should be used. Although these data are based on patients receiving chemotherapy, our findings have broader implications for patients in whom clinical decisions or prognostic markers are based on changes in EF and ventricular volumes, including patients with valvular diseases, heart failure, and those undergoing surgical or percutaneous revascularization.
For supplementary tables and text, please see the online version of this article.
The authors have reported that they have no relationships relevant to the contents of this paper to disclose.
- Abbreviations and Acronyms
- 2DBi = 2-dimensional biplane
- 2DE = 2-dimensional echocardiography
- 2DTri = 2-dimensional triplane
- 3DE = 3-dimensional echocardiography
- CI = confidence interval
- COV = coefficient of variation
- EDV = end-diastolic volume
- EF = ejection fraction
- ESV = end-systolic volume
- GLS = average systolic global longitudinal strain
- LV = left ventricle/ventricular
- SEM = standard error of measurement
- American College of Cardiology Foundation
- Yeh E.T., Bickford C.L.
- Martin M., Esteva F.J., Alba E., et al.
- Moja L., Tagliabue L., Balduzzi S., et al.
- Verma S., Ewer M.S.
- Jenkins C., Moir S., Chan J., Rakhit D., Haluska B., Marwick T.H.
- Marwick T.H., Leano R.L., Brown J., et al.
- Fitzmaurice G.M., Laird N.M., Ware J.H.
- Eliasziw M., Young S.L., Woodbury M.G., Fryday-Field K.
- Douglas P.S., Garcia M.J., Haines D.E., et al.
- Otterstad J.E.
- Jenkins C., Bricknell K., Chan J., Hanekom L., Marwick T.H.
- Cannesson M., Tanabe M., Suffoletto M.S., et al.
- Yu E.H., Sloggett C.E., Iwanochko R.M., Rakowski H., Siu S.C.
- Thomson H.L., Basmadjian A.J., Rainbird A.J., et al.
- Dedobbeleer C., Rai M., Donal E., Pandolfo M., Unger P.