- Received July 20, 2011
- Revision received January 4, 2012
- Accepted January 31, 2012
- Published online May 15, 2012.
- Jennifer L. Dorosz, MD⁎,
- Dennis C. Lezotte, PhD†,
- David A. Weitzenkamp, PhD†,
- Larry A. Allen, MD, MHS⁎ and
- Ernesto E. Salcedo, MD⁎
- ⁎Reprint requests and correspondence:
Dr. Jennifer L. Dorosz, University of Colorado Denver, 12631 East 17th Avenue, Mail Stop B130, Aurora, Colorado 80045
Objectives The primary aim of this systematic review is to objectively evaluate the test performance characteristics of three-dimensional echocardiography (3DE) in measuring left ventricular (LV) volumes and ejection fraction (EF).
Background Despite its growing use in clinical laboratories, the accuracy of 3DE has not been studied on a large scale. It is unclear if this technology offers an advantage over traditional two-dimensional (2D) methods.
Methods We searched for studies that compared LV volumes and EF measured by 3DE and cardiac magnetic resonance (CMR) imaging. A subset of those also compared standard 2D methods with CMR. We used meta-analyses to determine the overall bias and limits of agreement of LV end-diastolic volume (EDV), end-systolic volume (ESV), and EF measured by 3DE and 2D echocardiography (2DE).
Results Twenty-three studies (1,638 echocardiograms) were included. The pooled biases ± 2 SDs for 3DE were −19.1 ± 34.2 ml, −10.1 ± 29.7 ml, and −0.6 ± 11.8% for EDV, ESV, and EF, respectively. Nine studies also included data from 2DE, where the pooled biases were −48.2 ± 55.9 ml, −27.7 ± 45.7 ml, and 0.1 ± 13.9% for EDV, ESV, and EF, respectively. In this subset, the difference in bias between 3DE and 2D volumes was statistically significant (p = 0.01 for both EDV and ESV). The difference in variance was statistically significant (p < 0.001) for all 3 measurements.
Conclusions Three-dimensional echocardiography underestimates volumes and has wide limits of agreement, but compared with traditional 2D methods in these carefully performed studies, 3DE is more accurate for volumes and more precise in all 3 measurements.
- ejection fraction
- left ventricular volumes
- real-time three-dimensional echocardiography
- systematic review
The assessment of left ventricular (LV) volume and ejection fraction (EF) is vital to the practice of cardiology. These measures are used to inform prognosis in most cardiac patient populations, determine treatment decisions for a variety of therapies, and function as eligibility criteria in many clinical trials (1–5). Despite their importance, there is no consensus on the best method to routinely measure EF and volumes.
Two-dimensional echocardiography (2DE) is the most ubiquitous tool for assessing LV size and systolic function. Measurement of EF is the most common reason for referring a patient for an echocardiogram (6). This test is noninvasive, portable, inexpensive, radiation free, and quick. It does not, however, provide reliable, reproducible, and accurate measures of EF or volumes (7). Traditionally, 2DE has used the method of disks to calculate LV volumes based on areas in only 2 imaging planes (6). This method is subject to errors due to foreshortening, poor endocardial definition, narrow echocardiographic windows, and assumptions about LV shape.
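As a rough illustration of the method of disks and of how EF follows from the two volumes, the sketch below stacks elliptical disks whose two diameters come from the 2 imaging planes. The function names, the 20-disk count, and all diameters and lengths are hypothetical and are not taken from any study in this review.

```python
import math

def biplane_disk_volume_ml(diam_4ch_cm, diam_2ch_cm, long_axis_cm):
    """Sum elliptical disks stacked along the LV long axis (1 cm^3 = 1 ml)."""
    if len(diam_4ch_cm) != len(diam_2ch_cm) or not diam_4ch_cm:
        raise ValueError("need the same, non-zero number of diameters per view")
    disk_height = long_axis_cm / len(diam_4ch_cm)
    # Each disk is an ellipse whose two full diameters come from the two views.
    return sum(math.pi / 4.0 * a * b * disk_height
               for a, b in zip(diam_4ch_cm, diam_2ch_cm))

def ejection_fraction_pct(edv_ml, esv_ml):
    """EF as the percentage of the end-diastolic volume that is ejected."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# Hypothetical tracings: 20 disks, diameters tapering from base to apex.
diastole = [5.0 - 0.2 * i for i in range(20)]
systole = [3.8 - 0.15 * i for i in range(20)]
edv = biplane_disk_volume_ml(diastole, diastole, 9.0)
esv = biplane_disk_volume_ml(systole, systole, 7.5)
print(f"EDV {edv:.0f} ml, ESV {esv:.0f} ml, EF {ejection_fraction_pct(edv, esv):.0f}%")
```

Because only the long-axis length and 2 sets of diameters enter the sum, a foreshortened long axis or an off-axis diameter propagates directly into the computed volume.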
Because it is able to image the entire heart in multiple planes and provides excellent endocardial definition, cardiac magnetic resonance (CMR) imaging has been used as the gold standard for measuring LV volumes and EF (8). Nevertheless, the expense, limited availability, and incompatibility with metallic hardware make CMR impractical for widespread clinical use.
Three-dimensional echocardiography (3DE) uses recently developed matrix array echocardiographic probes to image the entire heart in ≤8 beats. As opposed to older 3D methods, “live” 3D datasets provide volumes with minimal post-processing. Some echocardiographic laboratories have embraced this technology for routine care and charge for it using a new billable Current Procedural Terminology code for 3DE (9). Yet, it is still unclear if 3DE offers an advantage over 2DE, and, if so, in which patients and with which specific techniques.
In this era when new technology, especially in imaging, has escalated medical costs, advances like 3DE should be thoroughly evaluated before recommending their large-scale use (10). We, therefore, undertook this systematic review of the 3DE published reports to evaluate its performance compared with CMR and its utility over traditional 2D methods in calculating EF and volumes.
To determine the test performance characteristics of 3DE in measuring LV volumes and EF, we included all studies that compared 3DE with CMR in adult patients. Most used Pearson's correlations between CMR and 3DE-derived LV end-diastolic volume (EDV), end-systolic volume (ESV), and EF. As correlation coefficients alone are misleading, we required that acceptable studies contain a Bland-Altman (BA) analysis of agreement for at least 1 of the 3 measurements (11). Although not required for inclusion, many studies also compared 3DE with traditional 2D methods and provided BA analysis on inter- and intraobserver variability. We excluded studies that only investigated freehand (rather than live or “real-time”) 3DE, did not include CMR as a gold standard, only included children, only evaluated the right (rather than the left) ventricle, only reported on LV mass (rather than volumes or EF), or were only published in abstract form. In addition, we excluded early studies done with probes or software that are not currently commercially available.
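To illustrate why a correlation coefficient alone can mislead, the sketch below uses hypothetical paired EDV values in which echocardiography reads 20% below CMR in every patient. The data and function names are assumptions made for illustration, and the limits are drawn at 2 SDs of the differences.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient computed from first principles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bland_altman(test, reference, k=2.0):
    """Return (bias, lower limit, upper limit) for test minus reference."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - k * sd, bias + k * sd

# Hypothetical EDV values in ml, not data from any included study.
cmr_edv = [100, 150, 200, 250]
echo_edv = [80, 120, 160, 200]

r = pearson_r(echo_edv, cmr_edv)
bias, lower, upper = bland_altman(echo_edv, cmr_edv)
print(f"r = {r:.2f}, bias = {bias:.1f} ml, limits = {lower:.1f} to {upper:.1f} ml")
```

In this constructed case the correlation is perfect even though every measurement is too low, which is the kind of disagreement a BA analysis exposes and a correlation coefficient hides.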
We searched 7 electronic information sources for studies published between January 1, 1990 and September 14, 2011: MEDLINE, Cardiosource Clinical Trials, the Cochrane Central Register of Controlled Trials, the Cochrane Health Technology Assessment Database, the International Standard Randomized Controlled Trial Number Register, National Institutes of Health ClinicalTrials.gov, and UpToDate Online. We used variations and combinations of the following search terms: three-dimensional echocardiography, real-time three-dimensional echocardiography, heart ventricles, and cardiac magnetic resonance imaging. We limited the search to humans, adults, and English language. In addition to database searches, we reviewed the references of included studies and other relevant review articles to obtain a comprehensive list of included studies (Table 1). The database searches were performed twice and reviewed by 2 authors (J.L.D., E.E.S.).
The quantitative data collected were the differences in EDV, ESV, and EF calculations determined by 3DE and CMR and expressed as the BA bias and limits of agreement. Where reported, we also recorded the BA differences for 2DE with CMR as well as intra- and interobserver analyses. To assess the applicability of 3DE in a general population of patients referred for echocardiography, we were particularly interested in the types of patients included in each study. Thus, we collected the distributions of disease types, sex, average ages, and the average sizes of the ventricles studied. To assess the general feasibility of 3DE, we noted in each study the method of patient selection and number of patients excluded due to poor image quality. We also logged the type of platform and methods used to obtain volumes.
Baseline patient characteristics for each study were re-weighted according to the number of patients in the sample and used to compute a pooled estimate of average age and EDV (expressed as mean ± SD). Our analysis included summary estimates of BA statistics to assess agreement of the 2 competing procedures. We used BA analytical methods because they are more informative than Pearson's correlation coefficients in comparing methods where the main goal is to assess accuracy rather than to determine if 2 competing procedures have associated outcomes (11). Unlike more traditional clinical trials, BA methods provide estimates rather than statistical tests and levels of significance. Like the BA methodology, our synthesis strategy included weighted summary statistics. Thus, for pooled estimates of 3DE and 2DE performance compared with CMR, we simply computed appropriate summary statistics (weighted sums and sums of squares) across individual studies, which were combined to produce pooled estimates of BA biases and limits of agreement for each of the 3 measurements. We found that the reported BA limits of agreement were inconsistently computed across the studies (which may have used 1, 1.96, or 2 SDs around the reported bias). For the purposes of our meta-analysis, we converted each study's reported estimates and expressed our limits of agreement as bias ±2 SDs. For intraobserver and interobserver variability, we included studies that calculated the mean differences between the observations divided by their average and reported them as mean difference with or without limits of agreement. Again, these average differences were pooled to compute an overall measure of the observer difference and limits of agreement (also expressed as 2 SDs). Significant differences between 3DE and 2DE biases were tested using the paired t test. To determine significant differences in the variances, we used Levene's test for the homogeneity of variances (35). A p value <0.05 was considered statistically significant. SAS version 9.2 (SAS Institute, Cary, North Carolina) was used for all calculations and statistical testing.
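The pooling described above can be sketched as follows. This is one plausible reading, not the SAS code used for the analysis: it assumes each study's SD is recovered by dividing its reported half-width by the multiplier that study used, and that within-study and between-study sums of squares are combined with sample-size weights. The `StudyBA` record and the example numbers are hypothetical.

```python
import math
from typing import NamedTuple

class StudyBA(NamedTuple):
    n: int             # number of paired measurements
    bias: float        # mean difference (echo minus CMR)
    half_width: float  # reported half-width of the limits of agreement
    k: float           # SD multiplier the study used (1, 1.96, or 2)

def pooled_bland_altman(studies):
    """Return (pooled bias, pooled 2-SD half-width) across studies."""
    total_n = sum(s.n for s in studies)
    pooled_bias = sum(s.n * s.bias for s in studies) / total_n
    sum_sq = 0.0
    for s in studies:
        sd = s.half_width / s.k  # convert the reported limits back to 1 SD
        # within-study scatter plus the study's offset from the pooled bias
        sum_sq += (s.n - 1) * sd ** 2 + s.n * (s.bias - pooled_bias) ** 2
    pooled_sd = math.sqrt(sum_sq / (total_n - 1))
    return pooled_bias, 2.0 * pooled_sd

# Hypothetical studies reporting limits at different multipliers.
example = [
    StudyBA(n=30, bias=-12.0, half_width=19.6, k=1.96),
    StudyBA(n=50, bias=-25.0, half_width=36.0, k=2.0),
    StudyBA(n=20, bias=-8.0, half_width=11.0, k=1.0),
]
bias, half_width = pooled_bland_altman(example)
print(f"pooled bias {bias:.1f} ml, limits of agreement ±{half_width:.1f} ml")
```

Under these assumptions the result equals the mean and 2 SDs one would obtain from the individual paired differences had they all been available in a single dataset.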
Search results and patient population
Electronic and manual searches from reference lists produced 189 citations, of which 88 were excluded as irrelevant based on title and keywords. The remaining 101 were reviewed in detail by reading the abstract or paper in full. Of these, 78 were excluded for a priori defined criteria (Fig. 1), leaving 23 included articles, comprising 1,174 patients (Table 1) (12–34). Nine of these reported results on separate patient groups or did the 3DE analysis twice using different methods, resulting in 34 distinct protocols with 1,638 separate 3DE analyses. Each of these substudies was treated as a separate data source in this meta-analysis (Table 2). These patients had a wide spectrum of cardiac abnormalities. Across studies, the following disease states were represented: 10% normal controls, 45% coronary artery disease, 9% dilated cardiomyopathy, 4% systolic dysfunction (not otherwise characterized), 4% valve disease, 8% hypertrophic cardiomyopathy, 3% congenital heart disease, 1.4% hypertension, and 16% other or not reported. The majority (72%) were men. The weighted age was 58.0 ± 13.7 years, and EDV by CMR was 184.9 ± 61.8 ml.
All studies excluded patients who were not in sinus rhythm at the time of their echocardiograms. Due to incompatibility with CMR, patients with internal pacemakers or defibrillators or severe claustrophobia were also excluded. Most studies assessed echocardiographic image quality and excluded patients with suboptimal images (Table 1). Among studies that reported the number of exclusions, a total of 88 patients (7.5%) were excluded; 6 studies (representing 123 patients) did not report on the number of excluded patients. One study used contrast imaging and included only those with poor images (31). Five studies recruited a variety of patients consecutively without exclusions for image quality (12,14,15,18,25). Of these, Jenkins et al. (12) reported that 20% of patients had technically difficult images; Pouleur et al. (14) noted that 16%, 35%, 32%, and 17% had excellent, good, moderate, and fair images, respectively.
3DE methods used in the selected studies
All studies except 2 used a Philips Medical Systems 2 to 4 MHz 3D matrix array transducer (Andover, Massachusetts). One study used the General Electric 2.5 MHz transducer (Horton, Norway) (33), and one used the Siemens Medical Systems Acuson SC2000 transducer (Mountain View, California) (34). For analysis of the 3D datasets, there were 2 general methods used to calculate volumes (Fig. 2). The first was manual tracing of equally spaced individual long- or short-axis slices at end-systole and end-diastole. To correspond with CMR standards, papillary muscles and trabeculations were treated as part of the LV cavity. The number of slices was chosen by the operator and ranged from 2 to 12; however, according to studies that specifically assessed the quality of data based on number of planes, a minimum of 8 planes was needed to achieve the best results (16,25). The second method required the user to identify 3 to 5 points at the apex and mitral annulus in the 2- and 4-chamber end-diastolic and end-systolic views. From these points, the software used automated border-detection to create a 3D endocardial shell of the entire ventricle from which volumes were calculated. There were 4 principal software platforms (QLab, Philips Medical Systems, Andover, Massachusetts; TomTec Imaging Systems, Munich, Germany; Echopak, General Electric Vingmed Ultrasound, Horton, Norway; and Argus, Siemens Medical Solutions, Mountain View, California). Our selection of studies used a variety of slices versus mesh techniques on all 4 platforms (Table 2). Four studies were done to specifically compare the slices versus mesh techniques (16,22,29,30). Only 2 studies used contrast for LV opacification (12,31). As reported by Jenkins et al. (22), the average time to trace 12 slices was 10.5 ± 1 min, whereas 4 ± 0.3 min were required to identify the points for the mesh technique. This compared with 1.5 ± 0.45 min to trace 2D images for the method of disks.
3DE and CMR agreement results
Table 2 and Figure 3 show the BA bias and limits of agreement for EDV, ESV, and EF from the 34 studies. The overall pooled biases ± 2 SDs were −19.1 ± 34.2 ml, −10.1 ± 29.7 ml, and −0.6 ± 11.8% for EDV, ESV, and EF, respectively. To better approximate a typical population seen in an echocardiographic laboratory, we performed a subanalysis that included the 6 substudies (368 echocardiograms) that did not exclude patients for image quality. In this case, the BA biases ± 2 SDs were −28.8 ± 38 ml, −17.9 ± 34 ml, and 0.3 ± 15.5% for EDV, ESV, and EF, respectively. As the quicker mesh-based technique is also more likely to be used routinely in echocardiographic laboratories, we also performed a subanalysis with the 4 studies (512 echocardiograms) that compared the mesh versus slices methods (16,22,29,30). The pooled BA biases ± 2 SDs for the mesh technique compared with CMR were −22.7 ± 29 ml, −12.1 ± 25 ml, and −0.4 ± 9.1%; for slices, they were −12.6 ± 25 ml, −8.1 ± 21 ml, and 1.0 ± 9.0% for EDV, ESV, and EF, respectively.
3DE and 2DE agreement results
Nine articles (14 substudies) reported comparisons of both 3DE and 2DE with CMR (12,16,17,19,22,23,25,26,32). Table 3 and Figure 4 show the BA estimates for these studies. For 2DE, the pooled BA biases ± 2 SDs were −48.2 ± 55.9 ml, −27.7 ± 45.7 ml, and 0.1 ± 13.9% for EDV, ESV, and EF, respectively. The 3DE pooled BA biases ±2 SDs in those same studies were −15.7 ± 31.0 ml, −9.6 ± 25.8 ml, and 0.0 ± 9.2% for EDV, ESV, and EF, respectively. The differences in biases between 3DE and 2DE were statistically significant for volumetric measurements (p = 0.01 for both EDV and ESV), but not for EF (p = 0.42). The difference in width of the variances was statistically significant for all 3 measurements (p < 0.001).
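To make these pooled figures concrete, the sketch below converts the EDV bias and 2-SD half-width reported for the paired subset into the range of CMR values consistent with a single echocardiographic reading. It assumes the bias is defined as echocardiography minus CMR, which matches the description of echocardiography underestimating volumes. The 150 ml reading and the function names are hypothetical.

```python
def limits_of_agreement(bias, two_sd):
    """Lower and upper limits for the echo-minus-CMR difference."""
    return bias - two_sd, bias + two_sd

def plausible_cmr_range(echo_value, bias, two_sd):
    """CMR values consistent with one echo reading (CMR = echo - difference)."""
    low_diff, high_diff = limits_of_agreement(bias, two_sd)
    return echo_value - high_diff, echo_value - low_diff

# Pooled EDV bias and 2-SD half-width (ml) from the studies reporting both methods.
POOLED_EDV = {"3DE": (-15.7, 31.0), "2DE": (-48.2, 55.9)}

for method, (bias, two_sd) in POOLED_EDV.items():
    low, high = plausible_cmr_range(150.0, bias, two_sd)
    print(f"{method}: echo EDV 150 ml -> CMR EDV roughly {low:.1f} to {high:.1f} ml")
```

The same two functions apply unchanged to the ESV and EF figures.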
Interobserver and intraobserver variability
The pooled results for EDV from the 21 substudies (541 echocardiograms) that reported BA analysis for 3DE interobserver and intraobserver variability are shown in Figure 5 (12,13,15,19,21,23,24,26–30,33,34). Of these, 4 studies (110 echocardiograms) also reported results for 2DE (12,19,23,26). For 3DE, mean percent differences (mean ± 2 SDs) were 5.8 ± 12.54 and 3.9 ± 8.5, for interobserver and intraobserver variability, respectively. For 2DE, the mean percent differences were 4.8 ± 21.1 and 0.2 ± 19.6. The difference in width of the variances between 3DE and 2DE was statistically significant (p < 0.0001) for both interobserver and intraobserver variability.
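The percent-difference measure of observer variability (difference between 2 observations divided by their average) can be sketched as below. Whether the included studies used signed or absolute differences is not stated here, so the signed form is an assumption, and the readings are hypothetical.

```python
import statistics

def percent_difference(obs_a, obs_b):
    """Signed difference between two readings as a percentage of their mean."""
    return 100.0 * (obs_a - obs_b) / ((obs_a + obs_b) / 2.0)

def observer_variability(readings_a, readings_b):
    """Return (mean percent difference, 2 SDs of the percent differences)."""
    pct = [percent_difference(a, b) for a, b in zip(readings_a, readings_b)]
    return statistics.mean(pct), 2.0 * statistics.stdev(pct)

# Hypothetical EDV readings (ml) of the same 5 datasets by 2 readers.
reader_1 = [182.0, 140.0, 221.0, 96.0, 165.0]
reader_2 = [175.0, 149.0, 210.0, 101.0, 158.0]
mean_pct, two_sd = observer_variability(reader_1, reader_2)
print(f"interobserver difference {mean_pct:.1f}% ± {two_sd:.1f}%")
```

Normalizing by the average reading lets small and large ventricles contribute on the same percentage scale.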
Under controlled study settings in patients with adequate image quality, 3DE offers better accuracy and precision in measuring LV volumes and better precision in measuring LV EF compared with 2DE. Although 3DE shows promise in providing the accessibility of echocardiography and the multiplanar imaging of CMR, this nascent technology still has limited spatial and temporal resolution compared with CMR as evidenced by clinically significant biases and limits of agreement.
How good is 3DE?
From individual studies and summary results, it is clear that 3DE, like 2DE, consistently underestimates LV volumes (but not EF). More importantly, there is substantial variability in 3DE calculations, such that to achieve 95% confidence intervals of true values, one would have to allow for ±34 ml for EDV, ±30 ml for ESV, and ±12% for EF, numbers that would change management for many patients. Furthermore, these results represent images selected for higher quality, and consequently underestimate the expected variability of routine practice. For example, an analysis of those studies that accepted all 3D datasets, regardless of image quality, increased the 95% confidence interval to ±38 ml for EDV, ±34 ml for ESV, and ±15% for EF.
Which patient populations or methods alter the performance of 3DE?
Although 3DE does not make assumptions about the LV shape, it fares worse in sicker patients with large ventricles, as seen in the study by Chukwu et al. (16), who reported results on controls separately from those with myocardial infarcts. Very large ventricles may not fit within the scanning sector allotted by the probe (32,34). With an average EDV of 195 ml, our pooled patient population is one with significant LV remodeling, which may account, in part, for the wide confidence interval we found. LV contrast may improve these results; however, in the study by Caiani et al. (who only studied patients with poor images), there was still a large confidence interval (31), and Jenkins et al. showed only a minimal clinical benefit with LV opacification (12). As described in the preceding text, there were 2 very different methods of obtaining volumes from the 3D datasets. As the mesh technique involves substantially less analysis time compared with tracing slices, it is more likely to be used routinely. Although some studies that directly compared the 2 methods reported better results with tracing slices (16,22), others found that the quicker mesh technique was more accurate (29,30). Our pooled analysis of the 4 studies demonstrated little difference in the bias and variances between the mesh and slices methods, which may validate the quicker mesh technique. Only 1 study directly compared 2 separate vendors and found that the volumes (but not EF) were more accurate in one vendor (28). This study was small and limited to a single center, so more information might be needed before concluding that one platform is superior to the others.
Does 3DE offer an advantage over 2DE?
Despite its limitations, 3DE may be superior to 2D techniques. For volumes, 3DE under-represents true values about 50% less and has one-half the 95% confidence interval compared with 2DE. However, more clinical decisions are based on EF. With EF, there is no difference in the bias between 3DE and 2DE, and the difference in the variance is modest (±4.7%). The benefit of 3DE can be appreciated by evaluating the intraobserver and interobserver variability, where 3DE demonstrates much lower variance on both tests. Low observer variability is particularly important in real-world echocardiographic laboratories, with a variety of readers and sonographers, and for patients undergoing serial examinations to determine clinical worsening.
Strengths and limitations of this meta-analysis
Our pooled patient sample included a large number of patients with a wide variety of cardiac diseases. Although we did include a few studies that were limited to specific patient groups (17,20,32), most recruited consecutive patients representing the spectrum of patients seen in everyday practice. To ascertain the overall bias of 3DE as seen in Figure 3, we combined all the included studies, regardless of technique and platform. We did this to demonstrate the overall bias and variability of the technology as it performs with the variety of methods used among different clinical and research laboratories. We recognize, however, that technique and software may matter, which is why we also conducted a subanalysis on those studies that included both the mesh and slices methods. Many studies reported the analysis separately for the different techniques, and we chose to consider these substudies as individual data sources. Although this strategy means that some, but not all, patients were represented twice, it allows the variety of 3D methods to be included. The disadvantage of this approach is that it gives more weight to some patient groups; however, there were subgroup analyses in enough studies, most of which had a variety of patients, that this approach did not significantly change the patient mix. As the aim of this meta-analysis was to evaluate 3DE performance and not patient characteristics, it was more important to include all the different 3DE analyses. We also demanded that 3DE be compared with CMR for determination of “true” values. We realize that CMR itself is not perfect, because it also has errors related to border-detection and controversy on the inclusion of basal LV planes. Despite these potential sources of error, CMR is most often used as the gold standard, both clinically and in research studies. As with any meta-analysis, ours could also be subject to reporting bias, because only those studies showing positive results for 3DE might have been published. However, funnel plot analysis of our citations demonstrated no publication bias (36).
Three-dimensional echocardiography underestimates true LV volumes and EF and has a substantial degree of variance, especially in patients with poor images or large ventricles. In these patient groups, or when the results are critical and contradict other clinical data, a degree of skepticism is warranted. With these caveats in mind, 3DE offers an advantage over 2DE in providing better accuracy, precision, and reproducibility for volume measurements. The advantage in measuring EF, however, is limited to a modest increase in precision.
Dr. Allen is a consultant for Amgen. Dr. Salcedo serves as a consultant with Philips Medical Systems. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.
- Abbreviations and Acronyms
- 2DE = two-dimensional echocardiography
- 3DE = three-dimensional echocardiography
- CMR = cardiac magnetic resonance
- EDV = end-diastolic volume
- EF = ejection fraction
- ESV = end-systolic volume
- LV = left ventricular
- American College of Cardiology Foundation
- Bonow R.O., Carabello B.A., Chatterjee K., et al.
- Packer M., Fowler M.B., Roecker E.B., et al.
- Lang R.M., Bierig M., Devereux R.B., et al.
- American Society of Echocardiography Coding and Reimbursement Newsletter. Accessed July 15, 2011, http://www.asecho.org/files/public/CodingnewsJan09.pdf. December 2009.
- Jenkins C., Moir S., Chan J., Rakhit D., Haluska B., Marwick T.H.
- Pouleur A.-C., le Polain de Waroux J.-B., Pasquet A., et al.
- Mor-Avi V., Jenkins C., Kühl H.P., et al.
- Sugeng L., Mor-Avi V., Weinert L., et al.
- Jacobs L.D., Salgo I.S., Goonewardena S., et al.
- Chan J., Jenkins C., Khafagi F., Du L., Marwick T.H.
- Gutiérrez-Chico J.L., Zamorano J.L., Pérez de Isla L., et al.
- Jenkins C., Bricknell K., Hanekom L., Marwick T.H.
- Soliman O.I.I., Krenning B.J., Geleijnse M.L., et al.
- Kühl H.P., Schreckenberg M., Rulands D., et al.
- Macron L., Lim P., Bensaid A., et al.
- Chang S.A., Lee S.C., Kim E.Y., et al.
- Levene H.
- Begg C.B.