Author information
- Received April 3, 2015
- Revision received June 30, 2015
- Accepted July 27, 2015
- Published online September 29, 2015.
- Christian Knackstedt, MD∗,
- Sebastiaan C.A.M. Bekkers, MD, PhD∗,
- Georg Schummers†,
- Marcus Schreckenberg†,
- Denisa Muraru, MD, PhD‡,
- Luigi P. Badano, MD, PhD‡,
- Andreas Franke, MD§,
- Chirag Bavishi, MD, MPH‖,
- Alaa Mabrouk Salem Omar, MD, PhD‖ and
- Partho P. Sengupta, MD, DM‖∗
- ∗Department of Cardiology, Maastricht University Medical Centre, Maastricht, the Netherlands
- †TomTec Imaging Systems GmbH, Unterschleissheim, Germany
- ‡Department of Cardiac, Thoracic and Vascular Sciences, University of Padua, Padua, Italy
- §Department of Cardiology, KRH Klinikum Siloah, Hannover, Germany
- ‖Zena and Michael A. Wiener Cardiovascular Institute and the Marie-Josée and Henry R. Kravis Center for Cardiovascular Health, Mount Sinai School of Medicine, New York, New York
- ∗Reprint requests and correspondence:
Dr. Partho Sengupta, Zena and Michael A. Wiener Cardiovascular Institute, Marie-Josée and Henry R. Kravis Center for Cardiovascular Health, Mount Sinai School of Medicine, One Gustave L. Levy Place, New York, New York 10029-6574.
Background Echocardiographic determination of ejection fraction (EF) by manual tracing of endocardial borders is time consuming and operator dependent, whereas visual assessment is inherently subjective.
Objectives This study tested the hypothesis that a novel, fully automated software using machine learning-enabled image analysis will provide rapid, reproducible measurements of left ventricular volumes and EF, as well as average biplane longitudinal strain (LS).
Methods For a total of 255 patients in sinus rhythm, apical 4- and 2-chamber views were collected from 4 centers that assessed EF using both visual estimation and manual tracing (biplane Simpson’s method). In addition, datasets were saved in a centralized database, and machine learning-enabled software (AutoLV, TomTec-Arena 1.2, TomTec Imaging Systems, Unterschleissheim, Germany) was applied for fully automated EF and LS measurements. A reference center reanalyzed all datasets (by visual estimation and manual tracking), along with manual LS determinations.
Results AutoLV measurements were feasible in 98% of studies, and the average analysis time was 8 ± 1 s/patient. Intraclass correlation coefficients and Bland-Altman analysis revealed good agreement among automated EF, local center manual tracing, and reference center manual tracing, but not for visual EF assessments. Similarly, automated and manual LS measurements obtained at the reference center showed good agreement. Intraobserver variability was higher for visual EF than for manual EF or manual LS, whereas interobserver variability was higher for both visual and manual EF, but not different for LS. Automated EF and LS had no variability.
Conclusions Fully automated analysis of echocardiography images provides rapid and reproducible assessment of left ventricular EF and LS.
The quantification of left ventricular (LV) size, geometry, and function represents the most frequent indication for an echocardiographic study and is pivotal for patient evaluation (1). Although LV volumes and ejection fractions (EFs) can be measured using different imaging modalities, 2-dimensional echocardiography continues to be the most commonly utilized technique in clinical practice. Despite the existing recommendations for the use of 3-dimensional (3D) echocardiography (2) and the reported variability of the biplane disc-summation method, which can be as high as 14% (3), the use of 2-dimensional echocardiography has continued to grow. However, the additional time and inconsistencies in manual tracing of the endocardial borders have led to the continued use of visual assessment of EF in busy echocardiographic laboratories (4).
The inherent subjectivity in visual assessment of EF and the inconsistencies in manual estimation could potentially be overcome with image processing algorithms that allow fully automated measurement of EF (5,6). Moreover, automated measurements could also yield longitudinal strain (LS), a sensitive marker of cardiac function (7,8). Therefore, in the present study, we used novel, fully automated software to measure EF and LS from biplane views of the LV and compared the values with visually estimated EF, manually traced EF, and manually traced LS. We hypothesized that automated EF and LS measurements would provide values similar to manual measurements, with increased precision.
A total of 4 international cardiovascular centers (Department of Cardiology, Maastricht University Medical Centre, the Netherlands [MAA]; Department of Cardiac, Thoracic and Vascular Sciences, University of Padua, Italy [PAD]; Department of Cardiology, KRH Klinikum Siloah, Hannover, Germany; and Zena and Michael A. Wiener Cardiovascular Institute and the Marie-Josée and Henry R. Kravis Center for Cardiovascular Health, Mount Sinai School of Medicine, New York, New York) participated in this study. All of the participating centers were asked to provide datasets containing apical 4-chamber (4-C) and 2-chamber (2-C) views of patients who had undergone clinically indicated standard transthoracic echocardiography using commercially available systems. Datasets from the 4 local centers were blinded and stored digitally in DICOM 3.0 format. In addition to serving as a local laboratory from which basal datasets were collected, MAA also served as the reference center to which all echocardiographic studies (including those from MAA itself) were transferred and stored in a dedicated prototype image database and review platform. This platform was used to perform further analyses, such as heart cycle selection and manual tracing of endocardial contours. At MAA, all images were reviewed to confirm that one 4-C and one 2-C view had been recorded with at least 1 complete heart cycle. Cardiac cycles were reviewed to avoid large variations in sinus cycle length, supraventricular or ventricular extrasystoles, and respiratory variations. The most suitable cycle was selected and analyzed visually for EF estimation, manually by Simpson’s method, and by the automated software. It is important to note that, to reflect the beat-to-beat variations encountered in the real world, the selection of cardiac cycles at the reference center was not tied to the cardiac cycles previously selected at the local centers.
A total of 280 datasets from patients in sinus rhythm were initially retrieved; 25 were excluded because of missing DICOM data (n = 2), insufficient image quality (n = 1), repeated evaluation at 2 points in time (n = 6), missing visual assessment of EF (n = 6), missing data (n = 5), or missing calibration information in the DICOM file for automated analysis (n = 5). Accordingly, the study consisted of 255 datasets that were included in the final analysis.
It is important to note that whereas manual and visual assessments of LV volumes and EF were performed at local centers and reassessed in the reference center, for regulatory purposes, automated volumes and EF and manual and automated LS were only performed at the reference center.
Conventional EF analyses at the local centers
At each center, an expert investigator (level 3 training in echocardiography; C.K., A.F., L.B., P.S.) first qualitatively evaluated the EF by visual assessment (visual EF) and then manually traced the endocardial border at end-diastole and -systole to measure LV volumes and EF (manual EF) by the biplane-modified Simpson’s method (2). Subsequently, all deidentified images were sent to the reference center at MAA, together with the manually traced LV volumes and EF, the visually estimated EF, and demographic and clinical characteristics.
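The disc-summation arithmetic behind the biplane-modified Simpson's method can be sketched as follows. This is an illustrative reconstruction of the standard formula, not code from the study software; the function and variable names are ours.

```python
import math

def simpson_biplane_volume(diam_4c, diam_2c, long_axis):
    """Biplane disc-summation (modified Simpson's) LV volume.

    diam_4c, diam_2c: paired disc diameters (cm) measured perpendicular
    to the long axis in the apical 4- and 2-chamber views (typically 20
    discs). long_axis: LV long-axis length (cm), conventionally the
    longer of the two views. Returns volume in ml.
    """
    n = len(diam_4c)
    disc_height = long_axis / n  # each elliptical disc has equal height
    return sum(math.pi / 4.0 * a * b * disc_height
               for a, b in zip(diam_4c, diam_2c))

def ejection_fraction(edv, esv):
    """EF (%) from end-diastolic and end-systolic volumes (ml)."""
    return (edv - esv) / edv * 100.0
```

For example, with hypothetical volumes of 120 ml at end-diastole and 48 ml at end-systole, `ejection_fraction(120.0, 48.0)` gives an EF of 60%.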
Analyses at the reference center
Manual and visual analyses
One of the expert investigators at MAA blinded to the analyses from the local centers reviewed all datasets and manually traced the endocardial contours at end-systole and -diastole on 2- and 4-C views in all study patients. End-diastole was defined at the peak of the electrocardiographic R-wave and/or 1 frame before mitral valve closure. End-systole was defined as 1 frame before mitral valve opening. To ensure adequate blinding of the manual evaluation and the visual estimation of EF, values of LV volumes and EF from the manual tracing were automatically transferred to the database and were not visible to the investigator. LS was also calculated and averaged as the relative change in the length of the endocardial border from end-diastole to -systole by manually tracking the endocardial borders from 4- and 2-C views. The image quality of all studies from all centers was classified as: 1) good; 2) average; 3) poor; or 4) not analyzable.
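The averaged biplane LS described above reduces to a simple length ratio. A minimal sketch of that definition (our notation, not the study software's):

```python
def longitudinal_strain(len_ed, len_es):
    """Longitudinal strain (%) as the relative change in traced
    endocardial border length from end-diastole (len_ed) to
    end-systole (len_es). Negative values indicate shortening."""
    return (len_es - len_ed) / len_ed * 100.0

def biplane_ls(ls_4c, ls_2c):
    """Average of the 4- and 2-chamber strain values."""
    return (ls_4c + ls_2c) / 2.0
```

For instance, a border that shortens from 10 cm to 8 cm yields a strain of −20%; the study reports such values as magnitudes (e.g., 20 ± 6%).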
Fully automated analysis
Fully automated measurements of LV volumes were performed using AutoLV (TomTec-Arena 1.2, TomTec Imaging Systems, Unterschleissheim, Germany), a computer vision vendor-independent software package that applies a machine-learning algorithm for DICOM images (9). AutoLV can provide biplane end-diastolic, end-systolic, and stroke volumes and EF, as well as LS.
To measure the time spent using AutoLV, a standardized workflow was used. The dataset of a patient was uploaded to the review station and opened. Then, the investigator activated the AutoLV option, which was followed by a request to identify the 4- and 2-C views. The algorithm would then run the automated endocardial border detection and identify the end-diastole and -systole for both views (Figure 1).
Interobserver, intraobserver, and beat-to-beat variability
To measure intraobserver and interobserver variability, a subset of 20 patient datasets was randomly selected and resubmitted to MAA and PAD to be reanalyzed using the same protocol. Physicians at both centers were blinded to the original EF and LS results. Only 1 cardiac cycle was provided to ensure that all investigators analyzed the same heartbeat.
To derive interobserver variability, 1 physician from each center was asked to recalculate manual EF and manual LS and to allocate a visual assessment of the EF for all 20 patient datasets. Two investigators (C.K. and L.B.) were asked to perform these measurements twice to assess intraobserver variability.
The same subset of 20 patient datasets was also used for beat-to-beat variability. A second cardiac cycle was used to derive manual as well as automated EF and LS and was compared with measurements obtained from the original cardiac cycle.
The investigation conformed to the principles outlined in the Declaration of Helsinki, and each participating center conformed to local ethical regulations.
Normal distributions of variables were checked before analysis. Continuous variables were presented as mean ± SD and categorical variables were presented as n (%). Agreement between various echocardiographic measures was performed using intraclass correlation coefficients (ICCs) and Bland-Altman analysis. The paired Student t test was used to compare the mean values between 2 groups. For all statistical tests, a 2-tailed p value <0.05 was considered statistically significant. Statistical analysis was performed using SPSS for Windows, version 19.0 (SPSS, Inc., Chicago, Illinois) and Stata, version 11 (Stata Corp., College Station, Texas).
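The Bland-Altman quantities reported throughout the Results (bias and limits of agreement) can be computed as below. This is a generic stdlib sketch of the method, not the SPSS/Stata code used in the study.

```python
from statistics import mean, stdev

def bland_altman(x, y):
    """Bland-Altman agreement between two paired measurement series.

    Returns (bias, lower_loa, upper_loa): the mean of the pairwise
    differences and the 95% limits of agreement, bias +/- 1.96 * SD
    of the differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A narrow interval between the limits of agreement indicates that two methods (e.g., automated vs. manual EF) can be used interchangeably.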
Table 1 summarizes demographic and clinical data, and Table 2 summarizes volumes and EF measurements at all study sites by all methods. Although the average image quality was reasonably good (1.6 ± 0.7), and both manual measurement and visual estimation of EF were possible in all patients, fully automated measurements of LV volumes and LS were not feasible in 5 patients (feasibility = 98%). The average time for obtaining the EF by automated analysis was 8 ± 1 s/patient (analyzed in 20 patients).
Assessment of different methods of determining EF in the reference versus local centers
Using ICC, good correlations were seen between all conventional EF methods and automated EF (Table 3, Figure 2). Bland-Altman analysis showed that the bias and limits of agreement were narrower when the same method of EF assessment was compared between the local and reference centers (local visual EF vs. reference visual EF: mean bias −0.6%; 95% confidence interval [CI]: −2.4% to 1.2% [Figure 2B]; local manual EF vs. reference manual EF: mean bias 0.9%; 95% CI: −0.6% to 2.4% [Table 3, Figure 2D]) than when different methods were compared between the local and reference centers (local visual EF vs. reference manual EF: mean bias −2.2%; 95% CI: −3.5% to −0.8%; local manual EF vs. reference visual EF: mean bias 2.4%; 95% CI: 1.5% to 3.3%) (Table 3). Bias and limits of agreement within the local and reference centers were also relatively wide (local visual EF vs. local manual EF: mean bias 2.7%; 95% CI: 1.6% to 3.8%; reference visual EF vs. reference manual EF: mean bias −1.8%; 95% CI: −2.9% to −0.8%) (Table 3).
Bland-Altman analyses between the conventional methods and automated EF showed that the bias and limits of agreement were wider when visual EF at the local centers was compared with automated EF (mean bias −2.5%; 95% CI: −3.9% to −1.1%) (Table 3), whereas manual EF measurements at the local centers were similar to the corresponding automated EF values, with negligible bias (mean bias 0.2%; 95% CI: −0.9% to 1.3%) (Table 3). These observations were confirmed at the reference center, where there was a relatively higher bias for visual EF compared with automated EF (mean bias −2.2%; 95% CI: −3.3% to −1.1%) (Table 3, Figure 2F), whereas manual and automated EF values were similar (mean bias −0.3%; 95% CI: −1.5% to 0.9%) (Table 3, Figure 2H).
We also checked for a >10% absolute difference between automated and manual EF at the reference center, and between the manual measurements at the local and reference centers. This analysis was restricted to 147 cases, excluding the MAA cases in which measurements were not repeated. A >10% difference occurred in 31 cases (21.1%) between the automated and manual measurements at the reference center, and in 35 cases (23.8%) between the manual measurements at the local and reference centers (p = 0.576). There was no variability for the automated measurements.
In addition, when patient datasets were classified according to image quality, correlations for automated EF were preserved for visual and manual EF in patient datasets with good (ICC = 0.83, 0.84; 95% CI: 0.76 to 0.88 and 0.76 to 0.88; respectively, p < 0.001) and with average image quality (ICC = 0.83, 0.84; 95% CI: 0.76 to 0.88 and 0.77 to 0.89; respectively, p < 0.001), whereas correlations marginally worsened when the image quality was poor (ICC = 0.79, 0.63; 95% CI: 0.54 to 0.9 and 0.2 to 0.83; p < 0.001, p = 0.006, respectively).
It is important to note that, because MAA served both as a local center and as the reference center, and to avoid local site measurement bias, 108 datasets provided from MAA were excluded from all ICC and Bland-Altman analyses when comparing visual and manual assessments between the reference and local centers.
At the reference center, manually traced LS was 21 ± 6% and automated LS was 20 ± 6%. Although a statistically significant difference between the absolute values was found (p = 0.02), ICCs and Bland-Altman analysis showed good correlation and agreement between the manual and the automated LS (ICC: 0.83; bias: 0.7%; 95% CI: 0.1% to 1.3%).
A strong correlation was seen when comparing automated LS with automated EF (r = 0.92; p < 0.001). A simplified regression equation derived from the automated measurements (EF = 2 × automated LS + 20) yielded EF values that showed similar correlations with both automated and manual EF (ICC = 0.96 and 0.77, respectively; both p < 0.001).
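The simplified regression reported above can be applied directly. A sketch, where the equation is the one given in the text and the function wrapper is ours:

```python
def ef_from_ls(auto_ls):
    """Estimate EF (%) from automated longitudinal strain using the
    study's simplified regression: EF = 2 * LS + 20.

    auto_ls is the strain magnitude in percent (positive), e.g. 20
    for a strain of -20%.
    """
    return 2.0 * auto_ls + 20.0
```

For example, an automated LS of 20% maps to an estimated EF of 60%, consistent with the strong correlation (r = 0.92) between the two automated measures.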
Intraobserver, interobserver, and beat-to-beat variability
Table 4 summarizes intraobserver and interobserver variability for visual, manual, and automated EF, as well as for manual LS, derived from MAA and PAD. In general, intraobserver variability was not statistically significant, except for visual estimations at MAA (p < 0.001) and manual LS estimations at PAD (p = 0.03) (Table 4). Bland-Altman analysis showed that the bias and limits of agreement for intraobserver variability at both centers were larger for visual EF than for manual EF or manual LS. In contrast, interobserver variability was significant for both visual (p = 0.001) and manual EF (p < 0.001), whereas it was not for LS (p = 0.539) (Table 4). Bland-Altman analysis also showed that the bias and limits of agreement for interobserver variability were larger for visual and manual EF than for manual LS (Table 4). Automated EF and LS measurements had no variability, because the machine analysis produced identical border detection and identical measurements on repeated assessments. Finally, beat-to-beat variability was −0.96 ± 3.52% for automated EF, 2.7 ± 8.16% for manual EF, −0.19 ± 1.31% for automated LS, and 1.09 ± 3.29% for manual LS.
The main finding of this study was that a fully automated measurement of EF is technically feasible, can be performed within a few seconds, and is comparable to manual tracking (Central Illustration). Furthermore, the automated image analysis yields information beyond LV volumes and EF, such as LS, and (unlike all conventional methods of EF and LS assessment) does not include any variability.
The clinical need for automated assessment of LV function
As with most techniques, echocardiography is associated with a steep learning curve (10), and the overall proficiency and expertise of the practitioner is directly related to the examination volume performed at a given center (11). Overall, 100 echocardiographic examinations are required for attaining expertise, including accurate determination of EF (12). However, the dramatic increase in the use of echocardiography and the accompanying growth in the number of new echocardiographers have outpaced the capacity to provide adequate training.
EF is an important diagnostic and prognostic echocardiographic marker that is used to decide and monitor treatment options in patients with heart failure. However, because of the large variability in EF measurements that can occur at different centers, it is possible that therapies in one-fifth of patients may be confounded when decisions are made on the basis of EF (13). Moreover, newer modalities, such as strain imaging, impose more challenges in the echocardiography field (14). Consequently, there has been renewed interest in automated software tools that can facilitate assessment of LV function with the least possible variability among echocardiographers and between different centers (14,15).
Automation shortens time required for assessing LVEF
Automated determination of EF is not a new idea and has been applied to different imaging modalities, including echocardiography (6,16–19). However, technical limitations have prevented automated EF from being adopted in clinical practice (20). Recent improvements in automated boundary detection have circumvented some of these limitations (6,17–19). In contrast to older methods, which were dependent on image quality and gain settings (21–23), newer algorithms that rely primarily on speckle tracking and artificial intelligence in tracking techniques were recently reported to be feasible in several studies (5,6,17,18,24–27). However, the time needed for the automated analysis and the levels of agreement with reference methods in these studies were variable (Table 5). In addition, some studies either used a semiautomated EF determination or required manual corrections that resulted in variable results and increased measurement time (Table 5).
In comparison, the software algorithm used in the current study differs from previous approaches in that the user only identifies the 2- and 4-C views and then activates the automated EF evaluation (Figure 1). The program subsequently detects and contours the endocardium, selects the cardiac cycle, and determines the EF and corresponding measurements. Importantly, this process took only 8 ± 1 s/patient dataset, showing greater time efficiency than all previously described approaches (Table 5).
Reproducibility of techniques for assessing LVEF
Overall, the decision to rely on either manual or visual determination of EF has been a matter of longstanding debate (28,29). Even with the advent of better images due to improved software and hardware, results remain conflicting. Variability is a particularly important issue for manual EF, and is of greatest concern for visual EF because of the subjective nature of the assessment. McGowan et al. (30) recently reviewed studies that evaluated EF using different techniques. They reported that interobserver variability in previous studies ranged from 9% to 21% for manually traced EF and from 8% to 17% for visually assessed EF, with intraobserver variability ranging from 6% to 13% for Simpson’s method and from 11% to 13% for visual assessment. More recent studies have reported similar variability for manual and visual EF (6,17,27,31,32). In addition, Kaufmann et al. (13) demonstrated that EF, measured either manually or by visual assessment, correlates well between 2 centers (r2 = 0.63), but with wide limits of agreement (bias = 0.2%; limits of agreement: −17.4% to +17.8%) and relatively large variability between centers (14.1 ± 10.9%). They also noted that these differences were not primarily due to varying image quality (13). Overall, the intraobserver and interobserver differences, bias, and limits of agreement reported in our study for visual and manual EF are consistent with published data, confirming the problem of variability between centers in visual and manual EF assessment. In contrast, automated EF had no data variability, which is consistent with most intelligent tracking techniques and gives it a clear advantage over conventional methods in addressing EF variability between sonographers and centers.
Assessment of LS
Assessment of LS is a robust method for assessing LV systolic function (7). Several studies have shown that LS provides incremental diagnostic and prognostic value, and is a more sensitive marker of LV systolic function than EF (33–39). However, variability in LS values has been a source of concern (13), and has been largely attributed to the use of different software from different vendors (40). The European Association of Cardiovascular Imaging–American Society of Echocardiography Industry Task Force therefore initiated a standardization process, which initially focused on standardization of LS. In their recent report, LS measurements using software packages from 7 different ultrasound machine vendors and 2 software-only vendors were comparable and had better reproducibility than EF (41). The current study similarly shows that manually measured LS is more robust and less variable than manually traced and visual EF. Our study also confirms the feasibility of the automated measurement of LS (42,43), which has the distinct advantage of having no interobserver and intraobserver variability.
Study limitations
First, studies have reported better estimation of LV function using 3D echocardiography (44). There are also 3D tracking methods that apply automated algorithms (5,19,25,26,45), and 3D strain is a feasible method that can provide both global and regional evaluations with 1 analysis (45). The 3D techniques were not evaluated in this study and may have the potential for more accurate assessments (45). Second, in the present study, the interinstitutional variability of EF was evaluated; however, the variability of LS was only assessed at the reference center because the automated LS assessment protocol has not been approved for clinical use by the Food and Drug Administration and, therefore, could not be implemented at all centers. Third, because visual estimations of EF were done in the same settings as the manual assessments at local centers, it is possible that visual EF might have been biased by the manual measurements. However, the evaluation of intraobserver and interobserver variability was completely blinded and no adjustment was possible, as the measured EF was not displayed when the investigators traced the images manually. Similarly, at the central laboratory, visual assessment was performed without knowledge of the Simpson measurement. Fourth, patients with atrial fibrillation and other arrhythmias were excluded from the study. Thus, the performance of the software in patients with arrhythmias and irregular heartbeats remains to be tested in further studies.
Finally, although intraobserver, interobserver, and beat-to-beat variability were measured, normal interstudy variability was not addressed in the current study, and thus, future studies should take this into consideration.
Machine learning-enabled echocardiography image analysis for fully automated assessment of LV volumes and EF is feasible and gives precise results within seconds that are comparable to manual determination. Furthermore, this new technique provides additional information on quantitative variables, such as LS, which provide further incremental assessment of LV systolic function beyond EF.
COMPETENCY IN SYSTEMS-BASED PRACTICE: Echocardiographic assessment of LVEF and longitudinal strain may be useful to guide clinical management, but manual measurements are time-consuming and variable. The development of automated methods to assess LV function could standardize these measurements to facilitate clinical research and enhance patient care.
TRANSLATIONAL OUTLOOK: Future studies should evaluate whether automated LV assessments in longitudinal patient care and clinical trials of therapeutic interventions improve the consistency and value of serial echocardiographic assessments compared with visual assessment and manual measurements.
Dr. Knackstedt has received travel grants and research software from TomTec Imaging Systems. Mr. Schummers is an employee of TomTec Imaging Systems. Mr. Schreckenberg is an employee at TomTec Imaging Systems. Dr. Badano has received a research equipment grant from TomTec Imaging Systems. Dr. Sengupta has received research software support from TomTec Imaging Systems; has served as an advisor to Saffron Technology, TeleHealthRobotics, and Heart Test Laboratories; and has served as a consultant to Edwards Lifesciences. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose. Drs. Knackstedt and Bekkers contributed equally to this work. Pravin Shah, MD, served as the Guest Editor for this paper.
- Abbreviations and Acronyms
- EF = ejection fraction
- LS = average biplane longitudinal strain
- LV = left ventricle/ventricular
- MAA = Maastricht University Medical Centre
- PAD = University of Padua
- American College of Cardiology Foundation
- Wong M., Johnson G., Shabetai R., et al., for the V-HeFT VA Cooperative Studies Group
- Hoffmann R., Barletta G., von Bardeleben S., et al.
- Thavendiranathan P., Popović Z.B., Flamm S.D., et al.
- Barbosa D., Heyde B., Dietenbeck T., et al.
- Cannesson M., Tanabe M., Suffoletto M.S., et al.
- Voigt J.U., Pedrizzetti G., Lysyansky P., et al.
- Bishop C.M.
- Picano E., Lattanzi F., Orlandini A., et al.
- Kaufmann B.A., Min S.Y., Goetschalckx K., et al.
- Blondheim D.S., Friedman Z., Lysyansky P., et al.
- Liel-Cohen N., Tsadok Y., Beeri R., et al.
- Shibayama K., Watanabe H., Iguchi N., et al.
- Mignotte M., Meunier J., Tardif J.C.
- Karagiannis S.E., Roelandt J., Qazi M., et al.
- Hoffmann R., von Bardeleben S., ten Cate F., et al.
- Kalam K., Otahal P., Marwick T.H.
- Bière L., Donal E., Terrien G., et al.
- Rhea I.B., Uppuluri S., Sawada S., et al.
- Dahou A., Bartko P.E., Capoulade R., et al.
- Farsalinos K., Daraban A.M., Ünlü S., Thomas J.D., Badano L.P., Voigt J.U.
- Wierzbowska-Drabik K., Hamala P., Roszczyk N., et al.
- Kühl H.P., Schreckenberg M., Rulands D., et al.