- Received May 12, 2016
- Revision received July 27, 2016
- Accepted August 17, 2016
- Published online November 29, 2016.
- Sukrit Narula, BS (a),
- Khader Shameer, PhD (b),
- Alaa Mabrouk Salem Omar, MD, PhD (a,c),
- Joel T. Dudley, PhD (b) and
- Partho P. Sengupta, MD, DM (a,∗)
- (a) Zena and Michael A. Weiner Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York
- (b) Institute of Next Generation Healthcare, Department of Genetics and Genomic Sciences, Mount Sinai Health System, New York, New York
- (c) Department of Internal Medicine, Medical Division, National Research Center, Cairo, Egypt
- (∗) Reprint requests and correspondence: Dr. Partho P. Sengupta, Icahn School of Medicine, Mount Sinai School of Medicine, One Gustave L. Levy Place, PO Box 1030, New York, New York 10029.
Background Machine-learning models may aid cardiac phenotypic recognition by using features of cardiac tissue deformation.
Objectives This study investigated the diagnostic value of a machine-learning framework that incorporates speckle-tracking echocardiographic data for automated discrimination of hypertrophic cardiomyopathy (HCM) from physiological hypertrophy seen in athletes (ATH).
Methods Expert-annotated speckle-tracking echocardiographic datasets obtained from 77 ATH and 62 HCM patients were used for developing an automated system. An ensemble machine-learning model with 3 different machine-learning algorithms (support vector machines, random forests, and artificial neural networks) was developed and a majority voting method was used for conclusive predictions with further K-fold cross-validation.
Results Feature selection using an information gain (IG) algorithm revealed that volume was the best predictor for differentiating between HCM and ATH (IG = 0.24), followed by mid-left ventricular segmental (IG = 0.134) and average longitudinal strain (IG = 0.131). The ensemble machine-learning model showed increased sensitivity and specificity compared with early-to-late diastolic transmitral velocity ratio (p < 0.01), average early diastolic tissue velocity (e′) (p < 0.01), and strain (p = 0.04). Because ATH were younger, adjusted analysis was undertaken in younger HCM patients and compared with ATH with left ventricular wall thickness >13 mm. In this subgroup analysis, the automated model continued to show equal sensitivity, but increased specificity relative to early-to-late diastolic transmitral velocity ratio, e′, and strain.
Conclusions Our results suggested that machine-learning algorithms can assist in the discrimination of physiological versus pathological patterns of hypertrophic remodeling. This effort represents a step toward the development of a real-time, machine-learning–based system for automated interpretation of echocardiographic images, which may help novice readers with limited experience.
By 2030, 40.3% of the U.S. population is projected to have some form of cardiovascular disease (1). There is growing interest in precision medicine techniques that can deliver individually adapted medical care by linking genetic predisposition, biomarkers, and imaging modalities for refining cardiac risk assessment (2–4). One of the computational approaches that can help to implement precision medicine in cardiology is machine learning: a collection of statistical learning and modeling techniques that can learn from established data and can make predictions on unseen or new data (5). Machine learning has been used for performing complex classification tasks in cardiology, including classification of constrictive pericarditis and restrictive cardiomyopathy (6), classification of arrhythmia (7), quantitative prognosis of mortality in patients with heart failure (8), and risk stratification in patients undergoing percutaneous coronary intervention (9). Noninvasive imaging is the gatekeeper in the management of cardiovascular diseases, and the use of quantitative imaging data-driven phenotypic differentiation is an active area of investigation with opportunities for developing precision phenotyping models and algorithms (10,11).
Machine learning offers the potential to improve the accuracy and reliability of echocardiography, which is central to modern diagnosis and management of heart disease (12). Clinical utility depends entirely on the skill of users who are trained in image acquisition, analysis, and interpretation. Automated machine-learning systems may aid in the interpretation of a high volume of cardiac ultrasound images, reduce variability, and improve diagnostic accuracy, particularly for novice users with limited experience (2). This investigation, therefore, explored the development and validation of an ensemble machine-learning framework applied to speckle-tracking echocardiography (STE) data toward the goal of fully automated assessment of left ventricular (LV) morphology and function. Specifically, our machine-learning approach integrated 3 separate approaches into 1 algorithm: support vector machines (13), artificial neural networks (14), and random forests (15). STE by itself uses techniques that provide automated endocardial border detection. The current study applied machine-learning algorithms to these data for clinical decision making, using the differentiation of pathological remodeling seen in hypertrophic cardiomyopathy (HCM) from physiological hypertrophy in athletes (ATH) as a clinical model to investigate the potential use of machine-learning techniques.
We identified a convenience cohort of 139 male subjects from the imaging core lab database at our institution (77 verified ATH cases and 62 verified HCM cases) (Figures 1 and 2). ATH had undergone screening echocardiograms as active, competitive professionals from the United Football League. The clinical diagnosis of HCM was made by the phenotypic presentation of unexplained LV hypertrophy with septal wall thickness >15 mm in the absence of known cardiovascular or systemic disease (16). We also included patients with gray-zone HCM (13 to 15 mm), who were additionally required to have cardiac magnetic resonance imaging demonstrating fibrosis on delayed gadolinium-enhanced images and/or a family history of HCM (17). Thus, echocardiographic diagnosis of HCM was supported by a positive genetic test or family history in 8 cases (13%); phenotypic confirmation was made by cardiac magnetic resonance in 39 cases (63%), of whom 27 (45%) were positive for fibrosis or myocardial enhancement on delayed gadolinium-enhanced cardiac magnetic resonance. The remaining 25 patients (32%) had the classical feature of asymmetric septal hypertrophy (>15 mm) on echocardiography. All subjects were in sinus rhythm, without known coronary artery disease, diabetes mellitus, conduction disturbances, or decreased ejection fractions (<50%). Patients were excluded if they had poor echocardiographic images and/or inadequate visualization of the left ventricle. The majority of HCM (90%) was nonobstructive. After the initial model was built, a secondary age- and phenotype-matched analysis was performed in which the 6 patients (10%) with any LV mid-cavity or outflow gradients were excluded. The local institutional ethics committee approved the study, and echocardiographic DICOM (Digital Imaging and Communications in Medicine) images were retrieved for further analysis.
All 2-dimensional (2D) echocardiographic studies were performed with a commercially available system equipped with a 2.5-MHz multifrequency phased array transducer. Digital routine gray-scale 2D loops from apical 4-chamber views with 3 consecutive beats were obtained. LV wall thickness, end-diastolic volume, end-systolic volume, and ejection fraction were calculated using the biplane Simpson method. The pulsed-wave Doppler-derived transmitral early diastolic velocity (E) and the late diastolic atrial contraction wave velocity (A) were calculated, and the ratio E/A was calculated. All measurements were made in ≥3 consecutive cardiac cycles and average values were used for the final analyses.
Two-dimensional cardiac performance analysis software (TomTec Imaging Systems, Munich, Germany) was used for frame-by-frame movement assessment of the stable patterns of LV speckles in the apical 4-chamber view. The LV endocardial borders were traced at the end-diastolic frame, identified as 1 frame before mitral valve closure at end diastole. LV speckle tracking was then performed during the cardiac cycle and the average as well as segmental instantaneous myocardial longitudinal and radial mechanical and geometric variables were obtained.
The geometric variables included LV volume, minimum LV diameter, maximum LV diameter, and change in volume over time. The mechanical variables included average velocity and segmental velocity, as well as velocity, longitudinal strain (LS), LS rate, radial strain, and radial strain rate at the level of the apex, middle, and base of the LV. Raw data were normalized to a percentage of systolic duration, with the onset of the QRS interval being 0% and aortic valve closure the equivalent to 100%. Temporal normalization was performed on the variables using Forsythe, Malcolm, and Moler cubic spline interpolation to obtain estimates for speckle-tracking variables at 20%, 40%, 50%, 60%, and 80% intervals during systole.
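The temporal normalization step can be sketched as follows. This is only an illustration: it uses a natural cubic spline (zero second derivative at both ends), whereas the Forsythe, Malcolm, and Moler method named above uses different end conditions, and the function names `natural_cubic_spline` and `resample_systole` and all sample values are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be a strictly increasing list with at least two points.
    """
    n = len(xs) - 1                      # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                  # second derivatives; both ends fixed at 0

    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [
            6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
            for k in range(n - 1)
        ]
        # Forward elimination (Thomas algorithm).
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # Back substitution.
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(t):
        # Locate the interval [xs[i], xs[i+1]] containing t (clamped to the ends).
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)
        a, b = xs[i + 1] - t, t - xs[i]
        return (
            m[i] * a ** 3 / (6.0 * h[i])
            + m[i + 1] * b ** 3 / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


def resample_systole(percent_systole, values, targets=(20, 40, 50, 60, 80)):
    """Estimate a speckle-tracking variable at fixed percentages of systole."""
    spline = natural_cubic_spline(list(percent_systole), list(values))
    return {p: spline(p) for p in targets}


# Hypothetical longitudinal strain samples (%) at irregular frame times,
# already expressed as a percentage of systolic duration.
frames = [0, 12, 27, 41, 58, 73, 88, 100]
strain = [0.0, -2.1, -6.0, -10.2, -14.1, -16.8, -18.0, -18.4]
print(resample_systole(frames, strain))
```

Resampling every subject at the same percentages of systole makes values comparable across subjects whose heart rates and frame rates differ.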
We systematically extracted mechanical variables available from STE for the systolic phase of the cardiac cycle [(20 different speckle-tracking–derived variables) × (6 fixed systolic time points) = 120 variables], as well as the best predictive time point in systole. A key challenge for classification is to identify the set of variables that are most relevant for classifying the 2 conditions. The variable importance was ascertained using information gain (IG) criterion and ranked using the Ranker method (18). Briefly, IG is defined as a metric of the effectiveness of a feature in classifying the training data. Information gain is measured as the amount by which the “entropy,” or simply “inhomogeneity” of the class, decreases upon inclusion of an additional feature. The individual contribution of each feature to perform the classification task is assessed and ranked according to information gain of different features.
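As a rough illustration of the entropy-based ranking described above, the sketch below computes information gain for a continuous feature and ranks features by it. It assumes a single split at each feature's median, which is a simplification and not necessarily the discretization used by the Weka implementation; the function names and toy values are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the cases at `threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    remaining = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remaining


def rank_features(features, labels):
    """Rank features by information gain, highest first.

    `features` maps a feature name to its list of values. Each feature is
    split at its median, a simplifying assumption of this sketch.
    """
    scores = {
        name: information_gain(vals, labels, median(vals))
        for name, vals in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: one informative and one uninformative feature (values are made up).
labels = ["ATH", "ATH", "HCM", "HCM"]
features = {"volume": [150, 140, 90, 80], "noise": [1, 3, 2, 4]}
print(rank_features(features, labels))
```

A feature whose split leaves both groups as mixed as before has an information gain of 0, which is how the uninformative STE variables in this study would appear in such a ranking.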
Model building and predictive analyses
We examined models with repeated 10-fold cross-validation (10 repeats), which partitions the original sample into 10 disjoint subsets, uses 9 of those subsets in the training process, and then makes predictions about the remaining subset. To avoid biased predictions, we averaged model performance metrics across test folds.
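The partitioning scheme can be sketched as below. This is a minimal illustration using simple random shuffling; whether the original analysis stratified folds by class is not stated, so no stratification is assumed here, and the function names are hypothetical.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                       # new random partition per repeat
        folds = [order[i::k] for i in range(k)]  # k disjoint subsets
        for fold, test in enumerate(folds):
            train = [idx for j, part in enumerate(folds) if j != fold for idx in part]
            yield repeat, fold, train, test


def mean_score(scores):
    """Average a performance metric across all test folds."""
    return sum(scores) / len(scores)


# 139 subjects, 10 folds, 10 repeats: 100 train/test splits in total.
splits = list(repeated_kfold(139))
print(len(splits), len(splits[0][2]), len(splits[0][3]))
```

Each case is held out exactly once per repeat, so every prediction used for evaluation comes from a model that did not see that case during training.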
We trained an ensemble classifier by using the results of a set of constituent classifiers by taking a (weighted) vote of their individual predictions (see the Online Appendix, Online Table 1, for further detail). This ensemble-learning predictive model comprised the following 3 algorithms: artificial neural networks, random forest, and support vector machines (Central Illustration, Online Figure 1). Briefly, an artificial neural network (19) is an algorithm designed to learn in a manner that is modeled after biological neural circuits. The function takes in a large number of inputs, transforms the input data using a set of weights that learns to recognize the patterns in the dataset, and then provides the output accordingly (Online Figure 1A). Random forest (20) is a decision-tree–based method constructed by creating a series of decision trees from bootstrapped training samples. The decision tree split is a random process, where a new division of data is constructed, rather than using the full set of predictors (Online Figure 1B). The support vector machine (21) seeks to find the widest gap in the multidimensional data space and classifies the cases based on where they are located in relation to the gap (Online Figure 1C). Furthermore, we applied a radial basis kernel to create a nonlinear boundary without having to deal with the computational difficulties of drastic feature space enlargement. Integrating the results of 3 machine-learning algorithms can provide additional assurance of validity in relatively smaller datasets (Online Appendix).
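For readers unfamiliar with the radial basis kernel mentioned above, the following sketch shows the standard form of the function. The `gamma` value is an arbitrary placeholder, not a parameter reported in this study.

```python
from math import exp


def rbf_kernel(x, z, gamma=0.1):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-gamma * squared_distance)


# Similarity is 1 for identical feature vectors and decays with distance.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))
```

Because the support vector machine only needs these pairwise similarities, it can fit a nonlinear boundary without explicitly constructing the enlarged feature space.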
The model took predictions from each individual learning algorithm as input to build an ensemble-learning predictive model that classified each case (HCM or ATH) (Online Figure 1D) using a “majority-voting” scheme. The validation results from 10 experimental models (10-fold cross-validation) were then combined to provide a measure of the overall performance. The initial model performance was assessed for all ATH and HCM subjects (Figures 1 and 2). A subsequent subgroup analysis was performed for ATH patients with LV hypertrophy (>1.3-cm septal wall thickness) and younger HCM patients with no LV obstruction who were age-matched with ATH patients (<35 years of age). This subgroup analysis provided further confirmation of the approach and verified the applicability of the machine-learning model in clinically relevant cohorts.
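The majority-voting step can be illustrated with the short sketch below. It assumes unweighted votes from the 3 constituent classifiers; the per-model predictions shown are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label predicted by most of the constituent classifiers."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(per_model_predictions):
    """Combine per-model prediction lists (one list per model) case by case."""
    return [majority_vote(case_votes) for case_votes in zip(*per_model_predictions)]


# Invented predictions for 3 cases from the 3 algorithms.
svm = ["HCM", "ATH", "HCM"]
random_forest = ["HCM", "HCM", "ATH"]
neural_net = ["ATH", "ATH", "ATH"]
print(ensemble_predict([svm, random_forest, neural_net]))
```

With 3 voters and 2 classes a tie cannot occur, so each case always receives a definite label.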
Categorical variables were expressed as percentages and were compared using a chi-square test. Continuous variables were expressed as mean ± SD and compared using the independent sample Student t test, or the Mann-Whitney U test if not normally distributed. Comparisons of model performance with conventional echocardiographic parameters were assessed with a Fisher exact test. Variable importance was derived using information gain criterion and ranked using the Ranker method and implemented using the Weka workbench. True positive rate, false positive rate, precision, recall, F-measure, and receiver-operator characteristic curve were obtained as model metrics. The model outputs were then compared using area under the curve obtained from the receiver-operator characteristic curve. Data pre-processing was carried out in Microsoft Excel (version 12.0, Redmond, Washington) and custom Perl scripts were used to prepare input files for machine learning. Statistical analyses were performed using R (version 3.1.2, R Foundation, Vienna, Austria): a language and environment for statistical computing. Machine-learning algorithms and feature selection methods were implemented using R packages and Weka (9–11). A p value of <0.05 was considered statistically significant.
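The model metrics listed above follow from the confusion matrix, as the sketch below illustrates. It treats HCM as the positive class, which is an assumption of this example, and it computes the area under the curve as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. All names and data are hypothetical.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive="HCM"):
    """True/false positive rate, precision, recall, and F-measure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = _ratio(tp, tp + fn)          # identical to the true positive rate
    precision = _ratio(tp, tp + fp)
    return {
        "true_positive_rate": recall,
        "false_positive_rate": _ratio(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
    }


def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = sum(
        (p > q) + 0.5 * (p == q) for p in positive_scores for q in negative_scores
    )
    return wins / (len(positive_scores) * len(negative_scores))


y_true = ["HCM", "HCM", "HCM", "ATH", "ATH", "ATH", "ATH"]
y_pred = ["HCM", "HCM", "ATH", "HCM", "ATH", "ATH", "ATH"]
print(classification_metrics(y_true, y_pred))
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```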
The demographic, clinical, and echocardiographic data for both groups are summarized in Table 1. Subjects with HCM were older (p < 0.001), had a higher body surface area (p < 0.001), and a higher systolic blood pressure (p = 0.0467). Conversely, ATH had bigger LV cavities (p < 0.001), lower ejection fraction (p < 0.001), higher E/A (p < 0.001), and higher e′ (p < 0.001).
On assessing contribution of different speckle-tracking–derived parameters (features) in differentiating the 2 groups, volume was the best predictor for differentiating HCM from ATH (IG = 0.24) followed by mid-LV segmental LS, average LS, and mid-LV segmental radial strain rate (IG = 0.134, 0.131, and 0.111, respectively). Other STE variables did not show any predictive power with IG of 0.
The above-mentioned features combined together in the model were next tested for the diagnostic ability at different time points in systole (cubic spline interpolation-derived measurements in systole at intervals separated by 10% of systolic progression from end diastole to end systole). The classification improved as systole progressed, with highest diagnostic value seen at end systole (100% systole; area under the curve = 0.795) for differentiating the 2 groups (Figure 3). The sensitivity and specificity at the optimal cutpoints of the receiver-operator characteristic curves were compared for the model and a few conventional echocardiographic variables. The model showed increased sensitivity and specificity relative to E/A (p < 0.01), early diastolic tissue velocity (p < 0.01), and LS (p = 0.04) (Table 2).
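One common way to choose an optimal cutpoint on a receiver-operator characteristic curve is to maximize the Youden index (sensitivity + specificity − 1). The text does not state which criterion was used here, so the sketch below is an assumption offered for illustration only, with invented scores and hypothetical function names.

```python
def sensitivity_specificity(scores, labels, cutpoint, positive="HCM"):
    """Call a case positive when its score is >= cutpoint.

    Assumes both classes are present in `labels`.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutpoint
        if label == positive:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutpoint(scores, labels, positive="HCM"):
    """Return (cutpoint, sensitivity, specificity) maximizing the Youden index."""
    best = None
    for cut in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cut, positive)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Invented model scores (higher means more likely HCM).
scores = [0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.1]
labels = ["HCM", "HCM", "HCM", "HCM", "ATH", "ATH", "ATH"]
print(youden_cutpoint(scores, labels))
```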
To understand how model refinement can be improved by clinically relevant demographics matching, the model was verified for an age-matched subset of 21 cases with nonobstructive HCM who were younger than 35 years and 25 athletes who had a septal thickness >13 mm, which is comparable to that of HCM. The demographic, clinical, and echocardiographic data for both groups are summarized in Table 1. The model continued to exhibit diagnostic value with 96% sensitivity and 77% specificity for differentiating the 2 conditions (Table 2). The model was also compared with conventional parameters. The model showed equal sensitivities, but increased specificity relative to E/A (p < 0.01), average early diastolic tissue velocity (p < 0.01), and strain (p = 0.02) (Table 2).
The principal findings of this study include the following: 1) the use of a clinical model of HCM and ATH provided a robust demonstration of a machine-learning framework for the differentiation of clinically similar phenotypes in ATH and HCM; and 2) the integration of STE-based parameters in a machine-learning model demonstrated diagnostic ability comparable to conventional 2D echocardiographic- and Doppler-derived parameters used in clinical practice.
Echocardiographic differentiation between HCM and ATH
Adaptive changes in ATH hearts usually involve eccentric LV hypertrophy, especially in endurance athletes in activities such as long-distance running and swimming, where an increased LV thickness is coupled with increased chamber dimensions (22). However, in strength athletes, such as weight lifters, concentric hypertrophy may be seen, where hypertrophy is accompanied by a less notable increase in chamber dilation (23,24). LV hypertrophy in ATH may closely resemble HCM (25), with patients with septal thickness in the gray zone (13 to 15 mm) being particularly difficult to differentiate (26). This distinction is crucial, given that HCM is the most common cause of sudden death in North American athletes (27), and necessitates use of other features beyond the degree of LV hypertrophy to distinguish the 2 conditions.
Individual parameters such as family history, LV cavity size (<45 mm in favor of HCM), left atrial enlargement, unusual conduction patterns seen on an electrocardiogram, and response to deconditioning may help differentiate the 2 conditions (25). Similarly, Doppler parameters of LV diastolic dysfunction and filling pressures are described to be abnormal in HCM, whereas they are normal or even supernormal in ATH (22). More advanced measures of LV mechanical function using STE-derived parameters such as LS, radial strain, and tissue velocities have been reported to be attenuated in patients with HCM in comparison with ATH (28). However, integrating such a large spectrum of variables can be clinically challenging, particularly for an untrained reader. Whereas the quest for the single most accurate biomarker for disease differentiation may be less burdensome, clinical features or measurements in isolation are often unreliable and too simplistic for advanced clinical differentiation. Our analysis suggested that machine learning can integrate multiple variables such as several high-dimensional STE parameters for building advanced algorithms for differentiating ATH from HCM. Furthermore, despite the integration of more complex mathematical modeling into this process, the end result of the decision system can be simply understood by clinicians. Finally, we demonstrated that the diagnostic accuracy of a machine-learning model was comparable with analysis of functional data using manually read Doppler, tissue Doppler, or conventional longitudinal strain assessments that require considerable clinical expertise. Interestingly, the machine-learning automation vindicated traditional variables, such as peak LS and strain rate parameters, validating potential scalability and clinical utility of a machine-learning framework.
Advantages of machine-learning algorithms
Machine-learning methods do not discount the value of conventional echocardiographic assessment to differentiate cardiovascular diseases; rather, they offer opportunities to simplify the diagnostic process by providing a contemporary solution of integrating the increasing number of imaging parameters in a clinical imaging database (29). A trained eye often automatically integrates multiple attributes of interest in a moving image without statistical reasoning, and these perceptual cues jointly contribute to clinical differentiation of patterns. Similarly, mathematical models in machine learning provide a platform to integrate multiple facets of information that may be potentially more useful for standardization of data interpretation. Firstly, high-volume data generated from cardiac imaging can be integrated in a multiparametric approach for pattern recognition and imaging-data–based disease phenotype characterization. Secondly, precision can be built into the data-driven diagnostic systems such that automated models can be used to classify an individual case; despite the complexity, the process can be automated and performed conveniently for a busy clinician. For example, we recently described a machine-learning approach that allows a very fast “within an average of 8 s” automatic calculation of STE-derived variables, such as longitudinal strain, with “zero” variability (30). We envision that automated algorithms in the near future will involve machine-learning frameworks for rapid diagnosis with high precision. Moreover, such algorithms may be delivered to the care provider in real time using cloud-computing–based solutions for avoiding delays in clinical diagnosis and therapeutic interventions. The potential value of machine-learning algorithms in personalized approaches cannot be discounted given the complexity of cardiovascular phenotypes and associated comorbidities (31).
The machine-learning framework discussed herein opens up promising frontiers for data-driven echocardiographic diagnostic and phenotyping systems. However, incorporating machine-learning techniques in cardiology workflows also introduces several challenges. First, the study comparisons and models were driven by a relatively small sample size; thus, further studies should be done on larger samples before applying the machine-learning algorithms considered herein. Machine learning with <100 samples per class may be prone to overfitting; therefore, we used ensemble-learning methods to mitigate such overfitting (Online Appendix). Second, our algorithm was trained and evaluated on a particular cohort with specific demographic and clinical characteristics. Increased heterogeneity of our data samples will improve the generalizability of this framework.
Finally, our model was only assessed using limited space and time points in 2D echocardiographic images. For example, the data were driven by only the systolic phases and only the apical 4-chamber view and, thus, do not represent global values. Further studies should include all apical as well as short-axis views. Also, data with higher spatial and temporal resolution, 3D echocardiograms, as well as using data from other imaging modalities can yield improved results. As more samples become available, deep-learning models could also be used to attain near-perfect accuracy rates for the prediction models.
We envisage that predictive analytics, data modeling, and precision phenotyping of cardiovascular disease manifestations using echocardiographic data could help echocardiographers and new users of cardiac ultrasound to efficiently analyze and process large volumes of cardiac ultrasound data. In addition, as the capacity of health systems to generate imaging data grows precipitously, capabilities for handling this growth in information quantity must be devised and constructed. Development and integration of these smart interpretation systems with paradigms such as telemedicine and software that enables automated endocardial border tracking can allow smart systems for cardiac diagnoses to reach even the most resource-burdened areas.
COMPETENCY IN PATIENT CARE AND PROCEDURAL SKILLS: Machine-learning algorithms can automate assessment of physiological STE data with diagnostic accuracy comparable to conventional 2D Doppler echocardiography.
TRANSLATIONAL OUTLOOK: With further experience in a wider diversity of settings, automated machine-learning systems could reduce variability and improve the diagnostic accuracy of STE in routine clinical practice.
The authors thank Drs. Mayank Kansal and Todd Hurst for providing data related to the ATH patients.
For supplemental Methods as well as a table and a figure, please see the online version of this paper.
The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. Drs. Shameer and Dudley have received grants from the National Institutes of Health: National Institute of Diabetes and Digestive and Kidney Diseases (R01DK098242); National Cancer Institute (U54CA189201); Illuminating the Druggable Genome; Knowledge Management Center sponsored by National Institutes of Health Common Fund; National Cancer Institute (U54-CA189201-02); National Center for Advancing Translational Sciences (UL1TR000067); and Clinical and Translational Science Award. Dr. Dudley has received consulting fees or honoraria from Janssen Pharmaceuticals, GlaxoSmithKline, AstraZeneca, and Hoffman-La Roche; is a scientific advisor to LAM Therapeutics; and holds equity in NuMedii Inc., Ayasdi Inc., and Ontomics, Inc. Dr. Sengupta is a consultant for TeleHealthRobotics, Heart Test Labs, and Hitachi-Aloka Ltd. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose. Mr. Narula and Dr. Shameer contributed equally to this work. Presented at the American Society of Echocardiography Annual Scientific Sessions 2015 for the Arthur Weyman Young Investigator Award competition, of which Mr. Narula was declared the winner. P.K. Shah, MD, served as Guest Editor-in-Chief for this paper.
- Abbreviations and Acronyms
- E/A = early-to-late diastolic transmitral velocity ratio
- HCM = hypertrophic cardiomyopathy
- IG = information gain
- LS = longitudinal strain
- LV = left ventricular
- STE = speckle-tracking echocardiography
- American College of Cardiology Foundation
- Heidenreich P.A., Albert N.M., Allen L.A., et al.
- Darcy A.M., Louie A.K., Roberts L.W.
- Shameer K., Badgeley M.A., Miotto R., Glicksberg B.S., Morgan J.W., Dudley J.T.
- Vapnik V.N.
- Sengupta P.P., Huang Y.M., Bansal M., et al.
- Ortiz J., Ghefter C.G., Silva C.E., Sabbatini R.M.
- Mwangi B., Soares J.C., Hasan K.M.
- Marwick T.H., Schwaiger M.
- Rochester N., Holland J.H., Haibt L.H., Duda W.L.
- Gersh B.J., Maron B.J., Bonow R.O., et al.
- Sheikh N., Papadakis M., Schnell F., et al.
- Frank E., Hall M., Trigg L., Holmes G., Witten I.H.
- Haykin S.
- Pelliccia A., Maron B.J., De Luca R., et al.
- Maron B.J.
- Maron B.J., Pelliccia A.
- Chandra N., Bastiaenen R., Papadakis M., Sharma S.
- Kansal M.M., Lester S.J., Surapaneni P., et al.
- Nagueh S.F.
- Knackstedt C., Bekkers S.C., Schummers G., et al.
- Scruggs S.B., Watson K., Su A.I., et al.