To analyze data on coronary heart disease (CHD) in an existing database using a big data platform, and to build a mathematical model relating clinical indicators to the types of CHD so that the type of CHD can be predicted. Machine learning methods were used to recommend reasonable treatment schemes to assist doctors' decision-making.
(1) Clinical data from 5000 patients with CHD were extracted from an existing database. The data included BMI, diabetes, blood pressure, age, sex, smoking history, hereditary disease history, pain position and duration, paroxysm frequency, disease types and side effects, etc. Hadoop and other related big data processing software were used to build a clinical big data processing platform to analyze the data. (2) A controlled experiment was designed to test the newly proposed classifier. In the experiment group, a BP neural network and a naive Bayesian algorithm were each used to analyze the type of CHD. The final classification result of the new classifier was obtained from a probability-based Boolean decision maker whose inputs came from both the BP network and the naive Bayes classifier. In the control group, a logistic regression algorithm was used to calculate the prediction results. Of the 5000 samples, 75% were used to train the new classifier, and the remaining 25%, defined as the experiment samples, were used as input for both the experiment group and the control group. ROC curves were drawn from the results of the two groups, and the areas under the two curves (AUC) were compared. (3) The threshold for the exploration neighborhood was set according to the type of CHD. For patients in the same neighborhood, the weighted Pearson correlation coefficient was used to calculate the similarity between an untreated patient and a cured patient on the big data processing platform, so as to recommend a reasonable treatment plan such as medication, PCI, or CABG. Three experts analyzed each plan and evaluated whether it was reasonable.
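The similarity step in (3) can be illustrated with a minimal sketch of a weighted Pearson correlation coefficient. The abstract does not state how the feature weights were chosen or how patients were encoded, so the function names `weighted_pearson` and `most_similar`, the numeric feature vectors, and the weights below are assumptions for illustration only, not the authors' implementation.

```python
import math


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation between two equal-length feature vectors.

    x, y: numeric clinical features of two patients (hypothetical encoding).
    w:    non-negative weight per feature (how weights are set is an assumption).
    """
    if not (len(x) == len(y) == len(w)) or not x:
        raise ValueError("x, y and w must be non-empty and of equal length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted means of each vector.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted covariance and variances.
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total

    if vx == 0 or vy == 0:
        # Correlation is undefined for a constant vector; treated here as 0.
        return 0.0
    return cov / math.sqrt(vx * vy)


def most_similar(untreated, cured, weights):
    """Return (index, similarity) of the cured patient most similar to `untreated`."""
    scores = [weighted_pearson(untreated, c, weights) for c in cured]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

In such a scheme, the treatment recorded for the most similar cured patient in the same neighborhood would serve as the candidate plan for the untreated patient.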
(1) In the samples, patients with stable angina accounted for 32.3%, unstable angina 29.6%, myocardial infarction 15.1%, silent myocardial ischemia 14.2%, and ischemic cardiomyopathy 8.8%. The proportion of patients with hypertension was 68.4%, diabetes 22.8%, and dyslipidemia 35.3%. The highest incidence of side effects after taking nitrates, β blockers, or CCB occurred in the 71-80 y group (P=0.029). (2) The ROC curve of the new classifier in the experiment group had a higher AUC than that of logistic regression, with a ratio of 1.057. (3) The big data platform could process three times more data than before to train the new classifier without extra time consumption. (4) 84.7% of the treatment plans predicted using the weighted Pearson correlation coefficient were considered reasonable by the experts.
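The AUC comparison in result (2) can be sketched as follows. This is a minimal illustration for the binary case, computing AUC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. How the study reduced its five CHD types to ROC curves is not stated, so the binary setting and the function names `auc` and `auc_ratio` are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of positive/negative pairs in which the positive
    sample receives the higher score; ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_ratio(labels, scores_new, scores_baseline):
    """Ratio of the new classifier's AUC to the baseline classifier's AUC."""
    return auc(labels, scores_new) / auc(labels, scores_baseline)
```

A ratio above 1 means that the new classifier ranks positive cases ahead of negative ones more often than the baseline does on the same test samples.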
A classifier that combines a BP neural network with a naive Bayesian algorithm, supported by a big data platform, can improve prediction accuracy. Based on similarity analysis, the new system can effectively help doctors make decisions.