Author information
- Jon Duke (a,b),
- Monica Chase (a,b),
- Nate Poznanski-Ring (a,b),
- Joel Martin (a,b),
- Rachel Fuhr (a,b) and
- Arnaub Chatterjee (a,b)
Both clinical care and observational research benefit from systems that can efficiently identify patients with peripheral arterial disease (PAD). Electronic health records (EHRs) can help find PAD patients via diagnostic and procedure codes (e.g., ICD-9), yet code-based approaches are limited because much of the information critical to detecting PAD is captured as free text, either in physician notes or in vascular laboratory reports. Natural language processing (NLP) is a technique for extracting information from such unstructured documents. We hypothesized that NLP could increase the identification of PAD patients in an EHR compared with using structured data alone.
We tested our hypothesis by developing and evaluating an NLP-based PAD detection model and comparing its output with that of an established code-based PAD definition (Kullo 2010). Using a combination of expert input and machine learning techniques, we developed an NLP algorithm to detect 5 clinical concepts highly suggestive of PAD: claudication, rest pain, diminished pulses, limb ischemia, and history of PAD. We also extracted ankle-brachial indices from lab reports, capturing patients with values < 0.9. We ran the algorithms on all patients seen in two major health systems in Indianapolis between 2009 and 2014. To measure NLP performance, we manually reviewed 500 matching documents for the presence or absence of PAD.
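As a rough illustration of the kind of text matching described above, the following Python sketch flags non-negated mentions of the five concepts and pulls ankle-brachial index values below 0.9 from report text. It is not the study's algorithm, which combined expert input with machine learning and is not specified here; the keyword patterns, negation cues, and report phrasing (names `CONCEPT_PATTERNS`, `NEGATION`, `ABI`) are all assumptions made for the example.

```python
import re

# Hypothetical keyword patterns for the five concepts named in the abstract;
# the study's actual lexicon and model are not described in the source.
CONCEPT_PATTERNS = {
    "claudication": r"claudication",
    "rest pain": r"rest pain",
    "diminished pulses": r"(?:diminished|decreased|absent) (?:\w+ ){0,3}pulses?",
    "limb ischemia": r"limb ischemia",
    "history of PAD": r"(?:history of|h/o) (?:pad|peripheral arter(?:y|ial) disease)",
}

# Assumed negation cues, looked for in the same sentence just before a match.
NEGATION = re.compile(
    r"\b(?:no|denies|without|negative for)\b[^.;]{0,40}$", re.IGNORECASE
)

# Assumed report phrasing such as "ABI 0.72" or "ankle-brachial index: 0.85".
ABI = re.compile(
    r"\b(?:abi|ankle[- ]brachial index)\b[^0-9.]{0,20}(\d(?:\.\d+)?)",
    re.IGNORECASE,
)


def find_concepts(text):
    """Return the set of concepts mentioned without a preceding negation cue."""
    found = set()
    for concept, pattern in CONCEPT_PATTERNS.items():
        for match in re.finditer(pattern, text, re.IGNORECASE):
            preceding = text[max(0, match.start() - 50):match.start()]
            if not NEGATION.search(preceding):
                found.add(concept)
    return found


def low_abi_values(text, threshold=0.9):
    """Return ABI values in the text that fall below the threshold."""
    return [float(v) for v in ABI.findall(text) if float(v) < threshold]


note = "No rest pain. Reports claudication with walking."
report = "Right ABI 0.72. Left ABI: 1.05."
print(find_concepts(note))
print(low_abi_values(report))
```

A simple window-based negation check like this one is also where errors of the kind reported in the results could arise, since a cue and the concept it negates do not always sit within a fixed distance in the same sentence.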
Significantly more patients were identified using the NLP-based algorithm than through structured data (42,070 vs 9,592, p < 0.001). Of 43,811 total PAD patients in our population, NLP identified 96% compared with 22% found using structured data alone. The specificity of the NLP algorithm was 98%, with sources of error including negation errors and alternative etiologies for PAD symptoms (e.g., congenital anomaly).
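The reported percentages follow directly from the counts above, as this short Python check shows. The overlap figure at the end is not stated in the abstract; it rests on the assumption that the 43,811 total is the union of the patients found by the two methods.

```python
# Counts reported in the abstract.
nlp_identified = 42070
structured_identified = 9592
total_pad = 43811

# Share of all PAD patients found by each method.
nlp_share = nlp_identified / total_pad
structured_share = structured_identified / total_pad

# Ratio of NLP-identified to code-identified patients.
fold = nlp_identified / structured_identified

# ASSUMPTION: if the total is the union of both methods, the patients found
# by both follow from inclusion-exclusion. The abstract does not report this.
found_by_both = nlp_identified + structured_identified - total_pad

print(f"NLP share: {nlp_share:.1%}")
print(f"Structured-data share: {structured_share:.1%}")
print(f"Fold difference: {fold:.2f}")
print(f"Found by both (under the union assumption): {found_by_both}")
```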
Despite its limitations, the NLP approach identified roughly 4-fold more PAD patients than structured data alone. A more expansive NLP algorithm, incorporating radiologic and arteriogram findings, would likely yield an even larger cohort. NLP approaches could be used for both PAD quality improvement and research initiatives.
Poster Area, South Hall A1
Saturday, April 02, 2016, 10:00 a.m.-10:45 a.m.
Session Title: Vascular Medicine: Aortic and Peripheral Artery Diseases
Abstract Category: 44. Vascular Medicine: Non Coronary Arterial Disease
Presentation Number: 1110-217
© 2016 American College of Cardiology Foundation