Automated lexicon and feature construction using word embedding and clustering for classification of ASD diagnoses using EHR

Gondy Leroy, Yang Gu, Sydney Pettygrove, Margaret Kurzius-Spencer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Scopus citations

Abstract

Using electronic health records of children evaluated for Autism Spectrum Disorders (ASD), we are developing a decision support system for automated diagnostic criteria extraction and case classification. We manually created 92 lexicons, which we tested as features for classification and compared with features created automatically using word embedding. The expert annotations used for manual lexicon creation provided seed terms, each of which was expanded with its 15 most similar terms (Word2Vec). The resulting 2,200 terms were grouped into 92 clusters, paralleling the 92 manually created lexicons. We compared both sets of features for classifying case status with a feed-forward backpropagation (FF/BP) neural network (NN) and a C5.0 decision tree. For manually created lexicons, classification accuracy was 76.92% for the NN and 84.60% for C5.0. For the automatically created lexicons, accuracy was 79.78% for the NN and 86.81% for C5.0. Automated lexicon creation required much less development time and yielded classification accuracy comparable to that of the manual lexicons.
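
The pipeline outlined in the abstract (expand seed terms with their nearest neighbors in embedding space, cluster the expanded vocabulary, then count cluster hits per record) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors, the function names, the choice of k-means with Euclidean distance, and the use of seed vectors as initial cluster centers are all assumptions made for illustration. The study itself used Word2Vec vectors, 15 neighbors per seed, and 92 clusters.

```python
import math

# Hypothetical toy embeddings. A real system would load vectors trained
# with Word2Vec on the clinical text instead of these hand-made values.
EMBEDDINGS = {
    "rocking":   [0.90, 0.10, 0.00],
    "flapping":  [0.80, 0.20, 0.00],
    "spinning":  [0.85, 0.15, 0.05],
    "echolalia": [0.10, 0.90, 0.00],
    "babbling":  [0.20, 0.80, 0.10],
    "tantrum":   [0.00, 0.10, 0.90],
    "meltdown":  [0.10, 0.00, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def expand(seeds, k):
    """Return the seeds plus the k most similar vocabulary terms per seed."""
    terms = set(seeds)
    for seed in seeds:
        ranked = sorted(
            (w for w in EMBEDDINGS if w != seed),
            key=lambda w: cosine(EMBEDDINGS[seed], EMBEDDINGS[w]),
            reverse=True,
        )
        terms.update(ranked[:k])
    return sorted(terms)


def kmeans(terms, initial_centers, iterations=10):
    """Plain k-means over term vectors; returns {term: cluster index}."""
    centers = [list(c) for c in initial_centers]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each term goes to its nearest center.
        assignment = {
            t: min(range(len(centers)),
                   key=lambda i: squared_distance(EMBEDDINGS[t], centers[i]))
            for t in terms
        }
        # Update step: each center becomes the mean of its members.
        for i in range(len(centers)):
            members = [EMBEDDINGS[t] for t in terms if assignment[t] == i]
            if members:
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def features(text, assignment, n_clusters):
    """Count how many tokens of the text fall into each term cluster."""
    counts = [0] * n_clusters
    for token in text.lower().split():
        if token in assignment:
            counts[assignment[token]] += 1
    return counts


seeds = ["rocking", "echolalia", "tantrum"]
terms = expand(seeds, k=2)
clusters = kmeans(terms, [EMBEDDINGS[s] for s in seeds])
vector = features(
    "Child shows rocking and flapping with frequent echolalia", clusters, 3
)
print(vector)  # [2, 1, 0]
```

Each record is reduced to one count per cluster, and such a vector is what a classifier like the NN or the decision tree would receive as input. Seeding the cluster centers with the seed terms keeps this toy example deterministic; it says nothing about how the clusters were initialized in the study.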

Original language: English (US)
Title of host publication: Natural Language Processing and Information Systems - 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017, Proceedings
Editors: Flavius Frasincar, Ashwin Ittoo, Elisabeth Metais, Le Minh Nguyen
Publisher: Springer-Verlag
Pages: 34-37
Number of pages: 4
ISBN (Print): 9783319595689
State: Published - 2017
Event: 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017 - Liege, Belgium
Duration: Jun 21 2017 to Jun 23 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10260 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017
Country/Territory: Belgium
City: Liege
Period: 6/21/17 to 6/23/17

Keywords

  • Autism spectrum disorders
  • Classification
  • Clustering
  • EHR
  • Electronic health records
  • NLP
  • Natural language processing
  • Word embedding

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
