Optimal SVM parameter selection for non-separable and unbalanced datasets

Peng Jiang, Samy Missoum, Zhao Chen

Research output: Contribution to journal, Article, peer-review

14 Scopus citations

Abstract

This article presents a study of three validation metrics used for the selection of optimal parameters of a support vector machine (SVM) classifier in the case of non-separable and unbalanced datasets. This situation is often encountered when the data is obtained experimentally or clinically. The three metrics selected in this work are the area under the ROC curve (AUC), accuracy, and balanced accuracy. These validation metrics are tested using computational data only, which enables the creation of fully separable sets of data. This way, non-separable datasets, representative of a real-world problem, can be created by projection onto a lower dimensional sub-space. The knowledge of the separable dataset, unknown in real-world problems, provides a reference to compare the three validation metrics using a quantity referred to as the “weighted likelihood”. As an application example, the study investigates a classification model for hip fracture prediction. The data is obtained from a parameterized finite element model of a femur. The performance of the various validation metrics is studied for several levels of separability, ratios of unbalance, and training set sizes.
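To make the three validation metrics named in the abstract concrete, the following is a minimal sketch that computes accuracy, balanced accuracy, and AUC on a small unbalanced toy set. It is not the authors' implementation: the labels, the decision values in `scores`, and the helper names `accuracy`, `balanced_accuracy`, and `auc` are all hypothetical, chosen only for illustration. The "weighted likelihood" reference quantity is not defined in the abstract and is therefore not sketched here.

```python
# Illustrative only: toy labels and decision values, not data from the paper.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one (ties count half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Unbalanced toy set: 8 samples of class -1, 2 samples of class +1.
y_true = [-1] * 8 + [1] * 2

# Hypothetical SVM decision values; the sign gives the predicted class.
scores = [-2.0, -1.5, -1.2, -1.0, -0.8, -0.6, -0.4, -0.3, -0.5, -0.1]
y_pred = [1 if s > 0 else -1 for s in scores]

acc = accuracy(y_true, y_pred)
bacc = balanced_accuracy(y_true, y_pred)
auc_value = auc(y_true, scores)

print(f"accuracy          = {acc:.3f}")
print(f"balanced accuracy = {bacc:.3f}")
print(f"AUC               = {auc_value:.3f}")
```

In this toy case every sample is predicted as the majority class, so accuracy is 0.8 while balanced accuracy is only 0.5; the AUC, which depends on the ranking of the decision values rather than on the threshold, is 0.875. The disagreement shows why the choice of validation metric matters when classes are unbalanced.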

Original language: English (US)
Pages (from-to): 523-535
Number of pages: 13
Journal: Structural and Multidisciplinary Optimization
Volume: 50
Issue number: 4
State: Published - Oct 2014

Keywords

  • Cross validation
  • Non-separable and unbalanced datasets
  • Support vector machines
  • Validation metrics

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
  • Control and Optimization
