Abstract
We present two methods for feature selection in high-throughput transcriptomic data, in which the subsets of selected variables (the genes) are optima of a multi-objective function. In clinical trials, the number of patients included is at most in the hundreds, while the number of gene expressions measured for each patient exceeds tens of thousands. These trials aim to better understand the biology of the phenotypes at the genomic level, and to better predict the phenotypes in order to give each patient the best treatment.

In our first method, the gene subsets are the optima of a bi-objective function. This function is a tradeoff between the size of the gene subset and the discrimination of the phenotypes, expressed as the inter-class distance. Because the gene selection stage is independent of the prediction model, it is a filter method of feature selection. The second method aims to select gene subsets that optimize the performance of a specific prediction model; it is a wrapper approach to the feature selection problem. The optimal gene subsets are computed by a line search optimization heuristic that maximizes the performance of a linear discriminant analysis.

Using public oncology datasets, we compared our results to those of the main previous methods. Our optimization approach to gene subset selection almost always returned subsets that were significantly smaller than those of the previous methods, while our predictors almost always performed better and were more robust. In both methods, we searched the space of gene subsets for optima of an explicit multi-objective function. Meta-heuristic methods are well suited to these optimization problems, particularly in high-dimensional spaces.
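The idea of gene subsets as optima of a bi-objective function (small size, large inter-class distance) can be illustrated at toy scale. This is a minimal sketch, not the authors' algorithm. It assumes two phenotypes and takes the inter-class distance to be the Euclidean distance between class centroids, which is a hypothetical choice since the exact distance is not specified here. It also replaces the meta-heuristic search with exhaustive enumeration, which is only feasible for a handful of genes. The helper names (`interclass_distance`, `dominates`, `pareto_front`) are illustrative. The sketch keeps the Pareto-optimal subsets: those for which no other subset is at least as small and at least as discriminant while being strictly better on one of the two objectives.

```python
from itertools import combinations
from math import dist


def class_centroid(X, y, label, genes):
    """Mean expression of the selected genes over the samples of one class."""
    rows = [row for row, lab in zip(X, y) if lab == label]
    return [sum(row[g] for row in rows) / len(rows) for g in genes]


def interclass_distance(X, y, genes):
    """Hypothetical inter-class distance: Euclidean distance between the two
    class centroids, restricted to the genes in the subset."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch assumes exactly two phenotypes")
    c0 = class_centroid(X, y, labels[0], genes)
    c1 = class_centroid(X, y, labels[1], genes)
    return dist(c0, c1)


def dominates(a, b):
    """a = (size, distance) dominates b if a is no larger, no less
    discriminant, and strictly better on at least one objective."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def pareto_front(X, y, max_size):
    """Score every gene subset of size 1..max_size, then keep the subsets
    whose (size, distance) pair no other subset dominates.
    Exhaustive enumeration: only feasible for a handful of genes."""
    n_genes = len(X[0])
    scored = [
        (genes, (k, interclass_distance(X, y, genes)))
        for k in range(1, max_size + 1)
        for genes in combinations(range(n_genes), k)
    ]
    return [
        (genes, obj)
        for genes, obj in scored
        if not any(dominates(other, obj) for _, other in scored)
    ]


# Toy data: 4 patients x 3 genes, two phenotypes (0 and 1).
X = [
    [1.0, 5.0, 2.0],
    [1.0, 5.0, 2.0],
    [4.0, 5.0, 3.0],
    [4.0, 5.0, 3.0],
]
y = [0, 0, 1, 1]

for genes, (size, distance) in pareto_front(X, y, max_size=3):
    print(genes, size, round(distance, 3))  # keeps (0,) and (0, 2)
```

In this toy case, gene 1 does not discriminate the phenotypes. The full subset (0, 1, 2) is therefore dominated by (0, 2), which reaches the same distance with fewer genes. With tens of thousands of genes, enumeration is impossible, and a search heuristic would take the place of the `combinations` loop while the dominance test stays the same.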
Original language | English (US) |
---|---|
Title of host publication | Heuristics |
Subtitle of host publication | Theory and Applications |
Publisher | Nova Science Publishers, Inc. |
Pages | 311-324 |
Number of pages | 14 |
ISBN (Print) | 9781624176371 |
State | Published - Feb 2013 |
Externally published | Yes |
Keywords
- Bioinformatics
- DNA microarray analysis
- Feature selection
- Oncology
- Optimization
ASJC Scopus subject areas
- General Mathematics