We present two methods for feature selection in high-throughput transcriptomic data, in which the selected subsets of variables (the genes) are optima of a multi-objective function. In clinical trials, the number of enrolled patients is at most in the hundreds, whereas the number of gene expression levels measured for each patient exceeds tens of thousands. These trials aim to better understand the biology of the phenotypes at the genomic level, and to predict the phenotypes more accurately so that each patient can be given the best treatment. Our first method defines the gene subsets as the optima of a bi-objective function. This function is a tradeoff between the size of the gene subset and the discrimination of the phenotypes, expressed as the inter-class distance. Because the gene selection stage is independent of the prediction model, it is a filter method of feature selection. The second method selects gene subsets that optimize the performance of a specific prediction model, which makes it a wrapper approach to feature selection. The optimal gene subsets are computed by a line search optimization heuristic that maximizes the performance of a linear discriminant analysis. Using public oncology datasets, we compared our results with those of the main previous methods. Our optimization approach to gene subset selection almost always returned subsets that were significantly smaller than those of the previous methods, while our predictors almost always performed better and were more robust. In both methods we searched the space of gene subsets for optima of an explicit multi-objective function. Metaheuristic methods are well suited to such optimization problems, particularly in high-dimensional spaces.
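The bi-objective tradeoff described for the filter method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the inter-class distance is defined or which optimizer is used. The sketch assumes two classes labelled 0 and 1, takes the Euclidean distance between class centroids as the inter-class distance, and uses a plain greedy forward search as a stand-in for the optimization heuristic. The function names (`inter_class_distance`, `greedy_candidates`, `pareto_front`) are hypothetical.

```python
import numpy as np


def inter_class_distance(X, y, genes):
    """Euclidean distance between the two class centroids on the chosen genes.

    Assumption: this is one possible definition of inter-class distance,
    not necessarily the one used in the paper.
    """
    idx = list(genes)
    mu0 = X[y == 0][:, idx].mean(axis=0)
    mu1 = X[y == 1][:, idx].mean(axis=0)
    return float(np.linalg.norm(mu0 - mu1))


def greedy_candidates(X, y, max_size):
    """Greedy forward selection (a stand-in for the paper's optimizer).

    At each step, add the gene that yields the largest inter-class distance.
    Returns one (size, distance, subset) tuple per step.
    """
    selected, out = [], []
    remaining = set(range(X.shape[1]))
    for _ in range(max_size):
        best = max(
            remaining,
            key=lambda j: inter_class_distance(X, y, selected + [j]),
        )
        selected.append(best)
        remaining.remove(best)
        dist = inter_class_distance(X, y, selected)
        out.append((len(selected), dist, tuple(selected)))
    return out


def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    A candidate dominates another if it is no larger and no less
    discriminative, and strictly better on at least one of the two objectives.
    """
    front = []
    for s, d, g in candidates:
        dominated = any(
            (s2 <= s and d2 >= d) and (s2 < s or d2 > d)
            for s2, d2, _ in candidates
        )
        if not dominated:
            front.append((s, d, g))
    return sorted(front, key=lambda t: t[0])


# Toy data: 4 patients, 3 genes; gene 1 carries no class signal.
X = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [3.0, 1.0, 4.0],
    [3.0, 1.0, 4.0],
])
y = np.array([0, 0, 1, 1])

front = pareto_front(greedy_candidates(X, y, max_size=3))
for size, dist, genes in front:
    print(size, dist, genes)
```

On this toy data the three-gene subset is dropped from the front, because adding the uninformative gene increases the subset size without increasing the distance.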
|Original language||English (US)|
|Title of host publication||Mathematical Research Summaries|
|Publisher||Nova Science Publishers, Inc.|
|Number of pages||2|
|State||Published - Jan 1 2017|