TY - GEN
T1 - Feature subset selection for inferring relative importance of taxonomy
AU - Ditzler, Gregory
AU - Rosen, Gail
N1 - Publisher Copyright:
Copyright © 2014 ACM.
PY - 2014/9/20
Y1 - 2014/9/20
AB - Examining the bacterial or functional differences between multiple habitats, populations, or phenotypes plays an important role in making inferences about the roles that taxonomic and functional profiles take on in microbial ecology. It is therefore important to the field of comparative metagenomics, using β-diversity, that methods or algorithms can detect the importance of particular subsets of variables that best differentiate the multiple phenotypes in the data. Given today's genomic data deluge, the importance of efficient methods that can carry out these inferences cannot be overstated. We assume observations are collected from a multitude of different environments (e.g., males vs. females, control vs. stimulus), and each observation comprises hundreds or thousands of different taxonomic or functional features (i.e., from 16S or whole-genome shotgun sequencing). Our goal in this work is to examine the role, assumptions, and inferences that feature subset selection can provide to the fields of microbial ecology and comparative metagenomics. Specifically, we examine feature subset selection algorithms using embedded and filter approaches to infer taxon importance on data collected from the human gut microbiome. We compare several widely adopted approaches from machine learning, including greedy algorithms and ℓ1 regularization methods, as well as some software tools provided with QIIME, on data collected from the American Gut Project and other canonical studies of the human gut microbiome. We find that very few OTUs carry information for predicting the sex of a gut sample, and that Bacteroidetes is frequently found among the top-ranked OTUs.
UR - http://www.scopus.com/inward/record.url?scp=84920749750&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84920749750&partnerID=8YFLogxK
U2 - 10.1145/2649387.2660824
DO - 10.1145/2649387.2660824
M3 - Conference contribution
AN - SCOPUS:84920749750
T3 - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
SP - 673
EP - 679
BT - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
PB - Association for Computing Machinery
T2 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014
Y2 - 20 September 2014 through 23 September 2014
ER -