TY - JOUR
T1 - Support vector machines with adaptive Lq penalty
AU - Liu, Yufeng
AU - Zhang, Hao Helen
AU - Park, Cheolwoo
AU - Ahn, Jeongyoun
N1 - Funding Information:
The authors would like to thank two anonymous reviewers for their constructive comments and suggestions. Yufeng Liu's research was partially supported by the National Science Foundation Grant DMS-0606577 and the UNC Junior Faculty Development Award. Hao Helen Zhang's research was partially supported by the National Science Foundation Grants DMS-0405913 and DMS-0645293.
PY - 2007/8/15
Y1 - 2007/8/15
AB - The standard support vector machine (SVM) minimizes the hinge loss function subject to the L2 penalty or the roughness penalty. Recently, the L1 SVM was suggested for variable selection because it produces sparse solutions [Bradley, P., Mangasarian, O., 1998. Feature selection via concave minimization and support vector machines. In: Shavlik, J. (Ed.), ICML'98. Morgan Kaufmann, Los Altos, CA; Zhu, J., Hastie, T., Rosset, S., Tibshirani, R., 2003. 1-norm support vector machines. Neural Inform. Process. Systems 16]. These learning methods are non-adaptive since their penalty forms are pre-determined before looking at the data, and each tends to perform well only in certain situations. For instance, the L2 SVM generally works well except when there are many noise inputs, whereas the L1 SVM is preferable in the presence of many noise variables. In this article we propose and explore an adaptive learning procedure called the Lq SVM, where the best q > 0 is chosen automatically by the data. Both two-class and multi-class classification problems are considered. We show that the new adaptive approach combines the benefits of a class of non-adaptive procedures and achieves the best performance of that class across a variety of situations. Moreover, we observe that the proposed Lq penalty is more robust to noise variables than the L1 and L2 penalties. An iterative algorithm is proposed to solve the Lq SVM efficiently. Simulations and real data applications support the effectiveness of the proposed procedure.
KW - Adaptive penalty
KW - Classification
KW - Shrinkage
KW - Support vector machine
KW - Variable selection
UR - http://www.scopus.com/inward/record.url?scp=34547234238&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547234238&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2007.02.006
DO - 10.1016/j.csda.2007.02.006
M3 - Article
AN - SCOPUS:34547234238
SN - 0167-9473
VL - 51
SP - 6380
EP - 6394
JO - Computational Statistics & Data Analysis
JF - Computational Statistics & Data Analysis
IS - 12
ER -