Abstract
A common problem in feature selection is to establish the minimum number of features that must be retained so that important information is not lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection. Experiments were performed on three text datasets generated from a Wikipedia dump. The amount of retained information was estimated by classification accuracy. Even though the method is parametric, we show that, as opposed to other methods, once its parameter is chosen it can be applied to a number of similar problems (e.g. one value can be used for various datasets originating from Wikipedia). For a constant value of the parameter, dimensionality was reduced by 78% to 90%, depending on the dataset. The relative accuracy drop due to feature removal was less than 0.5% in those experiments.
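The abstract does not spell out the procedure, so the following is only a minimal sketch of one way the angle criterion could work. It assumes a linear SVM with weight vector w, assumes that removing a feature amounts to zeroing its weight, and assumes that features are dropped in order of increasing absolute weight. Under those assumptions the cosine of the tilt angle equals the norm of the retained weights divided by the norm of w. The names `min_features_for_angle` and `max_angle_deg` are hypothetical and do not come from the paper.

```python
import math


def min_features_for_angle(weights, max_angle_deg):
    """Return indices of the fewest features that keep the hyperplane
    tilt at or below max_angle_deg.

    weights: coefficients of a trained linear SVM, one per feature.
    Assumption: a removed feature has its weight set to zero, and
    features are removed in order of increasing absolute weight.
    """
    total_sq = sum(w * w for w in weights)
    if total_sq == 0.0:
        raise ValueError("weight vector must be non-zero")

    # Rank features from largest to smallest absolute weight.
    order = sorted(range(len(weights)),
                   key=lambda i: abs(weights[i]), reverse=True)
    cos_limit = math.cos(math.radians(max_angle_deg))

    kept = []
    kept_sq = 0.0
    for i in order:
        kept.append(i)
        kept_sq += weights[i] * weights[i]
        # cos(angle between w and truncated w) = ||w_kept|| / ||w||
        if math.sqrt(kept_sq / total_sq) >= cos_limit:
            break
    return sorted(kept)


if __name__ == "__main__":
    toy_weights = [3.0, 0.0, 4.0, 0.1]  # made-up values for illustration
    print(min_features_for_angle(toy_weights, 10))
```

In practice the weights might be taken from a trained linear model (for example the `coef_` attribute of scikit-learn's `LinearSVC`), and the angle threshold would play the role of the single parameter the abstract mentions; both points are assumptions of this sketch.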
Citations
- CrossRef: 2
- Web of Science: 0
- Scopus: 2
Details
- Category: Conference activity
- Type: publication in a peer-reviewed collective work (including conference proceedings)
- Title of issue: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. - Part 1, pages 319-325
- Language: English
- Publication year: 2013
- Bibliographic description: Rzeniewicz J., Szymański J.: Selecting Features with SVM // Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. - Part 1 / Springer, 2013, pp. 319-325
- DOI: 10.1007/978-3-642-41822-8_40
- Verified by: Gdańsk University of Technology