Abstract
A common problem in feature selection is establishing the minimum number of features that must be retained so that important information is not lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection. Experiments were performed on three text datasets generated from a Wikipedia dump. The amount of retained information was estimated by classification accuracy. Even though the method is parametric, we show that, as opposed to other methods, once its parameter is chosen it can be applied to a number of similar problems (e.g. one value can be used for various datasets originating from Wikipedia). For a constant value of the parameter, dimensionality was reduced by 78% to 90%, depending on the dataset. The relative accuracy drop due to feature removal was less than 0.5% in those experiments.
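The abstract does not give the exact selection criterion, so the following is only a minimal sketch of one way an angle-controlled cutoff could work. It assumes a linear SVM whose weight vector is already available, that features are ranked by absolute weight, and that removing a feature amounts to zeroing its weight. The function name `features_within_tilt` and the parameter `max_angle_deg` are hypothetical and do not come from the paper.

```python
import math


def features_within_tilt(weights, max_angle_deg):
    """Return sorted indices of the fewest features to keep so that zeroing
    all other weights tilts the weight vector by at most max_angle_deg.

    Assumption (not from the paper): features are kept in order of
    decreasing absolute weight of a linear SVM.
    """
    full_sq = sum(w * w for w in weights)
    if full_sq == 0.0:
        return []  # a zero weight vector defines no hyperplane direction

    # For a truncated copy w' of w, cos(angle) = ||w'|| / ||w||,
    # so the comparison is done on squared norms.
    min_cos_sq = math.cos(math.radians(max_angle_deg)) ** 2

    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept, kept_sq = [], 0.0
    for i in order:
        # Stop as soon as the kept weights satisfy the angle bound.
        if kept and kept_sq / full_sq >= min_cos_sq:
            break
        kept.append(i)
        kept_sq += weights[i] ** 2
    return sorted(kept)


if __name__ == "__main__":
    w = [3.0, 0.1, -4.0, 0.2]  # illustrative weights, not real data
    print(features_within_tilt(w, max_angle_deg=10))
```

In this sketch a single angle threshold plays the role of the method's one parameter: a smaller angle keeps more features, a larger angle keeps fewer.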
Citations
- CrossRef: 2
- Web of Science: 0
- Scopus: 2
Details
- Category: Conference activity
- Type: publication in a peer-reviewed collective volume (including conference proceedings)
- Title of issue: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. - Part 1, pages 319 - 325
- Language: English
- Publication year: 2013
- Bibliographic description: Rzeniewicz J., Szymański J.: Selecting Features with SVM// Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. - Part 1/ : Springer, 2013, pp. 319-325
- DOI: 10.1007/978-3-642-41822-8_40
- Verified by: Politechnika Gdańska