Abstract
Typical approaches to string comparison mark two strings as either different or equal, without taking any measure of similarity into account. Judging similarity is, however, required for spelling error correction, where we want to find the best match for a given word. In this paper we present a bi2quadro-grams method for spelling error correction. The proposed method uses different n-gram dimensions for the source (checked) word and the target (dictionary) word. Appropriate weights were introduced for different types of errors. This improves the quality and performance of the algorithm and tailors the method to the task of spelling error correction. The results obtained so far suggest that the method is a viable solution, competitive with other currently used approaches. The paper presents the proposed method, the test suite, and experimental results, followed by a discussion.
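The idea of comparing n-grams of different sizes can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the scoring rule, the padding, or the error weights. The sketch therefore assumes a simple overlap score (the fraction of source bigrams that occur inside some target 4-gram) and omits weighting entirely. The names `ngrams`, `asymmetric_score`, and `best_match` are hypothetical.

```python
def ngrams(word: str, n: int) -> list[str]:
    """Return all contiguous character n-grams of a word padded with '_'."""
    # Padding is an assumption of this sketch so that word boundaries
    # and words shorter than n still produce n-grams.
    padded = "_" * (n - 1) + word + "_" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def asymmetric_score(source: str, target: str,
                     n_source: int = 2, n_target: int = 4) -> float:
    """Fraction of source n-grams contained in at least one target n-gram.

    Illustrative scoring rule only; the paper's weighted scheme for
    different error types is not reproduced here.
    """
    source_grams = ngrams(source, n_source)
    target_grams = ngrams(target, n_target)
    if not source_grams:
        return 0.0
    hits = sum(1 for g in source_grams if any(g in t for t in target_grams))
    return hits / len(source_grams)


def best_match(word: str, dictionary: list[str]) -> str:
    """Pick the dictionary entry with the highest asymmetric score."""
    return max(dictionary, key=lambda candidate: asymmetric_score(word, candidate))


if __name__ == "__main__":
    print(best_match("helo", ["hello", "help", "world"]))
```

In this sketch the misspelled source word is cut into short n-grams and the dictionary word into longer ones, so a single missing or extra character in the source can still be covered by a longer target n-gram.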
Citations
- CrossRef: 0
- Web of Science: 0
- Scopus: 1
Full text
- Publication version: Accepted or Published Version
- License: Copyright (2018 IEEE)
Details
- Category: Conference activity
- Type: publication in a peer-reviewed collective volume (including conference proceedings)
- Title of issue: 2018 11th International Conference on Human System Interaction (HSI), pages 356-361
- Language: English
- Publication year: 2018
- Bibliographic description: Boiński T. M., Zimnicki A., Kujawski J., Draszawka K.: Evaluating Asymmetric N-Grams as Spell-Checking Mechanism // 2018 11th International Conference on Human System Interaction (HSI), 2018, pp. 356-361
- DOI: 10.1109/hsi.2018.8431345
- Sources of funding: Statutory activity/subsidy
- Verified by: Gdańsk University of Technology