
Evaluating Asymmetric N-Grams as Spell-Checking Mechanism

Abstract

Typical approaches to string comparison mark two strings as either equal or different, without taking any similarity measure into account. Spelling error correction, however, requires the ability to judge similarity, as we want to find the best match for a given word. In this paper we present a bi2quadro-grams method for spelling error correction. The proposed method uses different n-gram sizes for the source (checked) word and the target (dictionary) word. Appropriate weights were introduced for different types of errors. This improves both the quality and the performance of the algorithm and makes the method dedicated to the task of spelling error correction. The results obtained so far suggest that the method is a viable solution, competitive with other currently used approaches. The paper presents the proposed method, the test suite and the experimental results, followed by a discussion.
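To make the idea of asymmetric n-grams more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes one plausible reading of "bi2quadro-grams": the checked word is split into bigrams, and each bigram is looked up inside the 4-grams (quadrigrams) of a dictionary word, so that a missing, extra or displaced letter can still earn partial credit. The boundary marker `#`, the function names, the normalisation and the weights (1.0 for a contiguous match, 0.5 for an in-order match with a gap) are all hypothetical; the paper's actual error-type weights are not stated here.

```python
from typing import List, Sequence

# Hypothetical weights (not taken from the paper): a source bigram found
# contiguously inside a target 4-gram scores higher than one whose two
# characters only appear in order with a gap between them.
CONTIGUOUS_WEIGHT = 1.0
GAPPED_WEIGHT = 0.5
PAD = "#"  # hypothetical word-boundary marker


def ngrams(word: str, n: int) -> List[str]:
    """Overlapping character n-grams of `word`, with one boundary marker on each side."""
    padded = PAD + word + PAD
    if len(padded) < n:
        return [padded]  # very short words form a single, shorter gram
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def bigram_weight(bigram: str, window: str) -> float:
    """Score one source bigram against one target 4-gram window."""
    if bigram in window:
        return CONTIGUOUS_WEIGHT
    first = window.find(bigram[0])
    if first != -1 and window.find(bigram[1], first + 1) != -1:
        return GAPPED_WEIGHT  # both characters appear in order, with a gap
    return 0.0


def asymmetric_similarity(checked: str, candidate: str) -> float:
    """Compare bigrams of the checked word with 4-grams of a dictionary word."""
    source = ngrams(checked, 2)
    target = ngrams(candidate, 4)
    # Each source bigram contributes its best score over all target windows.
    total = sum(max(bigram_weight(bg, w) for w in target) for bg in source)
    # Normalise by the larger bigram count so that a long candidate that merely
    # contains the checked word is not rated as a perfect match.
    return total / max(len(source), len(ngrams(candidate, 2)))


def best_match(checked: str, dictionary: Sequence[str]) -> str:
    """Return the dictionary word with the highest asymmetric similarity."""
    return max(dictionary, key=lambda word: asymmetric_similarity(checked, word))


if __name__ == "__main__":
    words = ["spell", "spelling", "spilling", "selling"]
    for w in words:
        print(f"{w:10s} {asymmetric_similarity('speling', w):.3f}")
    print("best:", best_match("speling", words))
```

For the misspelling `speling`, this toy scorer ranks `spelling` highest among the four sample words. Because the source and target gram sizes differ, the score is deliberately asymmetric: swapping the checked word and the candidate generally gives a different value.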

Citations

  • CrossRef: 0
  • Web of Science: 0
  • Scopus: 1

Full text

Publication version:
Accepted or Published Version
License:
Copyright (2018 IEEE)

Details

Category:
Conference activity
Type:
publication in a peer-reviewed edited volume (including conference proceedings)
Title of issue:
2018 11th International Conference on Human System Interaction (HSI), pages 356-361
Language:
English
Publication year:
2018
Bibliographic description:
Boiński T. M., Zimnicki A., Kujawski J., Draszawka K.: Evaluating Asymmetric N-Grams as Spell-Checking Mechanism // 2018 11th International Conference on Human System Interaction (HSI), 2018, pp. 356-361
DOI:
10.1109/hsi.2018.8431345
Bibliography:
  1. C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge University Press, 2008.
  2. T. Boiński and A. Chojnowski, "Towards facts extraction from text in Polish language," in INnovations in Intelligent SysTems and Applications (INISTA), 2017 IEEE International Conference on. IEEE, 2017, pp. 13-17.
  3. J. Szymański and W. Duch, "Semantic memory knowledge acquisition through active dialogues," in Neural Networks, 2007. IJCNN 2007. International Joint Conference on. IEEE, 2007, pp. 536-541.
  4. J. Szymański and T. Boiński, "Improvement of Imperfect String Matching Based on Asymmetric n-Grams," in Computational Collective Intelligence. Technologies and Applications. Springer, 2013, pp. 306-315.
  5. R. Hamming, "Error detecting and error correcting codes," Bell System Technical Journal, vol. 29, no. 2, pp. 147-160, 1950.
  6. V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," in Soviet Physics-Doklady, vol. 10, no. 8, 1966.
  7. C. Sulzberger, "Efficient Implementation of the Levenshtein-Algorithm," http://www.levenshtein.net/, 2009, [Online: 27.02.2018].
  8. F. J. Damerau, "A technique for computer detection and correction of spelling errors," Commun. ACM, vol. 7, pp. 171-176, March 1964. [Online]. Available: http://doi.acm.org/10.1145/363958.363994
  9. A. Boguszewski, J. Szymański, and K. Draszawka, "Towards increasing f-measure of approximate string matching in O(1) complexity," in Computer Science and Information Systems (FedCSIS), 2016 Federated Conference on. IEEE, 2016, pp. 527-532.
  10. S. L. Hantler, M. M. Laker, J. Lenchner, and D. Milch, "Methods and apparatus for performing spelling corrections using one or more variant hash tables," 2017, U.S. Patent 9,552,349.
  11. R. Udupa and S. Kumar, "Hashing-based approaches to spelling correction of personal names," in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2010, pp. 1256-1265.
  12. K. Draszawka and J. Szymański, "Analysis of denoising autoencoder properties through misspelling correction task," in Conference on Computational Collective Intelligence Technologies and Applications. Springer, 2017, pp. 438-447.
  13. A. M. Robertson and P. Willett, "Applications of n-grams in textual information systems," Journal of Documentation, vol. 54, no. 1, pp. 48-67, 1998.
  14. P. Majumder, M. Mitra, and B. Chaudhuri, "N-gram: a language independent approach to IR and NLP," in International Conference on Universal Knowledge and Language, 2002.
  15. K. Atkinson, "GNU Aspell," http://aspell.net/, 2011, [Online: 28.02.2018].
  16. G. Navarro, R. Baeza-Yates, E. Sutinen, and J. Tarhio, "Indexing methods for approximate string matching," IEEE Data Engineering Bulletin, vol. 24, no. 4, pp. 19-27, 2001.
  17. Wikipedia, "Wikipedia:Lists of common misspellings,"
Sources of funding:
  • Statutory activity/subsidy
Verified by:
Gdańsk University of Technology