Crowdsourcing-Based Evaluation of Automatic References Between WordNet and Wikipedia

Abstract

The paper presents an approach to building references (also called mappings) between WordNet and Wikipedia. We propose four algorithms for the automatic construction of the references. Then, using an aggregation algorithm, we produce an initial set of mappings, which is evaluated cooperatively. For that purpose, we implement a system for distributing evaluation tasks, which are solved by the user community. To make the tasks more attractive, we embed them in a game. The results show that the initial mappings are of good quality and that the community has improved them further. As a result, we deliver a high-quality dataset of mappings between two lexical repositories, WordNet and Wikipedia, that can be used in a wide range of NLP tasks. We also show that the framework for collaborative validation can be used in other tasks that require human judgments.

Citations

  • CrossRef: 4
  • Web of Science: 0
  • Scopus: 3

Full text

Publication version: Accepted or Published Version
License: Copyright (World Scientific Publishing Company)

Details

Category: Articles
Type: article in a journal listed in JCR
Published in: INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, vol. 29, no. 03, pp. 317-344
ISSN: 0218-1940
Language: English
Publication year: 2019
Bibliographic description: Szymański J., Boiński T.: Crowdsourcing-Based Evaluation of Automatic References Between WordNet and Wikipedia // INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING. - Vol. 29, iss. 03 (2019), pp. 317-344
DOI: 10.1142/s0218194019500141
Sources of funding: Statutory activity/subsidy
Verified by: Gdańsk University of Technology
