Abstract
The paper presents an approach to building references (also called mappings) between WordNet and Wikipedia. We propose four algorithms for the automatic construction of these references. Then, using an aggregation algorithm, we produce an initial set of mappings, which is evaluated cooperatively. For that purpose, we implement a system that distributes evaluation tasks to be solved by the user community. To make the tasks more attractive, we embed them in a game. The results show that the initial mappings are of good quality and that the community improved them further. As a result, we deliver a high-quality dataset of mappings between two lexical repositories, WordNet and Wikipedia, that can be used in a wide range of NLP tasks. We also show that the framework for collaborative validation can be used in other tasks that require human judgments.
Citations
- CrossRef: 4
- Web of Science: 0
- Scopus: 3
Full text
- Publication version
- Accepted or Published Version
- License
- Copyright (World Scientific Publishing Company)
Details
- Category:
- Articles
- Type:
- article in a journal listed in the JCR
- Published in:
- INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, vol. 29, iss. 03, pages 317-344, ISSN: 0218-1940
- Language:
- English
- Publication year:
- 2019
- Bibliographic description:
- Szymański J., Boiński T.: Crowdsourcing-Based Evaluation of Automatic References Between WordNet and Wikipedia // INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING. Vol. 29, iss. 03 (2019), pp. 317-344
- DOI:
- 10.1142/s0218194019500141
- Sources of funding:
-
- Statutory activity/subsidy
- Verified by:
- Gdańsk University of Technology