Increasing K-Means Clustering Algorithm Effectivity for Using in Source Code Plagiarism Detection

Patrik Hrkút; Michal Ďuračík; Miroslava Mikušová; Mauro Callejas-Cuervo; Joanna Żukowska

Abstract

The problem of plagiarism is becoming increasingly significant with the growth of Internet technologies and the availability of information resources. Many tools have been successfully developed to detect plagiarism in textual documents, but the situation is more complicated for source code, where the problem is equally serious. At present, no comprehensive tools are available for detecting plagiarism across a large number of software projects, such as student projects, hundreds of which are created every year at each faculty of informatics. The aim of our project is to create such a system for finding plagiarism in a large dataset of source code. The whole system consists of several parts. Classification of source code is an essential part of it, because dividing the data into individual clusters makes source code much more efficient to manipulate and makes searching in large volumes of source code as efficient as possible. The paper discusses how to optimize the implementation of clustering so that the whole system delivers results in a reasonable time: allocating the different parts of the source code to suitable clusters allows a faster and more memory-efficient search for similar parts of the code.

Authors (5)

Patrik Hrkút
Michal Ďuračík
Miroslava Mikušová
Mauro Callejas-Cuervo
Joanna Żukowska dr hab. inż.

Full text

The full text of the publication is not available in the portal

Details

Category:: Monographic publication
Type:: chapter or article in a book (collective work or textbook) in a language of international reach
Language:: English
Year of publication:: 2019
Bibliographic description:: Hrkút P., Ďuračík M., Mikušová M., Callejas-Cuervo M., Żukowska J.: Increasing K-Means Clustering Algorithm Effectivity for Using in Source Code Plagiarism Detection// Smart Technologies, Systems and Applications/ : , 2019, pp. 120-131
Verification:: Politechnika Gdańska