Search results for: text representation documents categorization information retrieval
-
Anna Baj-Rogowska dr
PeopleAnna Baj-Rogowska is employed as an assistant professor at the Department of Informatics in Management at the Faculty of Management and Economics, Gdańsk University of Technology. Her higher education is connected with the University of Gdańsk, where she completed a master's degree in business informatics and doctoral studies, and then obtained a PhD degree in economics in management science (Department of Business Informatics...
-
Improving css-KNN Classification Performance by Shifts in Training Data
PublicationThis paper presents a new approach to improve the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning instance-based classifier. It does not have parameters associated with features and/or classes of objects, that would be optimized during off-line learning. In this paper we propose...
-
An Analysis of Neural Word Representations for Wikipedia Articles Classification
PublicationOne of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...
-
Improving the Accuracy in Sentiment Classification in the Light of Modelling the Latent Semantic Relations
PublicationThe research presents the methodology of improving the accuracy in sentiment classification in the light of modelling the latent semantic relations (LSR). The objective of this methodology is to find ways of eliminating the limitations of the discriminant and probabilistic methods for LSR revealing and customizing the sentiment classification process (SCP) to the more accurate recognition of text tonality. This objective was achieved...
-
Concept description vectors and the 20 question game
PublicationKnowledge of properties that are applicable to a given object is a necessary prerequisite to formulate an intelligent question. Concept description vectors provide the simplest representation of this knowledge, storing for each object information about the values of its properties. Experiments with automatic creation of concept description vectors from various sources, including ontologies, dictionaries, encyclopedias and unstructured...
-
Fusion-based Representation Learning Model for Multimode User-generated Social Network Content
PublicationAs mobile networks and APPs are developed, user-generated content (UGC), which includes multi-source heterogeneous data like user reviews, tags, scores, images, and videos, has become an essential basis for improving the quality of personalized services. Due to the multi-source heterogeneous nature of the data, big data fusion offers both promise and drawbacks. With the rise of mobile networks and applications, UGC, which includes...
-
Development and Research of the Text Messages Semantic Clustering Methodology
PublicationThe methodology of semantic clustering analysis of customer’s text-opinions collection is developed. The author's version of the mathematical models of formalization and practical realization of short textual messages semantic clustering procedure is proposed, based on the customer’s text-opinions collection Latent Semantic Analysis knowledge extracting method. An algorithm for semantic clustering of the text-opinions is developed,...
-
Just look at to open it up: A biometric verification facility for password autofill to protect electronic documents
PublicationElectronic documents constitute specific units of information, and protecting them against unauthorized access is a challenging task. This is because a password protected document may be stolen from its host computer or intercepted while on transfer and exposed to unlimited offline attacks. The key issue is, therefore, making document passwords hard to crack. We propose to augment a common text password authentication interface...
-
Agile Commerce in the light of Text Mining
PublicationThe survey conducted for this study reveals that more than 84% of respondents have never encountered the term “agile commerce” and do not understand its meaning. At the same time, they are active participants of this strategy. Using digital channels as customers more often than ever before, they have already been included in the agile philosophy. Based on the above, the purpose of the study is to analyse major text sets containing...
-
Ontologies vs. Rules — Comparison of Methods of Knowledge Representation Based on the Example of IT Services Management
PublicationThis text provides a brief overview of selected structures aimed at knowledge representation in the form of ontologies based on description logic and aims at comparing them with their counterparts based on the rule-based approach. Due to the limitations on the length of the article, only elements associated with the representation of concepts could be shown, without including roles. The formalisms of the OWL language were used...
-
Semantic Analysis and Text Summarization in Socio-Technical Systems
PublicationIn this chapter the authors present the results of the development of the methodology for increasing the reliability of the functioning of the Socio-Technical System. The existing methods and algorithms for processing unstructured (textual) information were studied. Taking into account the strengths and weaknesses noted above of the Discriminant and Probabilistic approaches of Latent Semantic Relations analysis in of the summarization projection...
-
Context Search Algorithm for Lexical Knowledge Acquisition
PublicationA Context Search algorithm used for lexical knowledge acquisition is presented. Knowledge representation based on psycholinguistic theories of cognitive processes allows for implementation of a computational model of semantic memory in the form of a semantic network. Knowledge acquisition using supervised dialog templates has been performed in a word game designed to guess the concept a human user is thinking about. The game,...
-
Methodology of Selecting the Hadoop Ecosystem Configuration in Order to Improve the Performance of a Plagiarism Detection System
PublicationThe plagiarism detection problem involves finding patterns in unstructured text documents. Similarity of documents in this approach means that the documents contain some identical phrases with defined minimal length. The typical methods used to find similar documents in digital libraries are not suitable for this task (plagiarism detection) because found documents may contain similar content and we have no warranty that...
-
SEMANTIC ANALYSIS ALGORITHMS FOR KNOWLEDGE WORKERS SUPPORT
PublicationThe paper examines various aspects of text analysis application for knowledge worker’s activity realization. Conclusions are drawn about the relevance and importance of processing the non-structured textual information in order to increase knowledge worker’s efficiency, as well as their awareness in different branches of science. The paper considers the existing algorithms of texts semantic analysis as the sphere of documents topical...
-
Information Retrieval Facility Conference
Conferences -
Asia Information Retrieval Symposium
Conferences -
European Conference on Information Retrieval
Conferences -
SIGIR workshop: Stylistic Analysis of Text For Information Access
Conferences -
Music Mood Visualization Using Self-Organizing Maps
PublicationDue to an increasing amount of music being made available in digital form on the Internet, an automatic organization of music is sought. The paper presents an approach to graphical representation of mood of songs based on Self-Organizing Maps. Parameters describing mood of music are proposed and calculated and then analyzed employing correlation with mood dimensions based on the Multidimensional Scaling. A map is created in which...
-
DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING
PublicationThe algorithm and the software for conducting the procedure of Preprocessing of the reviews of films in the Polish language were developed. This algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in the Vector Form (Vector Space Model) Procedure; Forming...
-
Ontologies vs. rules: a comparison of knowledge representation methods using the example of the IT services management domain
PublicationThe text is a brief overview of selected constructs for representing knowledge in the form of ontologies based on description logic, and a comparison of them with their rule-based counterparts. Due to the limited number of pages, only elements related to the representation of concepts are shown, without considering roles. The formalisms of the OWL language were used to write the ontologies, while the rules were expressed in Prolog. To better illustrate these...
-
Krystyna Dziubich mgr inż.
PeopleKrystyna Dziubich obtained an Eng. degree in computer science, granted by a council at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, in 1996. From 1996 to 2005 she worked in industry as a computer scientist and specialist analyst in the Department of Management Systems Development. She has been employed at the ETI Faculty as a lecturer since 2005. She conducts lectures for full-time, extramural...
-
Internal legal acts of technical and medical universities in Poland regulating classes conducted in-person during the Covid-19 pandemic
Open Research DataA database of legal acts and other internal documents of medical and technical universities in Poland regulating the way of organizing in-person or hybrid classes during the COVID-19 pandemic from the summer semester 2019/2020 to the winter semester 2020/2021. Documents were encoded in two separate coding systems using the MAXQDA program for qualitative...
-
Speech Analytics Based on Machine Learning
PublicationIn this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...
-
Towards Increasing Density of Relations in Category Graphs
PublicationIn the chapter we propose methods for identifying new associations between Wikipedia categories. The first method is based on the Bag-of-Words (BOW) representation of Wikipedia articles. Using the similarity of the articles belonging to different categories allows us to calculate the similarity between categories. The second method is based on average scores given to categories while categorizing documents by our dedicated score-based...
-
Retrieval of Heterogeneous Services in C2NIWA Repository
PublicationThe paper reviews the methods used for retrieval of information and services. The selected approaches presented in the review inspired us to build retrieval mechanisms in a system for searching the resources stored in the C2NIWA repository. We describe the architecture of the system, its functions and the surrounding subsystems to which it is related. For retrieval of C2NIWA services we propose three approaches based on: keyword...
-
Marek Czachor prof. dr hab.
People -
Context-Aware Indexing and Retrieval for Cognitive Systems Using SOEKS and DDNA
PublicationVisual content searching, browsing and retrieval tools have been a focus area of interest as they are required by systems from many different domains. Context-based, Content-Based, and Semantic-based are different approaches utilized for indexing/retrieving, but have their drawbacks when applied to systems that aim to mimic the human capabilities. Such systems, also known as Cognitive Systems, are still limited in terms of processing...
-
International Conference on the Theory of Information Retrieval (The 3rd ACM International Conference on the Theory of Information Retrieval)
Conferences -
CAD. Integrated Architectural Design, MSc Arch (2022/2023)
e-Learning CoursesThe programme will provide students with a solid grounding in BIM (Building Information Modelling) using Autodesk Revit Architecture. Students will review the advanced features of Revit for Architecture, a tool to support BIM (Building Information Modelling) and delivery of 3D digital models and related documentation. The lesson plans will specifically introduce students to common workflows and problem-solving skills while creating...
-
CAD. Integrated Architectural Design, BSc Arch (2023-24)
e-Learning CoursesThe programme will provide students with a solid grounding in BIM (Building Information Modelling) using Autodesk Revit Architecture. Students will review the advanced features of Revit for Architecture, a tool to support BIM (Building Information Modelling) and delivery of 3D digital models and related documentation. The lesson plans will specifically introduce students to common workflows and problem-solving skills while creating...
-
DBpedia and YAGO Based System for Answering Questions in Natural Language
PublicationIn this paper we propose a method for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for TREC conference) based on DBpedia and YAGO. Our method is based on generating dependency trees for the query. In the dependency tree we look for paths leading from the root to the named entity of interest. These paths (referenced further as fibers) are candidates for representation of actual user intention. The...
-
Contextual ontology for tonality assessment
Publicationclassification tasks. The discussion focuses on two important research hypotheses: (1) whether it is possible to construct such an ontology from a corpus of textual documents, and (2) whether it is possible and beneficial to use inferencing from this ontology to support the process of sentiment classification. To support the first hypothesis we present a method of extracting a hierarchy of contexts from a set of textual documents...
-
Semantic Memory for Avatars in Cyberspace
PublicationAvatars that show intelligent behavior should have access to general knowledge about the world, knowledge that humans store in their semantic memories. The simplest knowledge representation for semantic memory is based on the Concept Description Vectors (CDVs) that store, for each concept, information on whether a given property can be applied to this concept or not. Unfortunately, large-scale semantic memories are not available....
-
Next Generation Digital
PublicationThe paper outlines the major objectives of the MENAID research project, aimed at novel architectures of digital documents. Such documents will enable reduction of information overflow and strain, a major threat to the growth of a digital society. They will be forward compatible, technology neutral and lightweight, allowing workers of network organizations to use personal devices of any type.
-
Modeling the Customer’s Contextual Expectations Based on Latent Semantic Analysis Algorithms
PublicationNowadays, in the age of the Internet, access to open data opens up huge possibilities for information retrieval. More and more often we hear about the concept of open data, which means unrestricted access as well as reuse and analysis by external institutions, organizations and people. It is information that can be freely processed, combined with other data (a so-called remix) and then published. More and more data are available in text...
-
Towards Healthcare Cloud Computing
PublicationIn this paper we present the construction of a software platform, called IPMed, for supporting medical research teams in the area of impedance cardiography. Using the platform, research tasks will be performed by the teams through computer-supported cooperative work. The platform enables secure medical data storage, access to the data for research group members, cooperative analysis of medical data, and provides analysis supporting tools...
-
Machine Learning and Text Analysis in an Artificial Intelligent System for the Training of Air Traffic Controllers
PublicationThis chapter presents the application of new information technology in education for the training of air traffic controllers (ATCs). Machine learning, multi-criteria decision analysis, and text analysis as the methods of artificial intelligence for ATCs training have been described. The authors have made an analysis of the International Civil Aviation Organization documents for modern principles of ATCs education. The prototype...
-
Workflow patterns applicable to virtual knowledge-based organizations
PublicationWorkflow is a term specifying how to automate a business process, in whole or in part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules. Workflow is therefore directly applicable in virtual knowledge-based organizations, where information is exchanged via electronic documents. The literature presents a complete list of workflow control-flow...
-
System of specific grants for local government units in Poland
PublicationThe article analyses the system of specific grants in local governments in Poland. First, the main revenue sources of local self-governments are presented. Their presentation is based upon the consideration of one of the basic principles in democratic states today, i.e. decentralization. The text then describes, in more detail, specific grants with respect to the European Charter of Local Self-Government. Subsequently, the...
-
Gaining knowledge through experience: developing decisional DNA applications in robotics
PublicationA novel approach to applying experience-based knowledge and the construction of decisional DNA in robotics-related areas is discussed. In this article, we explore an approach that integrates Decisional DNA, a domain-independent, flexible, and standard knowledge representation structure, with robots in order to test the usability and suitability of this novel knowledge representation structure. Core issues in using this Decisional...
-
Facial data registration facility for biometric protection of electronic documents
PublicationIn the modern world, information is crucial, and its leakage may lead to serious losses. Documents, as the main medium of information, must therefore be highly protected. Nowadays, the most common way of protecting data is using passwords; however, it seems inconvenient to type complex passwords when this is needed many times a day. For that reason significant research has been conducted on biometric authentication...
-
ACM SIGIR Workshop on XML and Information Retrieval
Conferences -
International Symposium on String Processing and Information Retrieval
Conferences -
Magdalena Szuflita-Żurawska
PeopleHead of the Scientific and Technical Information Services at the Gdansk University of Technology Library and the Leader of the Open Science Competence Center. She is also a Plenipotentiary of the Rector of the Gdańsk University of Technology for open science. She is a PhD Candidate. Her main areas of research and interests include research productivity, motivation, management of HEs, Open Access, Open Research Data, information...
-
Manufacturing Data Analysis in Internet of Things/Internet of Data (IoT/IoD) Scenario
PublicationComputer integrated manufacturing (CIM) has enormous benefits as it increases the rate of production, reduces errors and production waste, and streamlines manufacturing sub-systems. However, there are some new challenges related to CIM operating in the Internet of Things/Internet of Data (IoT/IoD) scenarios associated with Industry 4.0 and cyber-physical systems. The main challenge is to deal with the massive volume of data flowing...
-
Self-Organizing Map representation for clustering Wikipedia search results
PublicationThe article presents an approach to automated organization of textual data. The experiments have been performed on a selected subset of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To warrant efficiency of the data processing, we performed linear dimensionality reduction of raw data using Principal...
-
ACM International Conference on Research and Development in Information Retrieval
Conferences -
Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives
PublicationLong-term Web archives comprise Web documents gathered over longer time periods and can easily reach hundreds of terabytes in size. Semantic annotations such as named entities can facilitate intelligent access to the Web archive data. However, the annotation of the entire archive content on this scale is often infeasible. The most efficient way to access the documents within Web archives is provided through their URLs, which are...