Towards Increasing Density of Relations in Category Graphs
Publication: In this chapter we propose methods for identifying new associations between Wikipedia categories. The first method is based on the Bag-of-Words (BOW) representation of Wikipedia articles: the similarity of articles belonging to different categories is used to estimate the similarity of the categories themselves. The second method is based on the average scores given to categories while documents are categorized by our dedicated score-based...
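The BOW-based method can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that category similarity is the average cosine similarity over all article pairs drawn from the two categories, and that a new association is proposed when this average reaches a chosen threshold. The helper names (`bow`, `cosine`, `category_similarity`, `propose_associations`) and the whitespace tokenizer are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def bow(text: str) -> Counter:
    """Bag-of-words: lowercase whitespace tokens mapped to their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def category_similarity(articles_a: list[str], articles_b: list[str]) -> float:
    """Assumed aggregation: mean cosine similarity over all cross-category article pairs."""
    pairs = list(product([bow(t) for t in articles_a], [bow(t) for t in articles_b]))
    if not pairs:
        return 0.0
    return sum(cosine(x, y) for x, y in pairs) / len(pairs)


def propose_associations(categories: dict[str, list[str]], threshold: float):
    """Return (category, category, similarity) for every pair at or above the threshold."""
    names = sorted(categories)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = category_similarity(categories[a], categories[b])
            if s >= threshold:
                links.append((a, b, s))
    return links
```

For example, with `{"Physics": ["quantum particle energy"], "Chemistry": ["particle energy reaction"], "Poetry": ["verse rhyme"]}` and a threshold of 0.5, only the Chemistry/Physics pair is proposed (similarity about 0.67). A real setup would use proper tokenization and term weighting rather than raw whitespace counts.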
Extracting Knowledge from Wikipedia
Publication: Wikipedia is a huge source of encyclopedic knowledge collected by people and intended for people. In information systems, the counterpart of such a knowledge source is an ontology. This chapter shows how Wikipedia is transformed into an ontology and how concepts, their properties, and the relations between them can be extracted from it.
Elgold partial: News
Open Research Data: The dataset contains 37 English texts scraped from news websites. In each text, the named entities are marked. Each named entity is linked, where possible, to the corresponding Wikipedia article. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking...
Zespół mieszkaniowo-usługowy ZUS przy ul. Partyzantów w Gdyni, jako przykład nowoczesnego zintegrowania funkcji i formy / ZUS residential and service complex in Partyzantów St. in Gdynia as an example of modern integration of form and function
Publication: The article concerns a building complex erected in Gdynia in the late 1930s. Its completion was interrupted by the outbreak of World War II, but even so, the part that was built shows that the spatial and functional solutions applied were among the most modern in Poland at the time.
Marek Szelągowski dr
People: Marek Szelągowski has participated in the creation and implementation of IT solutions in accounting, human resources management, production, IT infrastructure management, and other fields. As CIO of the BUDIMEX Group in 2000–2008, he was responsible for adapting IT strategies to the changing needs of the business sector. He managed and participated in analyses and optimizations of business processes...
Elgold partial: Automotive blogs
Open Research Data: The dataset contains 34 English texts scraped from automotive blogs. In each text, the named entities are marked. Each named entity is linked, where possible, to the corresponding Wikipedia article. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and...
Elgold partial: Scientific papers' abstracts
Open Research Data: The dataset contains 87 abstracts of scientific papers in English, randomly chosen from the following scientific disciplines: Biomedicine, Life Sciences, Mathematics, Medicine, Science, Humanities, Social Science.
Elgold partial: Movie reviews
Open Research Data: The dataset contains 37 English texts with movie reviews. In each text, the named entities are marked. Each named entity is linked, where possible, to the corresponding Wikipedia article. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Elgold partial: Amazon product reviews
Open Research Data: The dataset contains 34 Amazon product reviews in English. In each text, the named entities are marked. Each named entity is linked, where possible, to the corresponding Wikipedia article. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Elgold partial: Job offers
Open Research Data: The dataset contains 34 English texts scraped from job-offer web portals. In each text, the named entities are marked. Each named entity is linked, where possible, to the corresponding Wikipedia article. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity...
Elgold partial: History blogs
Open Research Data: The dataset contains 13 texts from English history blogs. In each text, the named entities are marked. Each named entity is linked, where possible, to the corresponding Wikipedia article. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Information Integration and Web-based Applications and Services
Conferences -
International Conference on Enterprise Integration and Modelling Technology
Conferences -
International Symposium on Advanced DB Technologies and Integration
Conferences -
Elgold intermediate: verified by the authors
Open Research Data: The dataset contains the texts from "Elgold intermediate: verified by verification team", additionally verified by the dataset authors, before the final validation step with the Elgold toolset.
Elgold intermediate: verified by verification team
Open Research Data: The dataset contains the texts from "Elgold intermediate: annotated raw", additionally verified by the five-person verification team. Nearly 25% of the mentions were corrected in some respect.
Natura i dziedzictwo – Cele Zrównoważonego Rozwoju (SDG) jako czynniki integracji społecznej w przestrzeniach osiedli mieszkaniowych. Studium Zaspy / Nature and heritage – Sustainable Development Goals (SDG) as factors of social integration in the space of housing estates. Study of Zaspa
Publication: This article introduces the topic of revitalising outdoor common spaces in existing housing developments. The aim of the research is to present universal design models by analysing nature and heritage as dominant values, complementing perceived natural and cultural deficiencies. For this purpose, the Sustainable Development Goals (SDG), UNESCO (UNESCO, 2015) and the Research Through Design (RTD) method were used. The Zaspa housing...
Information-based integration for complex systems. In: Knowledge and Information Technology Management in the 21st Century Organizations, ed. A. Gunasekaran, O. Khalil, M.R. Syed. London: Idea, 2002, pp. 89–104 (Polish title: Informacyjna integracja systemów złożonych).
Publication: The chapter proposes the structure of an intelligent system supporting the integration process for complex manufacturing systems. The support system is built on a knowledge base in which knowledge is modelled with production rules. An iterative integration algorithm was also developed. The idea of integration itself is based on information flows.
Review on Wikification methods
Publication: The paper reviews methods for the automatic annotation of texts with Wikipedia entries. The process, called Wikification, aims at building references between concepts identified in the text and Wikipedia articles. Wikification has many applications, especially in text representation, where it enables one to capture the semantic similarity of documents. It can also be considered automatic tagging of the text. We describe typical...
Tomasz Dziubich dr inż.
People: Scientific projects and grants: Internet platform for data integration and collaboration of medical research teams for stroke treatment centers (2013–2016); MAYDAY EURO 2012 Supercomputer Platform for Context Analysis of Data Streams in Identification of Specified Objects or Hazardous Events, task 4.2 (Development of algorithms and applications supporting medical diagnosis), 2008–2012. Other: GrandPrix on trade show ...
Evaluation of Path Based Methods for Conceptual Representation of the Text
Publication: Typical text clustering methods use the bag-of-words (BoW) representation to describe the content of documents. However, this method is known to have several limitations. Employing Wikipedia as a lexical knowledge base has been shown to improve text representation for data-mining purposes. Promising extensions of that trend employ the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based...
Building a Service Ontology for Search Purposes
Publication: Ontologies, which provide a formal description while remaining readable to humans, are an increasingly common method of describing web services. The WordNet dictionary is presented, together with its use as a meta-ontology for describing similar services from different providers. An algorithm based on this dictionary is proposed that enables the integration of service ontologies in order to ensure the interoperability of solutions available on the Internet.
Towards Facts Extraction From Texts in Polish Language
Publication: The Polish language differs from English in many ways. It has more complicated conjugation and declension. Because of that, automatic fact extraction from texts is difficult. In this paper we present the basic differences between the two languages. The paper presents an algorithm for extracting facts from articles in the Polish Wikipedia. The algorithm is based on 7 proposed fact schemes that are searched for in the analyzed text....
Dynamic Semantic Visual Information Management
Publication: Dominant Internet search engines use keywords and are therefore not well suited to exploring new domains of knowledge when the user does not know the specific vocabulary. When browsing articles in a large encyclopedia, each presenting a small fragment of knowledge, it is hard to map the whole domain and see the relevant concepts and their relations. In Wikipedia, for example, some highly relevant articles are not linked with each other....
Selecting Features with SVM
Publication: A common problem in feature selection is establishing the minimum number of features that must be retained so that important information is not lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection. Experiments were performed on three text datasets generated from a Wikipedia dump. Amount...
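The angle criterion can be sketched as follows, under stated assumptions: the input is the weight vector `w` of an already trained linear SVM, features are ranked by `|w_i|`, and the retained weights are kept unchanged (no retraining). Zeroing the dropped weights gives `w_k`, and the tilt angle satisfies cos θ = (w · w_k) / (|w| |w_k|) = |w_k| / |w|. The functions `tilt_angle` and `min_features` are hypothetical names, and the paper's exact procedure may differ.

```python
import math


def tilt_angle(weights: list[float], keep: list[int]) -> float:
    """Angle in degrees between w and w_k, where w_k zeroes every weight not in `keep`."""
    full = math.sqrt(sum(w * w for w in weights))
    if full == 0:
        return 0.0
    kept = math.sqrt(sum(weights[i] ** 2 for i in keep))
    # cos(theta) = |w_k| / |w|; clamp to guard against rounding slightly above 1.
    return math.degrees(math.acos(min(1.0, kept / full)))


def min_features(weights: list[float], max_angle_deg: float) -> int:
    """Smallest k such that keeping the k largest |w_i| tilts the hyperplane by at most max_angle_deg."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    for k in range(1, len(weights) + 1):
        if tilt_angle(weights, order[:k]) <= max_angle_deg:
            return k
    return len(weights)
```

For `w = [3.0, 4.0, 0.0]`, keeping only the largest weight tilts the hyperplane by about 36.9 degrees, so a 40-degree budget allows one feature while a 10-degree budget requires two.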
International Workshop on Multimedia Data Storage, Retrieval, Integration and Applications
Conferences -
DBpedia As a Formal Knowledge Base – An Evaluation
Publication: DBpedia is widely used by researchers as a means of accessing Wikipedia in a standardized way. In this paper it is characterized from the point of view of a question answering system, and a simple implementation of such a system is presented. The paper also characterizes alternatives to DBpedia in the form of the OpenCyc and YAGO knowledge bases and presents a comparison between DBpedia and these knowledge bases.
Self Organizing Maps for Visualization of Categories
Publication: Visualization of Wikipedia categories using Self-Organizing Maps gives an overview of categories and their relations, helping to narrow down search domains. By selecting particular neurons, this approach enables retrieval of conceptually similar categories. Evaluation of neural activations indicates that they form coherent patterns that may be useful for building user interfaces for navigation over category structures.
Joanna Wolszczak-Derlacz dr hab.
People: I am an Associate Professor at Gdańsk University of Technology, Faculty of Management and Economics. I obtained my PhD in July 2006, for which I received the Polish Prime Minister's Award for an outstanding doctoral thesis. Between 2007 and 2008, I was a beneficiary of the Max Weber Fellowship at the European University Institute in Florence, Italy. I conducted part of my research at the Katholieke Universiteit Leuven, Belgium...
Jarosław Guziński prof. dr hab. inż.
People: Jaroslaw Guzinski received M.Sc., Ph.D. and D.Sc. degrees from the Electrical Engineering Department of the Technical University of Gdansk, Poland, in 1994, 2000 and 2011, respectively. Since 2016 he has been an Associate Professor at Gdansk University of Technology. He is currently the head of the Department of Electric Drives and Energy Conversion. From 2006 to 2009 he was involved in the European Commission project PREMAID Marie Curie, ‘Predictive...
Automatic Construction of a Service Taxonomy Based on Natural-Language Descriptions and External Knowledge Sources
Publication: A method is proposed for automatically building a taxonomy of services from their natural-language descriptions, based on Formal Concept Analysis (FCA). In addition, the presented solution allows external knowledge sources such as Wikipedia, WordNet, ConceptNet, or the World Wide Web to be used to address the problem of incomplete input data (data sparseness).
Cooperative WordNet Editor for Lexical Semantic Acquisition
Publication: The article describes an approach to building the WordNet semantic dictionary within a collaborative paradigm. The presented system provides functionality for gathering lexical data in a Wikipedia-like style. The core of the system is a user-friendly interface based on a component for interactive graph navigation. The component has been used to present the WordNet semantic network on a web page, and it brings functionalities...
Pączkowanie - metoda rozwoju interoperacyjnych komponentów dla systemów rozproszonych / Budding – the software development method of interoperable components for distributed systems
Publication: Two contemporary software development methods are presented, the iterative-incremental approach and agile techniques, along with their advantages and disadvantages in the context of building interoperable platforms and distributed environments. The method of software development by budding is presented, together with its assumptions, advantages and disadvantages. The technologies on which the budding methodology is based are presented: Software Product Line, Enterprise...
Methods for Extracting Structured Content from Wikipedia
Publication: Wikipedia has long attracted the interest of researchers. One area of interest is acquiring knowledge from Wikipedia content, which requires parsing the text of articles. This chapter presents a comparative analysis of different approaches to parsing Wikipedia content, pointing out the problems that parser authors have to deal with. This makes it possible to understand why extracting knowledge from Wikipedia is difficult.
Words context analysis for improvement of information retrieval
Publication: In the article we present an approach to improving information retrieval from large text collections using word context vectors. The vectors have been created by analyzing English Wikipedia with the Hyperspace Analogue to Language (HAL) model of word similarity. For test phrases we evaluate retrieval with direct user queries as well as retrieval with the context vectors of these queries. The results indicate that the proposed method can not...
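A HAL-style context vector and its use for query expansion can be sketched as below. This is a simplified, symmetric variant (left and right contexts are summed into one vector, whereas classic HAL keeps them as separate row and column vectors), and the window size, the linear weighting `window - distance + 1`, and the names `hal_vectors` and `query_context` are assumptions for illustration rather than the article's configuration.

```python
from collections import defaultdict


def hal_vectors(tokens: list[str], window: int = 5):
    """Symmetric HAL-style co-occurrence vectors: closer neighbours get larger weights."""
    vectors = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            weight = window - d + 1
            vectors[word][tokens[j]] += weight  # tokens[j] precedes word
            vectors[tokens[j]][word] += weight  # word follows tokens[j]
    return vectors


def query_context(vectors, query_terms: list[str], top_k: int = 3) -> list[str]:
    """Sum the context vectors of the query terms and return the strongest new terms."""
    combined = defaultdict(float)
    for term in query_terms:
        for ctx, w in vectors.get(term, {}).items():
            combined[ctx] += w
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked if t not in query_terms][:top_k]
```

On the toy text "the cat sat on the mat" with a window of 2, the context vector of "cat" gives weight 2 to "sat" and "the" and weight 1 to "on", so `query_context(v, ["cat"], top_k=2)` returns `["sat", "the"]` as expansion terms.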
Management of Textual Data at Conceptual Level
Publication: The article presents an approach to managing a large repository of documents at the conceptual level. We describe our approach to representing Wikipedia articles using their categories. The representation has been used to construct groups of similar articles. The proposed approach has been implemented in a prototype system that organizes the articles returned as search results for a given query. Constructed clusters allow to...
Extracting concepts from the software requirements specification using natural language processing
Publication: Extracting concepts from software requirements is one of the first steps towards automating the software development process. This task is difficult due to the ambiguity of the natural language used to express the requirements specification. The methods used so far consist mainly of statistical analysis of words and matching expressions with a specific ontology of the domain in which the planned software will be applicable....
Enhancing Word Embeddings for Improved Semantic Alignment
Publication: This study introduces a method for improving word vectors that addresses the limitations of traditional approaches such as Word2Vec or GloVe by introducing richer semantic properties into embeddings. Our approach leverages supervised learning methods, shifting vectors in the representation space to enhance the quality of word embeddings. This ensures better alignment with semantic reference resources, such as WordNet....
Muhammad Usman PhD
People: Muhammad Usman is currently a Computer Vision Researcher at Gdansk University of Technology, working on the BE-LIGHT project, where his research focuses on advancing biomedical diagnostics through the integration of light-based technologies and machine learning techniques. He completed his Master’s degree in Control Science and Engineering at the University of Science and Technology of China (USTC), Hefei, China. His research...
Anna Wałek dr
People: Dr Anna Wałek is President of IATUL (International Association of University Libraries) and director of the Gdańsk University of Technology Library. She is an experienced library manager and an expert in Open Science and in the organization and management of scientific libraries. She conducts scientific research in data management in various scientific disciplines, metadata for research data, and data management support services - incl....
EU Enlargement and Labour Demand in the New Member States
Publication: Research to date on labour market responses to EU integration has tended to concentrate on the labour markets of the 'old' EU members. But what effects has trade integration had on wages in the new member states? The following article attempts to answer this question using an empirical model of conditional labour demand.
A Developer's View of Application Servers Interoperability
Publication: The paper describes an analysis of application server interoperability that considers both the available level of integration and the required level of development complexity. Development complexity ranges from simple GUI operations to changes of undocumented features in configuration files. We verify whether an integration can be established at a given level of development complexity, rather than whether it is objectively feasible....
Thresholding Strategies for Large Scale Multi-Label Text Classifier
Publication: This article presents an overview of thresholding methods for labeling objects given a list of candidate class scores. These methods are essential to multi-label classification tasks, especially when there are many classes organized in a hierarchy. The presented techniques are evaluated using a state-of-the-art dedicated classifier on medium-scale text corpora extracted from Wikipedia. Obtained results show that the...
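Three thresholding strategies can be illustrated on a single object's score list. Rank cut (RCut, keep the top k classes) and score cut (SCut, keep every class above a fixed score) are standard strategies from the multi-label literature; `relative_cut`, which keeps classes scoring at least a fraction of the best score, is an illustrative variant added here. Whether the article evaluates exactly these three is not stated in the abstract, and all function names are hypothetical.

```python
def _ranked(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Classes sorted by descending score (ties broken by name for determinism)."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def rcut(scores: dict[str, float], k: int) -> list[str]:
    """Rank cut: assign the k highest-scoring classes."""
    return [label for label, _ in _ranked(scores)[:k]]


def scut(scores: dict[str, float], threshold: float) -> list[str]:
    """Score cut: assign every class whose score reaches a fixed threshold."""
    return [label for label, s in _ranked(scores) if s >= threshold]


def relative_cut(scores: dict[str, float], ratio: float) -> list[str]:
    """Illustrative variant: assign classes scoring at least `ratio` times the best score."""
    ranked = _ranked(scores)
    if not ranked:
        return []
    best = ranked[0][1]
    return [label for label, s in ranked if s >= ratio * best]
```

With scores `{"Physics": 0.9, "Chemistry": 0.6, "Poetry": 0.1}`, `rcut(scores, 2)`, `scut(scores, 0.5)` and `relative_cut(scores, 0.5)` all return `["Physics", "Chemistry"]`. They diverge on other inputs: RCut always assigns exactly k labels, while the score-based cuts adapt the number of labels to the score distribution.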
Towards Effective Processing of Large Text Collections
Publication: In the article we describe an approach to the parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computation of similarity matrices and of the k-means algorithm. The test datasets have been prepared as graphs created from Wikipedia articles related by links. When we create the clustering data packages, we compute pairs of eigenvectors and eigenvalues for visualizations of...
Piotr Grudowski dr hab. inż.
People: Professor Dr hab. Eng. Piotr Grudowski heads the Department of Quality and Commodity Management at the Faculty of Management and Economics of Gdansk University of Technology. In the years 1987-2009 he worked at the Faculty of Mechanical Engineering of the Gdansk University of Technology, where he obtained a doctoral degree in technical sciences in the discipline of construction and operation of machines and he headed the Department...
Interoperability Description of Web Services Based Application Servers
Publication: Web services standards were designed to enable interoperability of heterogeneous application servers in the Service Oriented Architecture. Although the standards have proved highly successful, there are still difficulties in effective service integration. The paper presents a methodology that enables the description of application server interoperability in order to improve the service integration process. The methodology proposes...
External Validation Measures for Nested Clustering of Text Documents
Publication: This article addresses the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used for validating partitioning clusterings, such as the Rand statistic, Hubert's Γ statistic or the F-measure, are not applicable to nested clustering. In addition to earlier work, in which the F-measure was adapted to hierarchical classification as the hF-measure, here some methods to...
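For reference, the standard Rand statistic for flat partitions can be sketched as the fraction of object pairs on which two labelings agree (both put the pair together, or both separate it). The sketch makes visible the assumption that every document carries exactly one cluster label, which a nested clustering, where a document belongs to a cluster and to its sub-clusters, does not satisfy directly. `rand_index` is a hypothetical name; this is the textbook flat index, not the article's proposed nested measure.

```python
from itertools import combinations


def rand_index(labels_true: list, labels_pred: list) -> float:
    """Rand statistic: share of object pairs on which two flat partitions agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have equal length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Label names do not matter, only co-membership: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, while `rand_index([0, 0, 1, 1], [0, 1, 0, 1])` agrees on only 2 of the 6 pairs.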
EU-Turkey Customs Union and Bilateral Foreign Direct Investment Flows
Publication: The main aim of this text is to present the effects of the customs union between the European Union and Turkey on bilateral FDI flows in light of the theory of linkages between economic integration and FDI flows. The first section surveys the main theoretical links between economic integration and FDI flows. The second section focuses on the history and scope of the customs union. The third and fourth sections are empirical and...
Jean Monnet
Projects: Project carried out at the Faculty of Management and Economics
On the Structure of Time in Computational Semantics of a Variable-Step Solver for Hybrid Behavior Analysis
Publication: Hybrid dynamic systems combine continuous and discrete behavior. Computational approaches are often employed to derive behaviors that approximate the analytic solution. An important part of this is approximating differential equation behavior by numerical integration. The accuracy and computational efficiency of the integration usually depend on the complexity of the method and the approximation errors it entails, especially...