Search results for: TEXT INDEXING
-
Context-Aware Indexing and Retrieval for Cognitive Systems Using SOEKS and DDNA
Publication
Visual content searching, browsing and retrieval tools have been a focus area of interest as they are required by systems from many different domains. Context-based, Content-Based, and Semantic-based are different approaches utilized for indexing/retrieving, but have their drawbacks when applied to systems that aim to mimic the human capabilities. Such systems, also known as Cognitive Systems, are still limited in terms of processing...
-
Acquisition and indexing of RGB-D recordings for facial expressions and emotion recognition
Publication
This paper describes KinectRecorder, a comprehensive tool that provides convenient and fast acquisition, indexing and storing of RGB-D video streams from the Microsoft Kinect sensor. The application is especially useful as a supporting tool for creating fully indexed databases of facial expressions and emotions that can be further used for learning and testing of emotion recognition algorithms for affect-aware applications....
-
A comparison of indexing methods to evaluate quality of soils: the role of soil microbiological properties
Publication
-
The partial-order tree: a new structure for indexing on complex attributes in object-oriented databases
Publication
-
Inducing a map on homology from a correspondence
Publication
-
A comparison of indexing methods to evaluate quality of horticultural soils. Part II. sensitivity of soil microbiological indicators
Publication
-
Text classifiers for automatic articles categorization
Publication
The article concerns the problem of automatic classification of textual content. We present selected methods for generating document representations and evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically into the categories created by Wikipedia editors.
-
Agile Commerce in the light of Text Mining
Publication
The survey conducted for this study reveals that more than 84% of respondents have never encountered the term “agile commerce” and do not understand its meaning. At the same time, they are active participants of this strategy. Using digital channels as customers more often than ever before, they have already been included in the agile philosophy. Based on the above, the purpose of the study is to analyse major text sets containing...
-
A Proliferation-Inducing Ligand and B-Cell Activating Factor Are Upregulated in Patients with Essential Thrombocythemia
Publication
-
Prioritising national healthcare service issues from free text feedback – A computational text analysis & predictive modelling approach
Publication
Patient experience surveys have become a key source of evidence for supporting decision-making and continuous quality improvement within healthcare services. To harness free-text feedback collected as part of these surveys for additional insights, text analytics methods are increasingly employed when the data collected is not amenable to traditional qualitative analysis due to volume. However, while text analytics techniques offer...
-
Redox reactions of the FAD-containing apoptosis-inducing factor (AIF) with quinoidal xenobiotics: A mechanistic study
Publication
-
UGT1A1 gene polymorphism as a potential factor inducing iron overload in the pathogenesis of type 1 hereditary hemochromatosis
Publication
-
Text Documents Classification with Support Vector Machines
Publication
-
Towards Effective Processing of Large Text Collections
Publication
In the article we describe the approach to parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computations of similarity matrices and k-means algorithm. The test datasets have been prepared as graphs created from Wikipedia articles related with links. When we create the clustering data packages, we compute pairs of eigenvectors and eigenvalues for visualizations of...
-
Parallel Computations of Text Similarities for Categorization Task
Publication
In this chapter we describe the approach to parallel implementation of similarities in high dimensional spaces. The similarity computations have been used for textual data categorization. We create the test datasets from Wikipedia articles that, together with their hyper references, form a graph used in our experiments. The similarities based on Euclidean distance and Cosine measure have been used to process the data using k-means algorithm....
-
Interactive Information Search in Text Data Collections
Publication
This article presents a new idea for retrieval in text repositories and describes the general infrastructure of a system created to implement and test it. The implemented system differs from today’s standard search engines by introducing a process of interactive search with users and data clustering. We present the basic algorithms behind our system and the measures we used for evaluating results. The achieved results...
-
Text Categorization Improvement via User Interaction
Publication
In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...
-
Evaluation and Irony in Text in the Light of Speech Act Theory
Publication
-
Evaluation of Path Based Methods for Conceptual Representation of the Text
Publication
Typical text clustering methods use the bag of words (BoW) representation to describe content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ hierarchical organization of Wikipedia category system. In this paper we propose three path-based...
-
Text categorization with semantic commonsense knowledge: First results
Publication
Text processing typically uses BOW representations. This approach, however, does not give good results when similar documents share no words. The article presents an approach to constructing a kernel function for SVM classifiers based on an external knowledge base of linguistic concepts.
-
External Validation Measures for Nested Clustering of Text Documents
Publication
This article addresses the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used for partitioning clustering validation, like Rand statistics, Hubert Γ statistic or F-measure, are not applicable in nested clustering cases. Additionally to the work, where F-measure was adopted to hierarchical classification as hF-measure, here some methods to...
-
Time-domain prosodic modifications for text-to-speech synthesizer
Publication
An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.
-
Semantic Analysis and Text Summarization in Socio-Technical Systems
Publication
In this chapter the authors present the results of the development of a methodology for increasing the reliability of the functioning of the Socio-Technical System. The existing methods and algorithms for processing unstructured (textual) information were studied. Taking into account the above-noted strengths and weaknesses of the Discriminant and Probabilistic approaches to Latent Semantic Relations analysis in the summarization projection...
-
Comparative Analysis of Text Representation Methods Using Classification
Publication
In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article, an evaluation of approaches to text representation for machine learning tasks, indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...
-
Two Stage SVM and kNN Text Documents Classifier
Publication
The paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...
-
Selection of Relevant Features for Text Classification with K-NN
Publication
In this paper, we describe five feature selection techniques used for text classification. Information gain, the independent significance feature test, the chi-squared test, the odds ratio test, and frequency filtering have been compared on text benchmarks based on Wikipedia. For each method we present the results of classification quality obtained on the test datasets using a K-NN based approach. A main advantage of evaluated...
-
Development and Research of the Text Messages Semantic Clustering Methodology
Publication
The methodology of semantic clustering analysis of customer’s text-opinions collection is developed. The author's version of the mathematical models of formalization and practical realization of short textual messages semantic clustering procedure is proposed, based on the customer’s text-opinions collection Latent Semantic Analysis knowledge extracting method. An algorithm for semantic clustering of the text-opinions is developed,...
-
Towards facts extraction from text in Polish language
Publication
Natural Language Processing (NLP) finds many uses in different fields of endeavor. Many tools exist for the analysis of the English language. For the Polish language the situation is different, as the language itself is more complicated. In this paper we show differences between NLP of the Polish and English languages. Existing solutions are presented and TEAMS software for facts extraction is described. The paper also shows an evaluation of...
-
Generating actionable evidence from free-text feedback to improve maternity and acute hospital experiences: A computational text analytics & predictive modelling approach
Publication
Background: Patient experience surveys are a key source of evidence for supporting decision-making and quality improvement in healthcare services. These surveys contain two main types of questions: closed and open-ended, asking about patients’ care experiences. Apart from the knowledge obtained from analysing closed-ended questions, invaluable insights can be gleaned from free-text data. Advanced analytics techniques are increasingly...
-
Test PDF
Publication
Test PDF
-
Thresholding Strategies for Large Scale Multi-Label Text Classifier
Publication
This article presents an overview of thresholding methods for labeling objects given a list of candidate classes’ scores. These methods are essential to multi-label classification tasks, especially when there are a lot of classes which are organized in a hierarchy. Presented techniques are evaluated using the state-of-the-art dedicated classifier on medium scale text corpora extracted from Wikipedia. Obtained results show that the...
-
Wieloznaczność w języku i tekście [Ambiguity in language and text]
Publication
-
Representation of hypertext documents based on terms, links and text compressibility
Publication
Methods for representing text documents based on words, mutual links, and compression methods are described. They are evaluated using an SVM classifier.
-
Automatic prosodic modification in a Text-To-Speech synthesizer of Polish language
Publication
A Polish speech synthesis system with automatic modification of utterance prosody is presented. Methods for automatically determining the stress and intonation of an utterance are described. The application of speech signal processing algorithms in shaping prosody is presented. The influence of the applied modifications on the naturalness of the synthesized signal is discussed. The applied method is based on the TD-PSOLA algorithm. The developed...
-
The use of CAN in automation test bench to test the engine cooling system
Publication
The publication describes the principles of recording measurement data on an engine test bench used for testing the cooling systems of car engines. A design assumption for the test stand was the use of the CAN data transmission standard. A method for high-density recording of the transmitted data was developed.
-
Next Generation Digital
Publication
The paper outlines the major objectives of the MENAID research project, aimed at novel architectures of digital documents. Such documents will enable reduction of information overflow and strain, a major threat to the growth of a digital society. They will be forward compatible, technology neutral and lightweight, allowing workers of network organizations to use personal devices of any type.
-
Application of dynamic time warping and cepstrograms to text-dependent speaker verification
Publication
This work provides a description of an automatic speaker verification (ASV) system. In particular, it documents the evolution of all individual stages of the proposed ASV system design from the phase of preprocessing to an operational decision making system. The aim of this research was to achieve the system of the best safety and ease of use in view of users. The objective estimation of this target has been accomplished by assessing...
-
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
Publication
The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...
-
Text-mining Similarity Approximation Operators for Opinion Mining in BI tools
Publication
The concept of the Text-mining Similarity Approximation Operators for Opinion Mining as extensions to Natural Language Interface Database is defined. The new operators: “keywords of” dimension; subsetting operator “about C is q”; aggregation operator “by similar C” are proposed. These operators are based on the Latent Semantic Analysis and Social Network Analysis
-
Text Mining Algorithms for Extracting Brand Knowledge; The fashion Industry Case
Publication
Brand knowledge is determined by customer knowledge. The opportunity to develop brands based on customer knowledge management has never been greater. Social media as a set of leading communication platforms enable peer to peer interplays between customers and brands. A large stream of such interactions is a great source of information which, when thoroughly analyzed, can become a source of innovation and lead to competitive advantage....
-
The Method of a Two-Level Text-Meaning Similarity Approximation of the Customers’ Opinions
Publication
The method of two-level text-meaning similarity approximation, consisting in the implementation of the classification of the stages of text opinions of customers and identifying their rank quality level was developed. Proposed and proved the significance of major hypotheses, put as the basis of the developed methodology, notably about the significance of suggestions about the existence of analogies between mathematical bases of...
-
Application of colour image segmentation for localization and extraction text from images
Publication
Textual information plays a great role in the world around us. Street names, names of shops and institutions, and descriptions of objects such as book titles and packaging are given in text form. At the same time, contemporary computer programs for text recognition (OCR) "cannot cope" with the analysis of images obtained from cameras. Image segmentation followed by contextual analysis of segment parameters can provide...
-
(Di-tert-butylmethylphosphane)(η2-di-tert-butylphosphanylphosphinidene)(triphenylphosphane)platinum(0)
Publication
The crystal structure of the title compound, [(Ph3P)(tBu2PMe)Pt(η2-tBu2PP)], contains four molecules in the asymmetric unit that differ slightly in conformation. The P-P distances in the tBu2PP ligand are similar for all four molecules [2.0661(13)-2.0678(13) Å]. These distances indicate the multiple-bond character of the P-P bond in the tBu2PP ligand. The platinum atom in the complex shows square-planar coordination. The presented...
-
Synthesis and structure of Dicyclohexylammonium Tri-tert-pentoxysilanethiolate and 5-aminopentylammonium Tri-tert-pentoxysilanethiolate
Publication
Tri-tert-pentoxysilanethiol reacts with dicyclohexylamine and 1,5-diaminopentane to give the corresponding ammonium salts. The salts were characterized by elemental analysis, IR and NMR spectra, and X-ray structural analysis. They are the first derivatives of tri-tert-pentoxysilanethiol for which the crystal structure has been determined.
-
Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network
Publication
To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...
-
Plug-in to Eclipse environment for VHDL source code editor with advanced formatting of text
Publication
-
Ontology-based text convolution neural network (TextCNN) for prediction of construction accidents
Publication
The construction industry suffers from workplace accidents, including injuries and fatalities, which represent a significant economic and social burden for employers, workers, and society as a whole. The existing research on construction accidents heavily relies on expert evaluations, which often suffer from issues such as low efficiency, insufficient intelligence, and subjectivity. However, expert opinions provided in construction...
-
Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary
Publication
This paper presents the methodology of Textual Content Classification, which is based on a combination of algorithms: preliminary formation of a contextual framework for the texts in particular problem area; manual creation of the Hierarchical Sentiment Dictionary (HSD) on the basis of a topically-oriented Corpus; tonality texts recognition via using HSD for analysing the documents as a collection of topically completed fragments...
-
The smoothness test for a density function
Publication
The problem of testing hypothesis that a density function has no more than μ derivatives versus it has more than μ derivatives is considered. For a solution, the L2 norms of wavelet orthogonal projections on some orthogonal ‘‘differences’’ of spaces from a multiresolution analysis is used. For the construction of the smoothness test an asymptotic distribution of a smoothness estimator is used. To analyze that asymptotic distribution,...
-
Collective Uncertainty Entanglement Test
Publication
For a given pure state of a composite quantum system we analyze the product of its projections onto a set of locally orthogonal separable pure states. We derive a bound for this product analogous to the entropic uncertainty relations. For bipartite systems the bound is saturated for maximally entangled states and it allows us to construct a family of entanglement measures, we shall call collectibility. As these quantities are experimentally...