prof. dr inż. Wojciech Szpankowski
Publications
Filters
total: 17
Catalog Publications
Year 2016
-
Asymmetric Renyi Problem and > PATRICIA Tries
PublicationIn 1960 R´enyi asked for the number of random queries necessary to recover a hidden bijective labeling of n distinct objects. In each query one selects a random subset of labels and asks, what is the set of objects that have theselabels? Weconsider here anasymmetric version of the problem in which in every query an object is chosenwith probability p > 1/2 and we ignore “inconclusive” queries. We study the number of queries needed...
-
Average Size of a Suffix Tree for Markov Sources
PublicationWe study a suffix tree built from a sequence generated by a Markovian source. Such sources are more realistic probabilistic models for text generation, data compression, molecular applications, and so forth. We prove that the average size of such a suffix tree is asymptotically equivalent to the average size of a trie built over n independentsequences from the same Markovian source. This equivalenceis only known for memoryless...
-
Lossless Compression of Binary Trees with Correlated Vertex Names
PublicationCompression schemes for advanced data structures have become the challenge of today. Information theory has traditionally dealt with conventional data such as text, image, or video. In contrast, most data available today is multitype and context-dependent. To meet this challenge, we have recently initiated a systematic study of advanced data structures such as unlabeled graphs [1]. In this paper, we continue this program by considering...
-
The Boltzmann sequence-structure channel
PublicationWe rigorously study a channel that maps binary sequences to self-avoiding walks in the two-dimensional grid, inspired by a model of protein statistics. This channel, which we also call the Boltzmann sequence-structure channel, is characterized by a Boltzmann/Gibbs distribution with a free parameter corresponding to temperature. In our previous work, we verified experimentally that the channel capacity has a phase transition for...
-
Types of Markov Fields and Tilings
PublicationThe method of types is one of the most popular techniques in information theory and combinatorics. However, thus far the method has been mostly applied to one-dimensional Markov processes, and it has not been thoroughly studied for general Markov fields. Markov fields over a finite alphabet of size m ≥ 2 can be viewed as models for multi-dimensional systems with local interactions. The locality of these interactions is represented...
Year 2015
-
On the Origin of Protein Superfamilies and Superfolds
PublicationDistributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamiles/superfolds and a large number of families/folds of a small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information theoretic approach to investigate the protein sequence-structure relationship that leads to...
-
Phase Transition in a Sequence-Structure Channel
PublicationWe study an interesting channel which maps binary sequences to self-avoiding walks in the two-dimensional grid, inspired by a model of protein folding from statistical physics. The channel is characterized by a Boltzmann/Gibbs distribution with a free parameter corresponding to temperature. We estimate the conditional entropy between the input sequence and the output fold, giving an upper bound which exhibits an unusual phase transition...
Year 2014
-
A Note on a Problem Posed by D. E. Knuth on a Satisfiability Recurrence
PublicationWe resolve a conjecture proposed by D.E. Knuth concerning a recurrence arising in the satisfiability problem. Knuth's recurrence resembles recurrences arising in the analysis of tries, in particular PATRICIA tries, and asymmetric leader election. We solve Knuth's recurrence exactly and asymptotically, using analytic techniques such as the Mellin transform and analytic depoissonization.
-
On Symmetry of Uniform and Preferential Attachment Graphs
PublicationMotivated by the problem of graph structure compression under realistic source models, we study the symmetry behavior of preferential and uniform attachment graphs. These are two dynamic models of network growth in which new nodes attach to a constant number m of existing ones according to some attachment scheme. We prove symmetry results for m=1 and 2 , and we conjecture that for m≥3 , both models yield asymmetry with high...
-
On the Limiting distribution of Lempel Ziv'78 Redundancy for Memoryles Sources
PublicationWe show that the Lempel Ziv'78 redundancy rate tends to a Gaussian distribution for memoryless sources. We accomplish it by extending findings from our 1995 paper [3]. We present a new simplified proof of the Central Limit Theorem for the number of phrases in the LZ'78 algorithm. As in our 1995 paper, here we first analyze the asymptotic behavior of the total path length in a digital search tree (a DST) built from independent sequences....
-
On the Limiting Distribution of Lempel-Ziv’78 Redundancy for Memoryless Sources
PublicationWe study the Lempel-Ziv'78 algorithm and show that its (normalized) redundancy rate tends to a Gaussian distribution for memoryless sources. We accomplish it by extending findings from our 1995 paper, in particular, by presenting a new simplified proof of the central limit theorem (CLT) for the number of phrases in the LZ'78 algorithm. We first analyze the asymptotic behavior of the total path length in the associated digital search...
Year 2013
-
Average Redundancy of the Shannon Code for Markov Sources
PublicationIt is known that for memoryless sources, the average and maximal redundancy of fixed–to–variable length codes, such as the Shannon and Huffman codes, exhibit two modes of behavior for long blocks. It either converges to a limit or it has an oscillatory pattern, depending on the irrationality or rationality, respectively, of certain parameters that depend on the source. In this paper, we extend these findings, concerning the Shannon...
-
Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model
PublicationTries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP addresses lookup, from data compression (i.e., Lempel- Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hashing...
Year 2012
-
On a Recurrence Arising in Graph Compression
PublicationIn a recently proposed graphical compression algorithm by Choi and Szpankowski (2012), the following tree arose in the course of the analysis. The root contains n balls that are consequently distributed between two subtrees according to a simple rule: In each step, all balls independently move down to the left subtree (say with probability p) or the right subtree (with probability 1p). A new node is created as long as...
Year 2011
-
Limiting distribution of Lempel Ziv'78 redundancy
PublicationWe show that the Lempel Ziv'78 redundancy rate tends to a Gaussian distribution for memoryless sources. We accomplish it by extending findings from our 1995 paper [3]. We present a new simplified proof of the Central Limit Theorem for the number of phrases in the LZ'78 algorithm. As in our 1995 paper, here we first analyze the asymptotic behavior of the total path length in a digital search tree (a DST) built from independent sequences....
Year 2008
-
What is information?
PublicationPrzedstawiono zarys metodologii konstrukcji miar informacyjnych opartej na sterowaniu zdarzeniowym. Podejście takie umożliwia jawne uwzględnianie wpływu kontekstu, kryterium użyteczności i protokołu postępowania na ilość informacji. Podano szereg przykładów obejmujących systemy rozproszone, transmisję danych, środowiska niekooperacyjne i in. wraz z oceną wynikowej przepustowości. Wskazano dalsze kierunki badań, m. in. w kierunku...
Year 2007
-
What is information?
PublicationPojęcie informacji było dotąd kwantyfikowane głównie w terminach statystycznych, dając początek shannonowskiej teorii informacji i zasad cyfrowej transmisji danych. Współczesne systemy komunikacyjne łączące skomplikowane, inteligentne i autonomiczne podmioty wymagają nowego spojrzenia na miary informacji, uwzględniającego kontekst i semantykę przekazu oraz racjonalne działania podmiotów. W niniejszym eseju zaproponowano podejście...
seen 869 times