
The LZ complexity [55] and its derivatives are closely related to the Kolmogorov complexity [56,57] and the Shannon entropy. They can be easily computed and have found wide applications in characterizing the randomness of complex data.

To compute the LZ complexity, a numerical sequence must first be transformed into a symbolic sequence. The most popular approach is to convert the signal into a 0–1 sequence by comparing the signal with a threshold value $S_d$ [14]: whenever the signal is larger than $S_d$, one maps it to 1; otherwise, one maps it to 0. A good choice of $S_d$ is the median of the signal [15]. When multiple threshold values are used, one may map the numerical sequence to a multisymbol sequence. Note that if the original numerical sequence is a nonstationary random-walk-type process, one should analyze the stationary difference data instead of the original nonstationary data.
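As a minimal illustration of this binarization step, the following Python sketch maps a numerical series to a 0–1 sequence using its median as $S_d$, and differences the data first when a random-walk-type signal is suspected (the function name and keyword arguments are ours, not from the references):

```python
import numpy as np

def binarize(signal, threshold=None, difference_first=False):
    """Map a numerical series to a 0-1 symbolic sequence.

    If no threshold S_d is given, the median of the (possibly differenced)
    signal is used, following the common choice discussed in the text.
    """
    x = np.asarray(signal, dtype=float)
    if difference_first:          # for nonstationary random-walk-type data
        x = np.diff(x)
    s_d = np.median(x) if threshold is None else threshold
    return (x > s_d).astype(int)  # 1 where the signal exceeds S_d, else 0

# Example: binarize a short random series
rng = np.random.default_rng(0)
symbols = binarize(rng.standard_normal(1000))
```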

After the symbolic sequence is obtained, it can then be parsed to yield distinct words, and the words can be encoded. Let $L(n)$ denote the length of the encoded sequence for those words. The LZ complexity can be computed as

$$C_{\mathrm{LZ}} = \lim_{n \to \infty} \frac{L(n)}{n}. \tag{2}$$
Note that this is very much in the spirit of the Kolmogorov complexity [56,57].

There exist many different methods for performing the parsing. One popular scheme was proposed by the original authors of the LZ complexity [55]. Another attractive method was proposed by Cover and Thomas [58]. Let $c(n)$ denote the number of words in the parsing of the source sequence by the second scheme. For each word, we can use $\log_2 c(n)$ bits to describe the location of the prefix to the word and one bit to describe the last bit. Then, the total length of the encoded sequence is $L(n) = c(n)[\log_2 c(n) + 1]$. Equation (2) then becomes

$$C_{\mathrm{LZ}} = \frac{c(n)\left[\log_2 c(n) + 1\right]}{n}. \tag{3}$$
When $n$ is very large, $c(n) \approx n/\log_2 n$ [55,58]. Dropping terms much smaller than $\log_2 n$, we can replace $\log_2 c(n)$ in Equation (3) by $\log_2 n$ and obtain

$$C_{\mathrm{LZ}} = \frac{c(n)\,\log_2 n}{n}. \tag{4}$$
This is the functional form for the definition of the commonly used LZ complexity. Unfortunately, $C_{\mathrm{LZ}}$ depends on the sequence length. Rapp et al. [59] were among the first to consider normalizing the LZ complexity to make it independent of the sequence length by computational means. This issue was reconsidered by Hu et al. [60] using an analytic approach.
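A minimal Python sketch of this computation is given below. The parsing is an incremental Lempel–Ziv-style parsing in the spirit of the Cover–Thomas scheme described above (each new word is the shortest phrase not seen before), and the normalization of Equation (4) is then applied; the finite-size corrections studied in [59,60] are ignored, and the function names are ours:

```python
import numpy as np

def lz_word_count(symbols):
    """Count the number of words c(n) in an incremental Lempel-Ziv-style
    parsing: each new word is the shortest prefix-extension of previously
    seen words that has not itself been seen before."""
    words = set()
    word = ""
    c = 0
    for s in symbols:
        word += str(s)
        if word not in words:   # a new distinct word ends here
            words.add(word)
            c += 1
            word = ""
    if word:                    # count a trailing incomplete word
        c += 1
    return c

def lz_complexity(symbols):
    """Normalized LZ complexity, C_LZ = c(n) * log2(n) / n  (Equation (4))."""
    n = len(symbols)
    return lz_word_count(symbols) * np.log2(n) / n

# A random 0-1 sequence should give C_LZ close to 1 for large n
rng = np.random.default_rng(1)
print(lz_complexity((rng.standard_normal(10000) > 0).astype(int)))
```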

PE was introduced in [21] as a convenient means of analyzing a time series. It may be considered a measure from chaos theory, since embedding vectors are used in the analysis. Using the notation of [22], it can be described as follows.

For a given but otherwise arbitrary $i$, the $m$ real values of $X_i = [x(i), x(i+L), \ldots, x(i+(m-1)L)]$ are sorted in ascending order, $x(i+(j_1-1)L) \le x(i+(j_2-1)L) \le \cdots \le x(i+(j_m-1)L)$. When an equality occurs, e.g., $x(i+(j_{i_1}-1)L) = x(i+(j_{i_2}-1)L)$, the quantities $x$ are ordered according to the values of their corresponding $j$'s; that is, if $j_{i_1} < j_{i_2}$, then we write $x(i+(j_{i_1}-1)L) \le x(i+(j_{i_2}-1)L)$. Therefore, the vector $X_i$ is mapped onto a sequence of numbers $(j_1, j_2, \ldots, j_m)$, which is one of the $m!$ permutations of the $m$ distinct symbols $(1, 2, \ldots, m)$. When each such permutation is considered as a symbol, the reconstructed trajectory in the $m$-dimensional space is represented by a symbol sequence. Let the probabilities for the $K \le m!$ distinct symbols be $P_1, P_2, \ldots, P_K$. Then, the PE, denoted by $E_p$, for the time series $\{x(i), i = 1, 2, \ldots\}$ is defined as

$$E_p(m) = -\sum_{j=1}^{K} P_j \ln P_j. \tag{5}$$
The maximum of $E_p(m)$ is $\ln(m!)$, attained when $P_j = 1/m!$ for all $j$. It is convenient to work with the normalized PE,

$$E_p = \frac{E_p(m)}{\ln(m!)}. \tag{6}$$
Thus, $E_p$ gives a measure of the departure of the time series under study from a completely random one: the smaller the value of $E_p$, the more structure the time series has.

To detect interesting dynamical changes in a time series, one can partition a time series into overlapping or nonoverlapping segments of short length, compute the PE from each segment, and examine how the PE changes with the segments [22]. Here, we apply this approach to compute the PE from the minute-to-minute logarithmic yields of the composite indices of SSE and DJIA on each trading day, then check how the PE varies with time.
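A compact Python sketch of this segment-wise PE computation is given below; it follows the definition above, with ties broken by position as described in the text, and with function names and default parameter values of our own choosing:

```python
import numpy as np
from collections import Counter
from math import factorial, log

def permutation_entropy(x, m=5, L=1):
    """Normalized permutation entropy E_p of a 1-D series.

    Each embedding vector [x(i), x(i+L), ..., x(i+(m-1)L)] is mapped to
    its ordinal pattern via a stable argsort (ties broken by position,
    matching the convention in the text); E_p is normalized by ln(m!).
    """
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * L
    patterns = Counter(
        tuple(np.argsort(x[i:i + m * L:L], kind="stable"))
        for i in range(n_vec)
    )
    probs = np.array(list(patterns.values())) / n_vec
    return -np.sum(probs * np.log(probs)) / log(factorial(m))

def segmentwise_pe(x, seg_len, m=5, L=1):
    """PE computed on consecutive nonoverlapping segments of a series."""
    return [permutation_entropy(x[s:s + seg_len], m, L)
            for s in range(0, len(x) - seg_len + 1, seg_len)]

# Example: PE of daily-length segments of a synthetic random series
rng = np.random.default_rng(2)
print(segmentwise_pe(rng.standard_normal(2400), seg_len=240, m=4))
```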

In this paper, for the daily stock index data of both the USA and China, we employ the same segmentation and choose $m = 5$ and $L = 1$. For the intraday stock index data, since the data on each day are shorter for the Chinese market than for the US market, we choose $m = 4$, $L = 1$ for the SSE and $m = 5$, $L = 1$ for the DJIA. Note that the time delay $L$ was always chosen to be 1; this is based on the reasoning that stock data are basically random. As explained in [36], such a choice is not only sufficient but also optimal. The selection of $m$ is constrained mainly by the length of each sub-dataset under computation: if the data segment is short, then $m$ cannot be too big. To better cope with the randomness in the data, $m$ should not be too small either. The fact that the embedding window $(m-1)L$ is often larger than 1 (so that the ordinal patterns span multiple time steps) makes it possible to quantify the correlations in the data to some degree with the PE. This will be discussed in more depth later in the paper.

Consider a $1/f^{2H+1}$ process, where $0 < H < 1$. Such processes are normally called nonstationary random-walk processes. Their difference (increment) process, denoted $X = \{x_t : t = 0, 1, 2, \ldots\}$, is a covariance-stationary stochastic process with mean $\mu$, variance $\sigma^2$, and an autocorrelation function $r(w)$ of the form [35]

$$r(w) \sim H(2H-1)\,w^{2H-2}, \quad w \to \infty. \tag{7}$$
When $1/2 < H < 1$, $\sum_w r(w) = \infty$, leading to long-range temporal correlation; indeed, $r(w) \sim w^{2H-2}$ with $-1 < 2H-2 < 0$, so the autocorrelations decay too slowly to be summable. The process $X$ has a PSD of $1/f^{2H-1}$. A $1/f$ process cannot be aptly modeled by a Markov process or an ARIMA model [61], since the PSDs of those processes are distinctly different from $1/f$. To adequately model a $1/f$ process, a fractional-order process has to be used. A well-known process of this class is the fractional Brownian motion model [34].

There are many excellent methods to estimate the Hurst parameter $H$ [36]. One of the most popular is detrended fluctuation analysis (DFA) [62]. This involves constructing a random walk process

$$u(i) = \sum_{k=1}^{i} \left[x(k) - \bar{x}\right], \tag{8}$$
where $\bar{x}$ is the mean of the series $x(i), i = 1, 2, \ldots$; dividing the constructed random walk process into nonoverlapping segments; determining the best linear or polynomial fit in each segment as the local trend; computing the variance of the differences between the random walk process and the local trends; and averaging these variances over all the segments. Clearly, DFA may involve discontinuities at the boundaries of adjacent segments. Such discontinuities can be detrimental when the data contain trends [63], nonstationarity [64], or nonlinear oscillatory components such as signs of rhythmic activity [65,66]. Fortunately, this shortcoming can be readily overcome using a method called adaptive fractal analysis (AFA) [38,39]. AFA is based on a nonlinear adaptive multiscale decomposition, which starts by partitioning a time series into segments of length $w = 2n+1$, where neighboring segments overlap by $n+1$ points. Each segment is then fitted with the best polynomial of order $M$. We denote the fitted polynomials for the $i$-th and $(i+1)$-th segments by $y^{(i)}(l_1)$ and $y^{(i+1)}(l_2)$, respectively, where $l_1, l_2 = 1, \ldots, 2n+1$. We then define the fitting for the overlapped region as

$$y^{(c)}(l) = w_1\, y^{(i)}(l+n) + w_2\, y^{(i+1)}(l), \quad l = 1, 2, \ldots, n+1, \tag{9}$$
where $w_1 = 1 - \frac{l-1}{n}$ and $w_2 = \frac{l-1}{n}$ can be written as $(1 - d_j/n)$ for $j = 1, 2$, where $d_j$ denotes the distance between the point and the center of $y^{(i)}$ or $y^{(i+1)}$, respectively. This means that the weights decrease linearly with the distance between the point and the center of the corresponding segment. Such a weighting ensures symmetry and effectively eliminates any jumps or discontinuities around the boundaries of neighboring segments, and therefore can maximally suppress the effect of complex nonlinear trends on the scaling analysis.

With the above procedure, AFA can be readily described. For an arbitrary window size $w$, we determine, for the random walk process $u(i)$, a global trend $v(i), i = 1, 2, \ldots, N$, where $N$ is the length of the walk. The residual, $u(i) - v(i)$, characterizes fluctuations around the global trend, and its variance yields the Hurst parameter $H$ according to

$$F(w) = \left[\frac{1}{N}\sum_{i=1}^{N}\left(u(i) - v(i)\right)^2\right]^{1/2} \sim w^{H}. \tag{10}$$
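The following Python sketch estimates $H$ in this spirit: it builds the random-walk profile $u(i)$, stitches overlapping polynomial fits of order $M$ (the `order` argument) into a global trend $v(i)$ using the linear weights $w_1, w_2$ above, and fits the slope of $\log_2 F(w)$ versus $\log_2 w$. This is a simplified sketch, not the reference implementation of [38,39]: boundary points not covered by a full segment are simply skipped, and all function names are ours.

```python
import numpy as np

def afa_fluctuation(x, w, order=1):
    """AFA fluctuation F(w): rms of u - v, where u is the random-walk
    profile of x and v the stitched global trend at odd window size w."""
    u = np.cumsum(x - np.mean(x))      # random walk profile, Equation (8)
    n = (w - 1) // 2
    N = len(u)
    v = np.full(N, np.nan)
    prev_fit = None
    for start in range(0, N - w + 1, n):   # neighboring segments overlap by n+1
        idx = np.arange(start, start + w)
        fit = np.polyval(np.polyfit(idx, u[idx], order), idx)
        if prev_fit is None:
            v[idx] = fit
        else:
            # blend the n+1 overlapped points with linearly decaying weights
            l = np.arange(n + 1)
            w1 = 1 - l / n                 # weight of the previous segment's fit
            v[idx[:n + 1]] = w1 * prev_fit[n:] + (1 - w1) * fit[:n + 1]
            v[idx[n + 1:]] = fit[n + 1:]
        prev_fit = fit
    mask = ~np.isnan(v)                    # skip the uncovered tail
    return np.sqrt(np.mean((u[mask] - v[mask]) ** 2))

def hurst_afa(x, window_sizes, order=1):
    """Estimate H as the slope of log2 F(w) versus log2 w, Equation (10)."""
    F = [afa_fluctuation(x, w, order) for w in window_sizes]
    slope, _ = np.polyfit(np.log2(window_sizes), np.log2(F), 1)
    return slope

# Example: white noise should give H close to 0.5
rng = np.random.default_rng(3)
ws = [2**k + 1 for k in range(2, 9)]
print(hurst_afa(rng.standard_normal(5000), ws))
```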