Counting Runs of Ones and Ones in Runs of Ones in Binary Strings

doi:10.4236/ojapps.2012.24B011

Paper Menu >>

Journal Menu >>

Counting Runs of Ones and Ones in Runs of Ones in

Binary Strings

Counting runs in binary strings

Frosso S. Makri, Zaharias M. Psillakis, Nikolaos Kollas

Departments of Mathematics and Physics

University of Patras

Patras, Greece

makri@math.upatras.gr; psillaki@physics.upatras.gr

Abstract—Consider a binary string (a symmetric Bernoulli sequence) of length . For a positive integer , we exactly enumerate,

in all  possible binary strings of length , the number of all runs of 1s of length (equal, at least)  and the number of 1s in all runs of 1s of

length at least . To solve these counting problems, we use probability theory and we obtain simple and easy to compute explicit formulae as

well as recursive schemes, for these potential useful in engineering numbers.

Keywords-runs; symmetric Bernoulli trials; probability theory; combinatorial problems

1. Introduction and Preliminaries

Nowadays, the increasing use of the computer science in

diverse applications including encoding, compression and

transmission of digital information calls for understanding the

distribution of runs of 1s or 0s. For instance, such knowledge

would help in analyzing, and comparing also, several

techniques used in communication networks (wired or

wireless). In such networks binary data, ranging from a few

bytes (e.g. e-mails) to many gigabytes of greedy multimedia

applications (e.g. video on demand), are highly processed. For

details, see [1-2] and the references therein.

Another area where the study of the distribution of runs of

1s and 0s has become increasingly useful is the field of

bioinformatics or computational biology. In particular,

molecular biologists examine tandem repeats among DNA

(Deoxyribonucleic acid) segments trying to specify how

probable are runs of matches, denoted as 1s, in adjacent

segments of a DNA sequence. See, e.g. [3-5].

In such applications, as the indicative ones mentioned

above, a key point is the understanding how 1s and 0s are

distributed and combined among the elements of a binary

sequence (finite or infinite, memoryless or not) and eventually

forming runs of 1s and 0s according to certain enumeration

rules (counting schemes). Each enumeration rule defines how

runs of same symbols (i.e. 1s or 0s) are formed and

consequently counted. A rule may depend on, among other

considerations, whether overlapping counting is allowed or not

as well as if the counting starts or not from scratch when a run

of a certain size has been so far enumerated. For extensive

reviews of the runs literature we refer to [6-8]. The topic is still

active and attractive too, because of the wide range of its

application in many areas of applied probability and

engineering including hypothesis testing, quality control,

system reliability and financial engineering. Some recent

contributions on the subject, among others, are the works of [9]

– [22].

Let be a sequence of binary (two-state) random

variables (RVs) taking on the values zero (0) or one (1) ordered

on a line. According to Mood’s [23] enumeration scheme a run

of 1s (1-run) is defined to be a sequence of consecutive 1s

preceded and succeeded by 0s or by nothing. The number of 1s

in a 1-run is referred to as its length (or size). For a positive

integer  let  denote the number of 1-runs of length

exactly  in the first  , binary trials. Following

Makri and Psillakis [19], we use the indicator functions



 ending at  

 with the convention . Consequently, the

statistic  can be expressed as  



 . The RV ,

which is a fundamental one in the run literature, besides its

independent merit, may be used for the representation of other

interesting statistics, too. Among them, the following two have

been frequently discussed in the literature and in particular in

financial engineering and bioinformatics [4, 17-18]. They refer

to 1-runs of length exceeding a positive integer , ,

in  binary trials, and they are the number  of 1-runs of

length at least ; 



 , and the number  of 1s in

all 1-runs of length at least ;  



  An alternative

interpretation of  is that it denotes the sum of the lengths of

the 1-runs of length greater than or equal to  The statistics

 have been studied on binary sequences of

several internal structures by many researches who used

various methods. See, e.g. [1, 4, 9, 11-14, 17-19, 23-30].

In this brief note we show how someone can easily

enumerate explicitly the (total) number of occurrences of all 1-

runs associated with the first two mentioned statistics, as well

as, the (total) number of 1s according to the third one, in all

possible  binary strings of length . Our approach is relied

on simple and efficient probabilistic arguments. It provides an

Open Journal of Applied Sciences

Supplement：2012 world Congress on Engineering and Technology

alternative way to recapture explicit formulae for numbers

associated with 1-runs and it also establishes a new explicit

expression for the number of 1s in certain 1-runs. A unified

recursive scheme for these numbers is provided, too.

2. Main Results

Let 

 stand for the RVs ,,  for  ,

respectively. The support (range set) of 

 is



 

  

 . (1)

Next, we consider a sequence 

 of length of

independent (i.e. derived by a memoryless source) and

identically distributed 0-1 RVs with a common probability of

1s  i.e.  , 

  Such a sequence, called a finite Bernoulli sequence,

is of particular importance in studies of applied probability

because of its simplicity, and also since it may be considered as

a special case of a sequence with dependent elements; e.g. a

Markovian or an exchangeable one.

For a Bernoulli sequence of length , let   

denote the probability mass function (PMF) of the RV 

; i.e.

 

 ,   

,  . (2)

Then, the expected value of 

,

  

   



 (3)

is given by (see Makri et al. [11])

 











   

  

 

 

   

In the sequel, we consider a symmetric () finite

Bernoulli sequence (i.e. a finite binary string) of length  for

which we obtain our main results. Since the cardinality of a

proper sample space is  (i.e. there are  binary strings that

are equally likely to occur) the classical definition of

probability implies that

  

 

 

. (5)

The numbers 

 ,  

, for  admit the

following interpretation: (i) 

 ,   is the number of

all binary strings of length  with exactly  



 , 1-runs of length [exactly (),

at least ()] , among all the  possible binary strings of

length , . (ii) 

 is the number of all binary

strings of length  with exactly  n, 1s

contained in all 1-runs of length at least , among all the 

possible binary strings of length , .

Simple explicit expressions of 

 (not repeated here) are

given in Makri and Psillakis [19]. The authors provided an

explicit formula of 

 , for  in terms of binomial

coefficients. Their method is based on the solution of a

combinatorial problem; specifically, the allocation of balls into

cells under certain constrains (see Lemma 2.2 of [11]). An

explicit expression of 

 , in terms of binomial coefficients

too, is given by Sinha and Sinha [1] who used a generating

function approach. The latter expression contains an additional

sum; therefore it may be evaluated slower computationally than

that provided in [19].

The numbers 

 allow us to establish (we do not actually

need their specific expressions according to the new proposed

approach) respective numbers referring to all possible 

binary strings of length . They are defined as









, (6)

i.e. 

 is the total number of occurrences of all 1-runs of

length [exactly (), at least ()] , and the total

number of 1s in all 1-runs of length at least  [], in all

possible  binary strings of length , .

Since 

  



, hence

by (5) and (6), 

 

. Therefore (4) implies













   

  

  

 

  

(7)

Readily, by symmetry, 

 provides the respective

numbers associated with 0-runs and 0s in 0-runs.

By (7) it is noted that for a fixed , 

,  

decreases exponentially as  increases, and for a fixed 

, as , it holds



 



 

 



 





 

(8)

Furthermore, 

 



 , since   ,

.

Table I presents the three numbers 





 in binary

strings of length  bits,  , and for . The

entries of the table confirm the previously noted behavior of

the depicted numbers.

Sinha [31] was the first who addressed the usefulness of the

number 

 and also provided its formula. Then, Sinha and

Sinha [1] obtained an explicit expression of 

 whereas

Makri and Psillakis [19] derived the same formula for it and

they also established an explicit expression of 

. In both

papers ([1] and [19]) the approach was relied on the definition

of 

 via 

 , . Recently, Sinha and Sinha [2]

reestablished 

 solving explicitly a recursive generation

scheme for it.

The proposed, in the present note, approach is a new one

and treats under the same frame all the numbers 

 in a

simple, unified and systematic way. Accordingly, by (7) we

effortless get a recursive scheme for 

,  . It gives a

way to generate 

 from 

 and it offers further insight in

understanding the interdependencies among the studied

numbers. Specifically, it holds





    

 

 

 

  

(9)

with 

   . We note that

for the particular case  we capture by the relevant entries

of (9) and (7), Theorems 2 and 3 of [2], respectively.

TABLE I. NUMBERS OF OCCURRENCES OF 1-RUNS, 



 AND

NUMBER OF 1S, 

, IN BINARY STRINGS OF LENGTH 















































147456

278528

524288

69632

131072

376832

32768

61440

237568

15360

28672

139264

7168

13312

77824

3328

6144

41984

1536

2816

22016

704

1280

11264

320

576

5632

144

256

2752

112

1312

608

272

116

3. Conlusions

In this note we stated three run statistics which are

important in many areas of applied probability. We defined

them on a binary (0-1) sequence, and we then provided

explicitly their mean values for a Bernoulli sequence. After that,

we considered binary strings (symmetric Bernoulli sequences)

and we showed how the analytic expressions of the means of

these RVs provide eventually the respective explicit

expressions of three numbers studied recently by different

methods. Finally, as a byproduct of our approach, we proposed

a unified recursive scheme which clarifies further the

interdependencies among these numbers. The examined

numbers are potential useful in many engineering applications

like the ones mentioned briefly in the Introduction. Early

results are encouraging in this direction.

REFERENCES

[1] K. Sinha and B. P. Sinha, “On the distribution of runs of

ones in binary strings,” Comput. Math. Appl., vol. 58, pp.

1816-1829, 2009.

[2] K. Sinha and B. P. Sinha, “Energy-efficient

communication: understanding the distribution of runs in

binary strings,” 1st International Conference on Recent

Advances in Information Technology (RAIT-2012), pp.

177-181, 2012.

[3] G. Benson, “Tandem repeats finder: a program to analyze

DNA sequences,” Nucleic Acids Res., vol. 27, pp. 573-

580, 1999.

[4] W.Y. W. Lou, “The exact distribution of the “-tuple

statistic for sequence homoloy,” Statist. Probab. Lett., vol.

61, pp. 51-59, 2003.

[5] G. Nuel, L. Regad, J. Martin and A.C. Camproux, “Exact

distribution of a pattern in a set of random sequences

generated by a Markov source: applications to biological

data,” Alg. Mol. Biol., vol. 5, pp. 1-18, 2010.

[6] N. Balakrishnan and M. V. Koutras, Runs and Scans with

Applications, New York: Wiley, 2002.

[7] J. C. Fu and W. Y. W. Lou, Distribution Theory of Runs

and Patterns and its Applications: A Finite Markov

Imbedding Approach, New Jersey: World Scientific, 2003.

[8] M. V. Koutras, “Applications of Markov chains to the

distribution theory of runs and patterns,” in Handbook of

Statistics, vol. 21, D. N. Shanbhag, C. R. Rao, Eds. North

Holland: Elsevier, 2003, pp. 431-472.

[9] S. Eryilmaz, “Success runs in a sequence of exchangeable

binary trials,” J. Statist. Plann. Inference, vol. 137, pp.

2954-2963, 2007.

[10] F. S. Makri, A.N. Philippou and Z. M. Psillakis, “Polya,

inverse Polya, and circular Polya distributions of order 

for -overlapping success runs,” Commun. Statist. Theory

Methods, vol. 36, pp. 657-668, 2007.

[11] F. S. Makri, A. N. Philippou and Z. M. Psillakis, “Success

run statistics defined on an urn model,” Adv. Appl.

Probab., vol. 39, pp. 991-1019, 2007.

[12] S. Eryilmaz, “Run statistics defined on the multicolor urn

model,” J. Appl. Probab., vol. 45, pp. 1007-1023, 2008.

[13] S. Demir and S. Eryilmaz, “Run statistics in a sequence of

arbitrarily dependent binary trials,” Stat. Papers, vol. 51,

pp. 959-973, 2010.

[14] K. Inoue and S. Aki, “On the conditional and

unconditional distributions of the number of success runs

on a circle with applications,” Statist. Probab. Lett., vol.

80, pp. 874-885, 2010.

[15] F. S. Makri, “On occurences of  strings in linearly

and circularly ordered binary sequences,” J. Appl. Probab.,

vol. 47, pp. 157-178, 2010.

[16] F. S. Makri, “Minimum and maximum distances between

failures in binary sequences,” Statist. Probab. Lett., vol.

81, pp. 402-410, 2011.

[17] F. S. Makri and Z. M. Psillakis, “On runs of length

exceeding a threshold: normal approximation,” Stat.

Papers, vol. 52, pp. 531-551, 2011.

[18] F. S. Makri and Z. M. Psillakis, “On success runs of

length exceeded a threshold,” Methodol. Comput. Appl.

Probab., vol. 13, pp. 269-305, 2011.

[19] F. S. Makri and Z. M. Psillakis, “On success runs of a

fixed length in Bernoull sequences: exact and aymptotic

results,” Comput. Math. Appl., vol. 61, pp. 761-772, 2011.

[20] S. D. Dafnis, A. N. Philippou and D. L. Anztoulakos,

“Distributions of patterns of two successes separeted by a

string of  failures,” Stat. Papers, vol. 53, pp. 323-

344, 2012.

[21] M. V. Koutras and F. S. Milienos, “Exact and asymptotic

results for pattern waiting times,” J. Statist. Plann.

Inference, vol. 142, pp. 1464-1479, 2012.

[22] F. S. Makri and Z. M. Psillakis, “Counting certain binary

strings,” J. Statist. Plann. Inference, vol. 142, pp. 908-924,

2012.

[23] A. M. Mood, “The distribution theory of runs,” Ann.

Math. Stat., vol. 11, pp. 367-392, 1940.

[24] J. C. Fu and M. V. Koutras, “Distribution theory of runs:

a Markov chain approach,” J. Amer. Statist. Assoc., vol.

89, pp. 1050-1058, 1994.

[25] D. L. Antzoulakos, “On waiting time problems associated

with runs in Markov dependent trials,” Ann. Inst. Statist.

Math., vol. 51, pp. 323-330, 1999.

[26] Q. Han and S. Aki, “Joint distributions of runs in a

sequence of multi-trials,” Ann. Inst. Statist. Math., vol. 51,

pp. 419-447, 1999.

[27] J. C. Fu, W. Y. W. Lou, Z. Bai and G. Li, “The exact and

limiting distributions for the number of successes in

success runs within a sequence of Markov-dependent

two-state trials,” Ann. Inst. Statist. Math., vol. 54, pp.

719-730, 2002.

[28] K. Sen, M. L. Agarwal and S. Chakraborty, “Lenths of

runs and waiting time distributions by using Polya-

Eggenberger sampling scheme,” Studia Sci. Math.

Hungar., vol. 2, pp. 309-332, 2002.

[29] D. L. Antzoulakos, S. Bersimis and M. V. Koutras, “On

the distribution of the total number of run lengths,” Ann.

Inst. Statist. Math., vol. 55, pp. 865-884, 2003.

[30] D. E. Martin, “Distribution of the number of successes in

success runs of length at least  in higher-order

Markovian sequences,” Methodol. Comput. Appl. Probab.,

vol. 7, pp. 543-554, 2005.

[31] K. Sinha, Location and communication issues in mobile

networks. Ph. D. Dissertaion, Department of Computer

Science and Engineering, Jadavpur University, Calcutta,

India , 2007.