**The growing number of sequences stored in genomic databases has made sequential analysis infeasible. Parallel computing has therefore been brought to Bioinformatics through parallel algorithms that align and analyze sequences, improving mainly the running time of these algorithms. In many situations, the parallel strategy also helps reduce the computational complexity of large problems. This work presents results obtained with an implementation of a parallel score estimating technique for the score matrix calculation stage, which is the first stage of a progressive multiple sequence alignment. The performance and quality of the parallel score estimating technique are compared with those of a dynamic programming approach that is also implemented in parallel. The comparison shows a significant reduction in running time. Moreover, the quality of the final alignment obtained with the new strategy is analyzed and compared with the quality obtained with the dynamic programming approach.**

Biologists need the help of Bioinformatics because every day they generate a huge amount of data from their experiments, and they need to analyze these results by aligning the sequences, searching them for patterns and identifying hot spots [1,2]. However, they cannot perform this analysis in a reasonable time without computers [

When considering multiple sequence alignments (MSA), dynamic programming algorithms are efficient only for a limited number of long sequences [

Nevertheless, with the increase of problems’ complexity, the number of sequences to be analyzed has grown from several dozens to several thousands [

With the demand for more computing power, parallel and distributed computing were used to improve the performance of task execution in Bioinformatics, especially for stochastic algorithms, thus bringing together high performance computing and Bioinformatics.

Stochastic approaches do not guarantee the exact solution, but they try to obtain a solution that is as close to optimal as possible. The progressive MSA is one of the most widely used stochastic algorithms for aligning sequences, with good performance and reasonable quality.

The progressive MSA algorithm is divided into three stages: the first stage is the score matrix calculation, the second is the phylogenetic tree construction and the last is the multiple alignment. The score matrix calculation is the most computationally complex of the three, because it performs the pairwise comparisons among sequences, generally using dynamic programming. However, some strategies can be used to reduce the computational complexity of the first stage, and score estimating is one of them [7,8].
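As a point of reference for the pairwise comparison by dynamic programming mentioned above, the following is a minimal Python sketch of a global alignment score in the style of Needleman-Wunsch. The function name `global_alignment_score` and the scoring values (match 1, mismatch -1, linear gap -1) are assumptions chosen for illustration; they are not the parameters of Clustal or of the implementation evaluated in this work. The nested loops make the quadratic cost in the two sequence lengths visible.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of x and y (illustrative scoring only)."""
    # prev[j] holds the best score for aligning the processed prefix of x with y[:j].
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        # Aligning x[:i] with the empty prefix of y costs i gaps.
        curr = [i * gap]
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            # Best of: align the two characters, gap in y, gap in x.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Filling one score matrix entry this way visits every cell of a table with one row per character of the first sequence and one column per character of the second, which is the cost that score estimating tries to avoid.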

This work shows some results obtained by an implementation of a parallel score estimating technique (which we call the new approach) in the score matrix calculation stage and the comparison of this method with traditional dynamic programming also implemented in parallel (which we call the standard approach). The execution time and the quality of the obtained alignments were compared between the approaches. This new approach can reduce the computational complexity of this step from O(mn) to O(m + n), considering two sequences X and Y of lengths m and n, respectively [

This paper is organized as follows: Section II reviews the main concepts of the multiple sequence alignments and discusses some related works; in Section III the score estimating algorithm is described; in Section IV, the obtained results are presented and discussed, and in Section V, the conclusions are presented.

Sequence alignment is not an easy task, and biologists constantly need to evaluate the genes and protein characteristics of species.

The alignment among sequences of DNA/RNA and proteins of different species is a hypothesis of homology among the components of genes and proteins. The alignments can be used as models to propose and test evolutionary hypotheses, which are also important to studies of phylogeny [1,3].

The use of MSA algorithms, their parallelization and the optimization techniques for these algorithms have improved in recent years.

Some methods to improve the execution time of MSA algorithm based on pairwise comparison, where the goal is to find an optimal alignment with some restrictions, were proposed [

However, with the growing size of the sequences, the parallel solutions began to lose their high efficiency, and it became necessary to add optimization techniques to the algorithms of the parallel solutions [13,14].

Our work explores all the parallelism power in the solution of multiple sequence alignment problems, developing a parallel version over the sequential score estimating optimization technique proposed by Chen et al. [

Usually, the standard progressive MSA algorithm uses the dynamic programming algorithm in its first step. However, from our research experience we realized that there are different strategies for calculating this score matrix (a matrix containing the final score of aligned pairs) which work better. More specifically, the standard approach is based on the progressive algorithm of Clustal [6,15], which is a well-known, widely used and well-established strategy in Bioinformatics. The source code of the tool can be obtained from the web page of the European Bioinformatics Institute^{1}. Basically, we have taken the first stage of this algorithm, the pairwise alignment, and parallelized it (standard approach) as can be seen in

In the new approach, we used the score estimating technique to obtain the score matrix results. However, we did not use it in a sequential way, as is reported in the literature [

Each of the four stages of the score estimating algorithm performs a scan over the sequences, which are arranged in pairs. Every pair of sequences must go through all four stages. The stages are classified as Right-Upper, Right-Lower, Left-Upper and Left-Lower. The classifications Upper and Lower refer to the position of the sequences in the analysis. The Right and Left movements refer to the scanning directions. The maximum score among the four stages is used as the entry in the score matrix.

In the Right-Upper execution, the last character on the right side of the upper sequence is chosen and set as the starter character. In this case, it is the character A. Starting from it, a series of comparisons with the characters of the lower sequence is performed, going from right to left (this is the scanning direction). If there is no equal character (a match) in the lower sequence, the algorithm moves to the next character to the left in the upper sequence (i.e., it becomes the new starter character) and repeats the complete scanning process. Otherwise, if an equal character is found in the lower sequence (as happens in our example), one point is scored and the scanning starts again, now taking the new starter character in the lower sequence. The match is with the character A (third from right to left in the lower sequence), as can be seen in

The square around the illustration of sequence pairs indicates that each task is performed in a different processor unit. A stage is distributed as soon as a processor is or becomes available. This approach is possible because the stages are totally independent.
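The four scanning stages described above can be sketched in Python as follows. This is only one possible reading of the textual description, not the authors' implementation: the names `scan_right` and `estimate_score` are hypothetical, the sketch assumes that after a match the new starter is the character immediately to the left of the matched one and that the next scan is restricted to the still unscanned part of the other sequence, and it assumes that the two Left stages behave like the Right stages applied to the reversed sequences.

```python
def scan_right(first, second):
    """One right-to-left stage; `first` holds the initial starter character."""
    src, tgt = first, second
    s_pos = len(src) - 1   # index of the current starter character in src
    t_hi = len(tgt) - 1    # rightmost index of tgt that is still unscanned
    score = 0
    while s_pos >= 0 and t_hi >= 0:
        # Look for the starter in tgt[0..t_hi], scanning from right to left.
        k = tgt.rfind(src[s_pos], 0, t_hi + 1)
        if k == -1:
            s_pos -= 1     # no match: next character to the left becomes the starter
        else:
            score += 1     # match: one point, then the roles of the sequences swap
            src, tgt, s_pos, t_hi = tgt, src, k - 1, s_pos - 1
    return score


def estimate_score(upper, lower):
    """Maximum over the four stages (assumed mapping of stage names to scans)."""
    stages = (
        scan_right(upper, lower),              # Right-Upper
        scan_right(lower, upper),              # Right-Lower
        scan_right(upper[::-1], lower[::-1]),  # Left-Upper (assumed: mirrored scan)
        scan_right(lower[::-1], upper[::-1]),  # Left-Lower (assumed: mirrored scan)
    )
    return max(stages)
```

Under these assumptions every match pairs two characters that lie strictly to the left of the previously matched ones, so the estimate can never exceed the length of the shorter sequence. The four calls inside `estimate_score` share no state, which is the independence that allows each stage to run on a different processor.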

The algorithm can be executed on any number of processors, because the stages are distributed through an ordered queue: the four stages of the current pair of sequences are distributed to the processors and, if more processors are available, the stages of the next pair of sequences in the queue are distributed too, until all the processors are busy. This process is repeated until all pairs of sequences and their stages have been processed. Below, we present an algorithm for this control:

while (pair_of_sequences_has_not_yet_executed > 0)
{
    waiting(); // waiting for a message of a processor's availability
    check_the_order_queue();
    allocate_the_correct_stage_of_time();
    decrement_counter_pair();
}
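The control loop above can be illustrated with a small Python sketch in which a thread pool from the standard library stands in for the cluster processors. This is an assumption made to keep the example self-contained; the implementation evaluated in this work uses MPICH, not threads. The names `run_stage` and `score_matrix`, the task layout, and the compact `scan_right` stage (one possible reading of the scanning description) are likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def scan_right(first, second):
    """Compact right-to-left scanning stage (illustrative reading of the text)."""
    src, tgt, s_pos, t_hi, score = first, second, len(first) - 1, len(second) - 1, 0
    while s_pos >= 0 and t_hi >= 0:
        k = tgt.rfind(src[s_pos], 0, t_hi + 1)
        if k == -1:
            s_pos -= 1
        else:
            score += 1
            src, tgt, s_pos, t_hi = tgt, src, k - 1, s_pos - 1
    return score


# The four stages of one pair; the Left stages are assumed to be mirrored scans.
STAGES = (
    lambda u, l: scan_right(u, l),
    lambda u, l: scan_right(l, u),
    lambda u, l: scan_right(u[::-1], l[::-1]),
    lambda u, l: scan_right(l[::-1], u[::-1]),
)


def run_stage(task):
    """Execute one queued stage and report which matrix entry it belongs to."""
    i, j, stage, upper, lower = task
    return i, j, stage(upper, lower)


def score_matrix(seqs, workers=4):
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    # Ordered queue: the four stages of each pair, pair after pair.
    queue = [(i, j, stage, seqs[i], seqs[j])
             for i, j in combinations(range(n), 2) for stage in STAGES]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The pool hands the next queued stage to whichever worker is idle.
        for i, j, score in pool.map(run_stage, queue):
            matrix[i][j] = matrix[j][i] = max(matrix[i][j], score)
    return matrix
```

A call such as `score_matrix(["ACGT", "ACGT", "TTTT"], workers=2)` fills a symmetric matrix whose off-diagonal entries hold the maximum of the four stage scores for each pair. The result does not depend on the number of workers, since the stages are independent and only their maximum is kept.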

In this section we report the results that demonstrate the improvement in the performance of the new approach, implemented by the score estimating algorithm, when compared to the standard multiple progressive alignment, implemented with parallel dynamic programming. It is important to emphasize that the performance results shown here relate only to the execution time of the first stage of the algorithm.

The tests were performed with sequences of 550 residues on average for nucleotides and 180 residues on average for amino acids, with different numbers of sequences for both approaches. They were run on a Debian Linux Beowulf cluster of Athlon XP 2100+ machines with 9 operational nodes. The front-end node has 2 GB of memory and 2 disks of 80 GB each, and each of the other 8 nodes has 1 GB of memory and 1 disk of 80 GB. The communication interface is based on Fast Ethernet 10/100, and MPICH is used as the communication library.

It can be seen in