G. RESHETOVA ET AL.

Copyright © 2013 SciRes. JAMP

meters. Taking this into account, we slice the total 3D model into a number of disc-like subdomains Ω. The finite difference scheme requires communication between neighboring processors, which have to exchange function values on the interfaces between elementary discs. It should be noted that a DD based on this rather simple geometry makes it easy to guarantee a uniform load of the Processor Units (PU) involved in the computations. Another advantage of the chosen DD is the extremely small portion of data that each PU has to exchange at every time step and, consequently, the extremely short waiting period before computation of the next time step can begin.
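
The uniform slicing described above can be illustrated with a minimal sketch. It assumes that the model is cut along a single axis into `n_layers` grid layers of equal cost; the function name `split_into_discs` and its parameters are hypothetical and are not taken from the authors' code.

```python
def split_into_discs(n_layers, n_pu):
    """Return one (start, stop) layer range per PU; sizes differ by at most 1."""
    if n_pu < 1 or n_layers < n_pu:
        raise ValueError("need at least one layer per PU")
    base, extra = divmod(n_layers, n_pu)
    ranges = []
    start = 0
    for rank in range(n_pu):
        # The first `extra` PU take one additional layer each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Illustrative numbers only: 1000 layers shared by 16 PU.
discs = split_into_discs(n_layers=1000, n_pu=16)
sizes = [stop - start for start, stop in discs]
```

Because every disc has the same cross-section, equal layer counts give an equal load, and each PU talks only to the PU holding the adjacent discs.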

The MPI (Message Passing Interface) library is used to arrange the above-mentioned send/receive procedures, and special effort is made to minimize the idle time of the Processor Units caused by data exchange. To achieve this, we start the computations for each subdomain from its interior, widening them towards the interfaces, and use the non-blocking functions MPI_Isend and MPI_Irecv to arrange the data exchange between neighboring PU.
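
The interior-first ordering can be sketched with a serial emulation. It is only an illustration under stated assumptions: a generic 3-point stencil stands in for the actual finite difference scheme, Python lists stand in for the subdomains, each subdomain holds at least two points, and plain copies stand in for the messages. In an MPI code the non-blocking send and receive calls would be posted where the copies are made and completed (for example with MPI_Waitall) just before the interface points are updated. All function names are hypothetical.

```python
def step_serial(u, c):
    """One explicit 3-point stencil step on the whole array (end values fixed)."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def step_decomposed(parts, c):
    """Same step on a list of subdomains: interior first, interface points last."""
    # "Post" the exchange: edge values of each subdomain go to its neighbors.
    # This is where the non-blocking send/receive calls would be issued.
    from_left = [None] + [p[-1] for p in parts[:-1]]
    from_right = [p[0] for p in parts[1:]] + [None]

    new_parts = []
    for k, p in enumerate(parts):
        new = p[:]
        # Interior points need no neighbor data, so they can be updated
        # while the messages would still be in flight.
        for i in range(1, len(p) - 1):
            new[i] = p[i] + c * (p[i - 1] - 2.0 * p[i] + p[i + 1])
        # "Wait" for the exchange, then finish the interface points
        # using the received neighbor values.
        if from_left[k] is not None:
            new[0] = p[0] + c * (from_left[k] - 2.0 * p[0] + p[1])
        if from_right[k] is not None:
            new[-1] = p[-1] + c * (p[-2] - 2.0 * p[-1] + from_right[k])
        new_parts.append(new)
    return new_parts


# Three subdomains of four points each, advanced for three time steps.
u = [float((i * 7) % 5) for i in range(12)]
parts = [u[0:4], u[4:8], u[8:12]]
for _ in range(3):
    u = step_serial(u, 0.25)
    parts = step_decomposed(parts, 0.25)
joined = [v for p in parts for v in p]
```

The decomposed run reproduces the single-domain run, while only one value per interface crosses between neighbors at each step.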

Special attention was paid to the analysis of the effectiveness and scalability of this approach. The analysis was based on a series of numerical experiments performed on the cluster HKC-160 (Siberian Supercomputer Center, Novosibirsk), made of 80 computation modules (hp Integrity rx1620 with two Intel Itanium 2 PU, each of 1.6 GHz, 3 MB cache, 4 GB RAM) connected via a 24-port InfiniBand switch (10 Gbit/s, Cluster Interconnect). Peak performance of the cluster is about 1 Tflop/s.

To estimate the effectiveness of parallelization, a fixed computational area was decomposed into different numbers of subdomains, so that the simulation was performed for the same target area but with an increasing number of PU.

For each n, the effectiveness is found as

$$\mathrm{eff}(n) = \frac{4 \cdot \mathrm{time}(4)}{n \cdot \mathrm{time}(n)} \tag{3}$$

where time(n) is the computer time expended by n PU for the simulation. We start with n = 4 in order to provide from the very beginning the same amount of data exchange between adjacent PU. Figure 1(a) shows the effectiveness computed for two different lengths of the computational area: 12 meters (circles) and 24 meters (rectangles). These results are entirely predictable: the smaller the load of each PU, the lower the effectiveness. This is confirmed by the behavior of both curves: for the target area of 24 meters the load of a single PU is twice that for the target area of 12 meters, so the decrease in effectiveness is less sharp, but eventually the two curves become the same.
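
Formula (3) is simple enough to evaluate directly. The sketch below does so for a set of run times; the timings are hypothetical values chosen for illustration and are not measurements from the experiments, and the function name `effectiveness` is likewise an assumption.

```python
def effectiveness(n, time_n, time_4):
    """eff(n) = 4 * time(4) / (n * time(n)), with the 4-PU run as the baseline."""
    if n < 4 or time_n <= 0 or time_4 <= 0:
        raise ValueError("expected n >= 4 and positive run times")
    return 4.0 * time_4 / (n * time_n)


# Hypothetical wall-clock times in seconds, keyed by the number of PU.
timings = {4: 100.0, 8: 52.0, 16: 28.0, 32: 17.0}
eff = {n: effectiveness(n, t, timings[4]) for n, t in timings.items()}
```

A value of 1 means that the run on n PU is exactly n/4 times faster than the baseline run on 4 PU; values below 1 measure the loss caused by data exchange and idle time.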

Now let us perform the series of numerical experiments in the opposite manner: we fix the size of the elementary subdomains but increase their number proportionally to the number of PU. The result can be seen in Figure 1(b), which confirms that the computational time does not change as the number of PU increases.

Figure 1. Scalability by numerical experiments. Upper: effectiveness with fixed computational area. Lower: total time with fixed load of PU.

So we can conclude that the key parameter is the ratio of the amount of data loaded onto each PU to the amount of data it has to exchange with its neighbors. The higher this ratio, the higher the effectiveness of parallelization. In particular, if this ratio is fixed, it does not matter how many PU are involved in the simulation.
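
This conclusion can be illustrated with a rough model, given here as a sketch under stated assumptions rather than as the authors' analysis: every disc has the same cross-section, a PU computes `layers_per_pu` grid layers per time step, and it exchanges `halo` layers with each of two neighbors. The function name, the halo width and the layer counts are all hypothetical.

```python
def load_to_exchange_ratio(layers_per_pu, halo=1, neighbors=2):
    """Layers computed per PU divided by layers exchanged per time step.

    The disc cross-section appears in both quantities and cancels out.
    """
    return layers_per_pu / (halo * neighbors)


pu_counts = (4, 8, 16, 32)
total_layers = 1024  # illustrative size of a fixed computational area

# Fixed computational area: discs get thinner as PU are added, the ratio falls.
fixed_area = {n: load_to_exchange_ratio(total_layers // n) for n in pu_counts}

# Fixed load per PU (64 layers each): the ratio is the same for any number of PU.
fixed_load = {n: load_to_exchange_ratio(64) for n in pu_counts}
```

In this model the first experiment halves the ratio each time the number of PU doubles, while the second keeps it constant, which matches the qualitative behavior reported for the two series of experiments.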

3.2. Cross-Hole Tomography

The geometry of the computational area used for cross-hole tomography is very different from the previous one: it is now a parallelepiped with a length (distance between wells) of about 300/500 meters, approximately the same depth (vertical size) and a rather narrow width of ≈ 100 meters. Next, contrary to Sonic Log, local grid refinement now has to be implemented at two different spatial locations, around the wells with sources and receivers, so special attention is needed to provide a uniform load of PU. The easiest way to do this is a straightforward application of the 1D Domain Decomposition described above along the cross-well direction, but it will lead to the necessity of huge data exchange between PU