Int. J. Communications, Network and System Sciences, 2011, 4, 227-231
doi:10.4236/ijcns.2011.44027 Published Online April 2011 (http://www.SciRP.org/journal/ijcns)
Copyright © 2011 SciRes. IJCNS
An Improved Task Scheduling Algorithm in Grid
Computing Environment
Liang Yu, Gang Zhou, Yifei Pu
College of Comp ut er Sci ence, Sichua n U ni versi t y, Chengdu, China
E-mail: yuliang@scu.edu.cn
Received January 21, 2011; revised February 19, 2011; accepted February 21, 2011
Abstract
Algorithm research of task scheduling is one of the key techniques in grid computing. This paper firstly de-
scribes a DAG task scheduling model used in grid computing environment, secondly discusses generational
scheduling (GS) and communication inclusion generational scheduling (CIGS) algorithms. Finally, an im-
proved CIGS algorithm is proposed to use in grid computing environment, and it has been proved effec-
tively.
Keywords: Grid Computing, Model of Task Scheduling, Heuristics Algorithm, Dependent Task Scheduling
Algorithm
1. Introduction
How to effectively dispatch the task of Scheduling Com-
puting is one of the most important factors in the success
of grid computing. Users can share grid resources through
submitting computing tasks to grid system. According to
some strategy, grid scheduling process allocates those
tasks to appropriate resources. Efficient scheduling algo-
rithm can make good use of processing capacity of the
grid system, thereby improving application performance.
Grid system with the goal of optimizing throughput for
task scheduling has been proved to be NP complete
problem, so this has often using heuristic method to search
approximate optimal scheduling program. Heuristic
method which often based on visual inspiration is a ap-
proximation algorithm. The method that has gradually
optimized on the basis of feasible solution searches for a
similar algorithm at a low polynomial time operation.
2. The Mathematical Description of Direct
Acyclic Graph
Applications of grid environment can be described by set
which is composed of numbers of subtasks. A grid ap-
plication procedures can be expressed as a DAG (Direct
Acyclic Graph), G = (T, E), T is a node set and E is a set
of direct edge. A node t denotes a task in DAG, and a
direct edge is denoted by a pair of node (ti, tj). The first
node is called father and the latter is called children node.
The node with no father is called entrance, i.e., the be-
ginning of application. The node with no children is
called export, i.e., the end of application. Figure 1 gives
an example of DAG task. Among them, nx is a serial
number of task and the numbers in brackets are parame-
ters for calculating. The scheduling which is in a proc-
essing m unit and a task graph G = (T, E) is a function f.
Each of task is mapped on certain unit with a specific
beginning time by f. A scheduling can be described as
:1,2,,0,fT m in form. If it exits vT
,
,
f
vit, Scheduling tasks V should be scheduled on
the processing unit Pi and it starts from time t.
3. Dependent Task Scheduling Algorithm
GS and CIGS
Dependent task scheduling is an important scheduling
Figure 1. DAG task illustration.
L. YU ET AL.
228
problem, so many research papers have all expounded.
Most of the existing heuristic algorithm is based on an
independent task, and dependent task is researched in a
homogeneous environment with the same structure, which
composed of uniformity machine. For example, Shang [1]
and others adopt a new list of scheduling technology to
control dependence task scheduling in environment with
the same structure. Wu [2] and others based on topo-
logical sort have proposed a fast local search algorithm
to enhance scheduling of quality and efficiency. Oliveira
[3] and others have proposed a task of scheduling algo-
rithm with fixed priority under a distributed computer
system. Above of algorithms are too many constrains,
and not the general, Carter [4] and others have developed
Generation Scheduling (GS) to control dependent task in
heterogeneous system. The main steps of GS algorithm
can be described as follows:
1) Create scheduling problems. The task of interde-
pendent make some tasks start after the completion
of other tasks and each task is not set up the initial
scheduling. By tracking and analyzing the task of
interdependent relationship, when a pilot task of
entire mission has been completed, then the task
can be set up to scheduling.
2) Filtering mandate. When scheduling events occur
at any one time, its not scheduling task can be fil-
tered out. Then the remaining tasks will be sched-
uled as the “independence” by the formation of a
new set SI (Set of Independent tasks). Obviously,
all of tasks in the SI have no predecessors mandate
for immediate implementation.
3) Locally scheduling.Although these setting algo-
rithms that can be selected by users should be sched-
uled, preemptive scheduling model can not be used
should be paid attention.
4) Update scheduling problem.When incidents of new
scheduling cycle started or repeated have perceived,
relevant tasks are collected. The problem of a new
dependent task scheduling is formed and the time
has been set for the machines. If all the tasks sched-
uling have finished, then it should be terminated.
Otherwise repeat the steps 1-4 (See Figure 2).
Carter and others proposed a CIGS (Communication-
Inclusive of GS) to consider communication of GS algo-
rithm, which support dependent task scheduling as a typi-
cal algorithm in heterogeneous distributed computer sys-
tem. The algorithm is simple and easy achieving. In CIGS,
through analyzing the dependence of task, the tasks that
these have been to meet the mandate of current ancestors
constrain should be filtered out from all those that have
not yet scheduling tasks, i.e. current “task independence”.
The actual scheduling of sequence and matching sched-
uling are limited to “independence mandate”. With these
First:
Formed with the
prior ity scheduling
constrains
Second:
The failed tas k of
filtering
Third:
The re al iz at i on of the
unconstraint of a
smal ler pr iority
scheduling
Fouth:
Detecti on o f
re-scheduling
events
Figure 2. GS overview.
new task scheduling and execution, the dependence tasks
possibly become the current “independence mandate”,
which are scheduled until all tasks are assigned. The
number of frequency filters has directly influence on all
tasks of expected completion time. The task of makespan
firstly filtered out as preparation time of all machine,
priority will be ensured with an orderly implementation.
However, overall makespan have increased. Its perform-
ance could be improved by reducing the number of filters
and makespan can be reduced by optimizing the prepara-
tion time of machinery. The task of scheduling at any
time or has been or is being implemented in the pro- cess
of scheduling. When CIGS map tasks to a certain re-
sources node, it could adopt an option heuristics algo-
rithm. Including all relevant data transmission to the host
node and the time cost of implementation in the host
nodes will be considered in every mapping.
Efficiency of node in the dimension of time or space
in grid computing is dynamic change. To allow CIGS
can effectively deal with grid environment, the need of
related improvement should be done. Otherwise, it would
not be appropriate to improve its performance, such as
lowering span, raising the utilization rate of resources, etc.
4. The Improvement of Task Scheduling
Algorithm CIGS
In order to solve the above problem, the paper presents
an improved algorithm CIGS to adapt computational grid
environment. In the scheduling process, by making a
judgment, if there is a 0 outdegree task in SI, it will be
filtered to form a new set of BSI. Each of BSI tasks is not
allowed to schedule in the current scheduling cycle, that
is to say, BSI, such as a free buffer zone. Similarly, its
forecast by the data transmission time, the time of ex-
pectant execution for all ancestors task and preparation
time of machines will be determinant factors of schedul-
Copyright © 2011 SciRes. IJCNS
L. YU ET AL.229
ing, thus better performance can be acquired.
4.1. CIGS Algorithm with Improved Constrain
Conditions and Definition of Terms
Improved constrain condition includes:
1) An application (the overall mission) can be divided
into many tasks. Dependent relationship may exist
between these tasks. These restrictions that ances-
tors meet the conditions are not time-varying, but
they would be able to describe by graph DAG.
2) A node machine operates only one task at the same
time and the task will not be seized by others.
3) Heuristics would be the same in every cycle. When
data transmit, the communicated cost of transmis-
sion will be used as one of decisions for scheduling.
4) Although the performance and status of network
and nodes are dynamic changes in grid environ-
ment, they can be accurately predicted at a time in
scheduling. Execution of tasks and the communi-
cated cost of transmission also can be predicted.
5) All tasks scheduled to nodes usually can be im-
plemented. If a node failure at any time, its task
will automatically enter a new scheduling problem
sets in next cycle. Checkpoint mechanisms, move-
ment and error recovery mechanisms are not be
supported in scheduling.
The improved definition of terms:
Improved CIGS algorithm for the main definition of
terms described. Decomposition has been described here
is the application which is divided into N tasks that there
are part or all dependent relationship between.
1) N is the total number of all tasks in grid scheduling
application, set S is scheduling tasks.

0, 1, ,1SS SSN.
2) M involved in matching scheduling in grid is the
number of node machines, machines based sched-
uling for the pool H,

0,1,,1 ,2HH HHMM.
3) SI is a task set, which all the elements of can be
executed when its all ancestors have completely
implemented. It is dynamic set in scheduling.
4) BSI is a subset of SI, and exists in scheduling cycle.
Every element is not predecessor for any unexe-
cuted mission. It is filtering through from the cur-
rent SI that outdegrees of all tasks is 0.
5) R is expectation time for node machine in a grid
environment.

0,1, ,1RR RRM,
RJ
is node or machine
H
j expectations for period.
6)

00,01, ,11,EEEEN M
Ek j is expressed as expected execution time
of
Sk task in machinery
H
j.
7) D is data-transmission capacity of the matrix size in
nodes operating for task.
00, 01,DD D
,1DNM 1
,
Dk j said for the im-
plementation of mandate
Sk in machine, which
are essential for the quantity of data transmission.
If
Sk execute in their own data sets
H
j, the
0Dk j
.
8) TC execute in expected data transmission time
matrix of nodes or tasks,
000,TC TC
001, ,1TCTC NM1 1M
,
TC kij stands the necessary expected time
of data transmission for task
Sk between node
H
i and node
H
j.
9) C is the time matrix expected to completion of task,
00,01, ,11CCCCN M
,
Ck j is expressed as expected completion time
of
Sk task in node or machinery
H
j.
10) LC and LM stand for complete time of expected
execution in a machine for task, and they are a tem-
porary set.
0,1, ,1LCLCLCLC N
,
and
0,1, ,1LMLMLMMLM
.
11) Makespan is complete time for entire tasks. It means
that all the biggest complete time of task can be ex-
pressed as following:


1
0
max
N
k
M
akespanLC k
12) TR is dependent matrix for the task. Assuming the
dependence
TRkt of task between
Sk and
St. If
1k tTR
,
Sk depend on
St. Only
if
Sk is completely executed,
St will be exe-
cuted. Symbol “” is defined as dependence, i.e.
if
St Sk,
Sk depend on
St.
Sk is
successor (or son) of
St
,
St is pilot (or fa-
ther) of
Sk.
13) DAG ,VU (Direct Acyclic Graph, DAG), the
node
Vk stands for task
Sk, the direct edge
,Ukt that is from
Vk to
Vt said for depen-
dence of
Sk and
St.
14) ID is an indegree set of all node tasks, OD is an
outdegree set.
0,1, ,1IDIDIDID N
,
1, ,1ODOD N0,OD OD
.
4.2.The Computing of Indegree and Outdegree
for DAG
DAG node indegree is the number of precursor to nodes
and node indegree can be acquired by computing the
precursor of a node. The method of computing precur-
Copyright © 2011 SciRes. IJCNS
L. YU ET AL.
230
sors is simple. According to the relationship matrix be-
tween tasks, the precursor of a node is corresponding to
the number of line elements “1” in matrix. The indegree
of a node is sum of the line elements.
The method is computing the number of successors.
According to the relationship matrix between tasks, the
successor of a node is corresponding to the number of
array elements “1” in matrix. The outdegree of a node is
sum of the array elements. Indegree
I
Dk and outde-
gree
OD k of node
Vk can be expressed by for-
mula as follows:
 
1
0
N
i
I
DkTRk i
 
1
0
N
i
ODkTR ik
4.3. The Assigned Priority of DAG Node
Filtering out non-reliance tasks from set S of dependence
tasks can be used in various ways. The paper adopts filter
method by priority. Based on the filtering method of pri-
ority can guarantee the same priority tasks with non-
reliance relationship. The priority of node
Vk and task
Sk can be acquired by formula as follows:
max 1TPkTP i
Si PSk
There are main steps for assigned priority number of
DAG node:
1) Labeling all nodes as Unassigned and putting all
nodes in an unassigned set US.
2) The source node is assigned as the highest priority
level 0.
3) Labeling nodes that have been assigned priority as
Assigned. Deleting it from US, the nodes will be
put into an assigned node set AS. Is US empty? If
it is empty, go to step 7. Otherwise, go to step 4.
4) Taking a node form unassigned priority nodes.
5) If pilots have been assigned priority number, the
highest priority number P should be found in pilot
node. Otherwise, go to step 4.
6) The priority number should be assigned as P + 1,
go to step 3.
7) Getting the largest priority number Pmax of all tasks
and assigning each individual task to priority num-
ber Pmax.
8) Algorithm end.
4.4. The Key Steps of Improving Algorithm
After completing above computing work, CIGS sched-
uling algorithm is improved in accordance with the fol-
lowing order. Scheduling summarized as follows:
1) Finding out all nodes that priority number is 0 in
AS (Filtering out all source nodes from DAG), and
these tasks will be set into the tasks set SI. Search-
ing each task
SI k that belongs to SI, only if
0OD k
. This task will be filtered out and put
into the task set BSI. To be known or to identify the
distribution of data sets, nodes and network status
information can be acquired by grid tools. Updat-
ing H and R, if scheduled task is
Sk to every
machine, situation of operation will be forecasted.
Obtaining the corresponding expected com- pletion
time and it is simultaneously recorded in
LMj
and
LC k.
2) According to validity machine situation of set H,
the mandate of task set SI is scheduled by using
certain heuristics algorithm (Min-Min or Max-Min
algorithm). But all mandates of task set BSI will be
overlooked for the time being.
3) All task that its priority number is 1 will be filtered
out from set AS and these tasks are put into SI. The
mandate
SI k which its
0 is filtered
out from SI and these are put into set BSI. Assum-
ing each task
OD k
Sk which can be scheduled and
executed in machine
H
i, algorithm will calculate
its expected completion time in accordance with
the following formula. Updating H and R, all tasks
are scheduled by expected completion time matrix
C in set SI. And use the same algorithm as step (2).
4) Updating set H and R, number of validity machines
(expressed as N2) is counted in set H. All key tasks
of set i.e.
Sx are statistical calculated and
1PTP x
. Only if 12NN, the algorithm
will go to step (6).
5) According to (3) the method of calculating its ex-
pected completion time for each task of set BSI, all
tasks will be scheduled in set BSI in accordance
with time matrix of expected completion. The com-
pleted scheduling task will be taken out from BSI,
which used (2) the same algorithm.
6) Executing P = P + 1, algorithm is executed by cy-
cle from (3) to (5) until all tasks have been sched-
uled in the pool SI and BSI.
5. Simulation Test
We use a grid prototype system of computing visualiza-
tion to test the performance of scheduling algorithm.
LAN A and B bandwidth is 10 M and 100 M in this sys-
tem, which are connected with 10 M bandwidth. We
have selected a specific application tasks in the areas of
computational fluid dynamics. All tasks exist depend-
Copyright © 2011 SciRes. IJCNS
L. YU ET AL.
Copyright © 2011 SciRes. IJCNS
231
ence in applications and their size is random. CIGS compared with CTGS algorithm is always small.
In other word, improved CIGS algorithm can effectively
scheduled for dependent task and management perform-
ance is improved in computing grid environment.
Three experiments have established.
1) Three machines are selected in LAN grid environ-
ment B, implementation of dependent ten tasks will
be tested. 6. Conclusions
2) One hundred and forty tasks will be tested in seven
machines, five machines are chosen in B LAN grid
environment and two machines are chosen in A
LAN grid environment.
Algorithm research of task scheduling is one of the key
techniques in grid computing. The paper has analyzed
scheduling problem that composed of numbers of de-
pendent tasks. There is in-depth analysis for the typical
task scheduling algorithm GS in heterogeneous systems
and the CIGS algorithm considered the cost of commu-
nication. The paper proposed an improved CIGS algo-
rithm that is suitable for computing grid. The corre-
sponding experiments show that improved CIGS algo-
rithm can improve performance of execution.
3) One hundred and sixty-five tasks will be testes in 9
machines, seven machines are chosen in B LAN
grid environment and two machines are chosen in
A LAN grid environment.
Its scheduling algorithm has used by CIGS-Greedy,
CIGS-Min-Min, improved CIGS-Greedy, and improved
CIGS-Min-Min. According to the performance of span
makespan, test results are shown below in bargraph:
According to the bargraph (Figure 3), when the num-
ber of dependent tasks has increased, the span reduces
obviously. In contrast, the number of tasks is less, the
span between them become small. When the core match-
ing algorithm is adopted, the span of improved algorithm
7. References
[1] M. Shang, S. Sun, et al., “An Efficient Parallel Schedul-
ing Algorithm of Dependent Task Graphs,” Proceedings
of the 4th International Conference on Parallel and Dis-
tributed Computing, Applications and Technologies, Cheng-
du, 27-29 August 2003, pp. 595-598.
doi:10.1109/PDCAT.2003.1236372
[2] M. Wu, W. Shu, et al., “Efficient Local Search for DAG
Scheduling,” IEEE Transactions on Parallel and Distrib-
uted Systems, Vol. 12, No. 6, 2001, pp. 617-627.
[3] R. S. Oliveira and J. S. Fraga, “Fixed Priority Scheduling
of Tasks with Arbitrary Precedence Constraints in Dis-
tributed Hard Real-Time Systems,” Journal of Systems
Architecture, Vol. 46, No. 9, 2000, pp. 991-1004.
[4] B. R. Carter, D. W. Watson, et al., “Generational Sched-
uling for Dynamic Task Management in Heterogeneous
Computing Systems,” Journal of Information Sciences,
Vol. 106, No. 1, 1998, pp. 219-236.
Figure 3. Scheduling performance comparison.