Analysis of an Approximation Algorithm for Scheduling Independent Parallel Tasks

In this paper, we consider the problem of scheduling independent parallel tasks in parallel systems with identical processors. The problem is NP-hard, since it includes the bin packing problem as a special case when all tasks have unit execution time. We propose and analyze a simple approximation algorithm called H_m, where m is a positive integer. Algorithm H_m has a moderate asymptotic worst-case performance ratio in the range [1 2/3 .. 1 13/18] for all m ≥ 6; but the algorithm has a small asymptotic worst-case performance ratio in the range [1 + 1/(C+1) .. 1 + 1/C] when task sizes do not exceed 1/C of the total available processors, where C > 1 is an integer. Furthermore, we show that if the task sizes are independent, identically distributed (i.i.d.) uniform random variables, and task execution times are i.i.d. random variables with finite mean and variance, then the average-case performance ratio of algorithm H_m is no larger than 1.2898680..., and for an exponential distribution of task sizes, it does not exceed 1.2898305.... As demonstrated by our analytical as well as numerical results, the average-case performance ratio improves significantly when tasks request smaller numbers of processors.


Introduction
In this paper, we consider the problem of scheduling independent parallel tasks in parallel systems with identical processors. Assume that we are given a list of n tasks L = (T_1, T_2, ..., T_n). Each task T_i is specified by its execution time t_i and its size s_i, i.e., T_i requires s_i processors to execute. There are M identical processors, and any s_i of the M processors can be allocated to T_i. Once T_i starts to execute, it runs without interruption until it completes. Tasks in L are mutually independent; that is, there is no precedence constraint or data communication among the n tasks. The problem addressed here is to find a nonpreemptive schedule of L such that its makespan (i.e., the total execution time of the n tasks) is minimized.
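To make this model concrete, the following sketch computes the makespan of one feasible nonpreemptive schedule under these rules, using plain first-come-first-served list scheduling. This is illustrative only; the function name, the example tasks, and the greedy policy are our own choices, not the paper's algorithm.

```python
import heapq

def fcfs_makespan(tasks, M):
    """Makespan of a first-come-first-served nonpreemptive schedule of
    tasks (t_i, s_i) on M identical processors: each task starts, in
    list order, at the earliest time at which s_i processors are free."""
    free, now = M, 0.0
    running = []       # min-heap of (finish_time, size) of running tasks
    makespan = 0.0
    for t, s in tasks:
        # lazily release finished tasks until s processors are available
        while free < s:
            finish, size = heapq.heappop(running)
            now = max(now, finish)
            free += size
        heapq.heappush(running, (now + t, s))
        free -= s
        makespan = max(makespan, now + t)
    return makespan

# Example: three tasks on M = 4 processors; the third must wait until
# the shorter of the first two tasks releases its processors.
tasks = [(3.0, 2), (2.0, 2), (2.0, 2)]   # (execution time, size) pairs
print(fcfs_makespan(tasks, 4))           # prints 4.0
```

Note that no assumption is made about which physical processors run a task; only the count of free processors matters, which is exactly the noncontiguous allocation model used in this paper.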
The problem is NP-hard, since it includes the bin packing problem as a special case when all tasks have unit execution time. Therefore, one practical way to solve this problem is to design and analyze approximation algorithms that produce near-optimal solutions. Let A(L) be the makespan of the schedule produced by algorithm A for a list L, and let OPT(L) be the makespan of an optimal schedule of L. The quantity R_A = limsup of A(L)/OPT(L) as OPT(L) → ∞ is called the asymptotic worst-case performance ratio of algorithm A. If there exist two constants a and b such that for all L, A(L) ≤ a · OPT(L) + b · t_max, where t_max = max_{1≤i≤n}(t_i) is the longest execution time of the n tasks, then R_A ≤ a, and a is called an asymptotic worst-case performance bound of algorithm A. Moreover, if for any small ε > 0 and all large N > 0, there exists a list L such that OPT(L) ≥ N and A(L) ≥ (a − ε) · OPT(L), then the bound a is called tight, i.e., R_A = a. When task sizes and execution times are random variables, both A(L) and OPT(L) become random variables, and the average-case performance of algorithm A is measured by the asymptotic behavior of the ratio E(A(L))/E(OPT(L)).

We notice that our scheduling problem defined above looks similar to, but is quite different from, the two-dimensional rectangle packing problem [1,2,7,9], where each task T_i is treated as a rectangle with width s_i and height t_i. The rectangle packing model implies that processors must be allocated in contiguous groups. That is, the M processors have indices 1, 2, 3, ..., M, and task T_i must be allocated s_i processors with indices j, j+1, ..., j + s_i − 1 for some j. Such a scheduling problem arises in parallel systems like linear arrays. The rectangle packing problem has been extensively studied, and a complicated algorithm with an asymptotic worst-case performance ratio as low as 1.25 has been found [1]. However, contiguous processor allocation is not required in our model, where any s_i processors can be allocated to T_i. Our problem could be regarded as a resource constrained

(The author can be reached at email: li@mcs.newpaltz.edu, phone: (914) 257-3534, fax: (914) 257-3571.)
scheduling problem [3,6], where the resource is a set of processors. It has applications in parallel computing systems such as symmetric shared-memory multiprocessors, and in distributed computing systems such as bus-connected networks of workstations. In these systems, the processor allocation mechanism is independent of the topology of the interconnection network. Another related problem is scheduling malleable tasks, which has also been investigated in the literature [8,10]. In that problem, each task requests several possible numbers of processors, i.e., a task has an adjustable size, and for each size an execution time is also specified. The problem has several variations depending on the ways in which the execution time of a task changes with the number of processors allocated to it, and on the performance measures to be optimized (e.g., makespan, average flow time). The problem we consider here is to schedule nonmalleable tasks with noncontiguous processor allocation. Even though the complicated algorithm in [1] for rectangle packing can also be applied to solve our problem, we propose and analyze a simple approximation algorithm called H_m, where m is a positive integer. Algorithm H_m has a moderate asymptotic worst-case performance ratio (1.666··· ≤ R_{H_m} ≤ 1.722··· for all m ≥ 6); but the algorithm has a small asymptotic worst-case performance ratio (1 + 1/(C+1) ≤ R_{H_m} ≤ 1 + 1/C) when task sizes do not exceed M/C, where C > 1 is an integer. We notice that the capability to deal with small tasks is important in real applications, since many task sizes are relatively small compared with the system size, so that a large-scale parallel system can be shared by many users simultaneously. However, it is not clear whether the algorithm in [1] has such capability. Furthermore, the simplicity of our algorithm allows us to conduct average-case performance analysis. In particular, we show that if the numbers of
processors requested by the tasks are independent, identically distributed (i.i.d.) random variables uniformly distributed in the range [1..M], and task execution times are i.i.d. random variables with finite mean and variance, then the average-case performance ratio of H_m is at most 1.2898680..., i.e., E(H_m(L))/E(OPT(L)) is asymptotically bounded from above by 1.2898680... as n → ∞. For an exponential distribution of task sizes, the bound improves to 1.2898305.... As demonstrated by our analytical as well as numerical results, the average-case performance ratio improves significantly when tasks request smaller numbers of processors. We notice that there is a lack of such results on probabilistic algorithm analysis, especially in multi-dimensional cases [4]. The average-case performance of the algorithm in [1] is unknown.
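Although OPT(L) is intractable to compute exactly, ratios such as those above can be bounded empirically using the standard lower bound max(total work / M, t_max) on the optimal makespan: dividing any algorithm's makespan by this bound over-estimates its true performance ratio. The helper below is a generic sketch of our own (name and example included), not a routine from the paper.

```python
def opt_lower_bound(tasks, M):
    """Lower bound on OPT(L) for tasks (t_i, s_i) on M processors: the
    total work sum(t_i * s_i) cannot be processed faster than M
    processors allow, and no schedule is shorter than the longest
    single execution time t_max."""
    work = sum(t * s for t, s in tasks)
    t_max = max(t for t, _ in tasks)
    return max(work / M, t_max)

# Total work is 14 on 4 processors, so OPT >= 3.5 for this instance.
print(opt_lower_bound([(3.0, 2), (2.0, 2), (2.0, 2)], 4))  # prints 3.5
```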
The rest of the paper is organized as follows. We present algorithm H_m in Section 2. The worst-case performance of the algorithm is analyzed in Section 3, and its average-case performance is studied in Section 4. Finally, we give a summary in Section 5.
Such an allocation can be implemented using, for example, the list scheduling algorithm [4]. Upon the completion of a task in sublist L_j, the first unscheduled task in L_j is removed from L_j and scheduled to execute on the processors just released. This process repeats until all tasks in L_j are finished. Then algorithm H_m begins the scheduling of the next sublist L_{j+1}.
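The sublist scheme described above can be turned into a runnable sketch. The partition rule below (L_j holds the tasks with M/(j+1) < s_i ≤ M/j for j < m, and L_m holds the tasks with s_i ≤ M/m, so that j tasks of L_j can run concurrently) is our reconstruction of H_m from this section's description, with sizes assumed to be integer processor counts; it is not the paper's exact pseudocode.

```python
def h_m_makespan(tasks, M, m):
    """Reconstruction of algorithm H_m: partition tasks (t_i, s_i) into
    sublists L_1, ..., L_m by size, run each sublist to completion
    before the next one starts, and within L_j keep j tasks running at
    a time, dispatching the next unscheduled task onto the processors
    released by a completed task."""
    sublists = [[] for _ in range(m)]
    for t, s in tasks:
        j = min(M // s, m)          # largest j with s <= M/j, capped at m
        sublists[j - 1].append((t, s))
    start = 0.0                     # time at which the current sublist begins
    for j, L_j in enumerate(sublists, start=1):
        if not L_j:
            continue
        groups = [start] * j        # next free time of each of the j groups
        for t, _ in L_j:
            g = min(range(j), key=groups.__getitem__)
            groups[g] += t          # FIFO: next task takes earliest free group
        start = max(groups)         # sublist L_j runs to completion
    return start

# Example: M = 10 processors, m = 3 sublists; the size-6 task is alone
# in L_1, the size-4 task goes to L_2, the size-1 task to L_3.
print(h_m_makespan([(2.0, 6), (3.0, 4), (1.0, 1)], 10, 3))  # prints 6.0
```

Since every task in L_j has size at most M/j, running j of them at once never exceeds the M available processors, which is what makes the per-sublist schedule feasible.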
For tasks with small sizes, algorithm H_m exhibits much better performance due to increased processor utilization, as claimed by the following theorem.

Probabilistic Analysis
Now let us consider the average-case performance of algorithm H_m. For convenience, we assume that the task sizes are normalized such that 0 < s_i ≤ 1, and that the s_i's are independent, identically distributed (i.i.d.) random variables with a common probability density function f(x) on the interval (0, 1]. Our assumption on the task execution times is quite general: the t_i's are i.i.d. random variables with mean μ and variance σ², where μ and σ are any finite numbers independent of n. The probability distributions of task sizes and execution times are independent of each other.
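Under these assumptions the average-case behavior can be probed numerically. The Monte Carlo sketch below is our own experiment, not the paper's analysis: it uses normalized sizes with the total processor capacity set to 1, sizes uniform on (0, 1], exponential execution times, and the same sublist reconstruction of H_m as in Section 2's description. The makespan is compared against the lower bound max(Σ t_i s_i, t_max) ≤ OPT(L), so the printed figure over-estimates the true empirical ratio E(H_m(L))/E(OPT(L)).

```python
import random

def h_m_norm(tasks, m):
    """H_m reconstruction with sizes normalized to (0, 1]: sublist L_j
    holds tasks with 1/(j+1) < s <= 1/j (L_m holds s <= 1/m), and j
    tasks of L_j run concurrently; sublists run one after another."""
    sublists = [[] for _ in range(m)]
    for t, s in tasks:
        j = min(int(1.0 // s), m)
        sublists[j - 1].append((t, s))
    start = 0.0
    for j, L_j in enumerate(sublists, start=1):
        if L_j:
            groups = [start] * j
            for t, _ in L_j:
                g = min(range(j), key=groups.__getitem__)
                groups[g] += t
            start = max(groups)
    return start

def lower_bound(tasks):
    # total work (capacity normalized to 1) and longest execution time
    return max(sum(t * s for t, s in tasks), max(t for t, _ in tasks))

random.seed(2)
num = den = 0.0
for _ in range(20):                        # 20 instances of n = 500 tasks
    tasks = [(random.expovariate(1.0), 1.0 - random.random())
             for _ in range(500)]          # sizes uniform on (0, 1]
    num += h_m_norm(tasks, 20)             # m = 20
    den += lower_bound(tasks)
print(num / den)                           # empirical ratio, always >= 1
```

Because the lower bound never exceeds the optimal makespan, any ratio printed here that stays near the analytical bound 1.2898680... is consistent with, but does not prove, the average-case result.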

Conclusions
We have studied the problem of scheduling independent nonmalleable parallel tasks in parallel systems with identical processors. We proposed a simple approximation algorithm called H_m, and performed combinatorial analysis of its worst-case performance and probabilistic analysis of its average-case performance. In particular, we proved the following results. (1) The asymptotic worst-case performance ratio R_{H_m} is in the range [1 2/3 .. 1 13/18]. (2) If the numbers of processors requested by the tasks are uniformly distributed i.i.d. random variables and task execution times are i.i.d. random variables with finite mean and variance, then the average-case performance ratio is at most 1.2898680.... In other words, less than 22.5% of the allocated computing power is wasted. (3) Both the worst- and average-case performance ratios improve significantly when tasks request smaller numbers of processors. (4) Results similar to (2)-(3) also hold for the truncated exponential distribution of task sizes.
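As a quick check of the 22.5% figure (our own back-of-the-envelope arithmetic, not a computation from the paper): with an average-case performance ratio of at most 1.2898680, the fraction of allocated computing power that is wasted is at most 1 − 1/1.2898680.

```python
ratio = 1.2898680              # average-case performance ratio bound
waste = 1.0 - 1.0 / ratio      # fraction of computing power wasted
print(round(100 * waste, 2))   # about 22.47, i.e., less than 22.5%
```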