For a parallel computer system with m identical computers, we study optimal performance precaution against a possible crash of one computer. We want to calculate the cost of crash precaution in the case where no crash occurs. We therefore define a tolerance level r: we only tolerate allocations for which the completion time of a parallel program after a crash is at most a factor r + 1 larger than the completion time obtained with an optimal allocation on m - 1 computers. This is an r-dependent restriction of the set of allocations of a program. What, then, is the worst-case ratio of the optimal r-dependent completion time in the case of no crash to the unrestricted optimal completion time of the same parallel program? We denote this maximal ratio of completion times, i.e., the ratio for worst-case programs, by f(r, m). In the paper we establish upper and lower bounds on the worst-case cost function f(r, m) and characterize worst-case programs.
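The abstract does not spell out the program model, so the following Python sketch uses a deliberately simplified, hypothetical one: a program is a list of independent jobs with integer work, its completion time is the largest computer load, and when computer k crashes its jobs are moved optimally onto the m - 1 survivors while all other jobs stay where they are. Under these assumptions, brute force over all allocations computes, for one fixed program, the ratio whose maximum over all programs is f(r, m). The helper names (`opt_makespan`, `after_crash`, `cost_ratio`) are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import product


def loads_of(assign, times, m):
    """Total work per computer for an assignment (job index -> computer)."""
    loads = [0] * m
    for job, comp in enumerate(assign):
        loads[comp] += times[job]
    return loads


def opt_makespan(times, m):
    """Brute-force optimal completion time (largest load) on m computers."""
    return min(max(loads_of(a, times, m))
               for a in product(range(m), repeat=len(times)))


def after_crash(assign, times, m, k):
    """Toy-model completion time when computer k crashes: its jobs are
    moved optimally onto the m - 1 survivors, which keep their own jobs."""
    loads = loads_of(assign, times, m)
    survivors = [c for c in range(m) if c != k]
    moved = [times[j] for j, c in enumerate(assign) if c == k]

    def completion(targets):
        new = {c: loads[c] for c in survivors}
        for t, c in zip(moved, targets):
            new[c] += t
        return max(new.values())

    return min(completion(tg) for tg in product(survivors, repeat=len(moved)))


def cost_ratio(times, m, r):
    """Best r-tolerant no-crash completion time divided by the unrestricted
    optimum on m computers (hypothetical toy model, m >= 2, r >= 0)."""
    limit = (r + 1) * opt_makespan(times, m - 1)
    tolerant = [max(loads_of(a, times, m))
                for a in product(range(m), repeat=len(times))
                if all(after_crash(a, times, m, k) <= limit for k in range(m))]
    # For r >= 0, an optimal (m - 1)-allocation that leaves one computer idle
    # is always tolerant, so `tolerant` is never empty.
    return Fraction(min(tolerant), opt_makespan(times, m))


times = [3, 3, 2, 2, 2]
print(cost_ratio(times, 3, 0))               # 6/5
print(cost_ratio(times, 3, Fraction(1, 6)))  # 1
```

For this hypothetical program on m = 3 computers, the unrestricted optimum is 5 and the optimum on 2 computers is 6. With r = 0, no allocation with completion time 5 keeps every single-crash completion time within 6, so the ratio is 6/5; with r = 1/6 the limit becomes 7 and the ratio drops to 1. The exhaustive search is exponential in the number of jobs, so it only makes the definitions concrete on tiny instances.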

Source: oai:HAL:hal-00990570v1

Volume: Vol. 14 no. 1

Section: Distributed Computing and Networking

Published on: March 23, 2012

Submitted on: April 18, 2011

Keywords: parallel computer, scheduling, computer crash, load balancing, process allocation, optimization, [INFO.INFO-DM] Computer Science [cs]/Discrete Mathematics [cs.DM]
