
Speed Scaling for Maximum Lateness

Evripidis Bampis · Dimitrios Letsios · Ioannis Milis · Georgios Zois


Abstract We consider the power-aware problem of scheduling non-preemptively a set of jobs on a single speed-scalable processor so as to minimize the maximum lateness under a given energy budget. In the offline setting, our main contribution is a combinatorial polynomial-time algorithm for the case in which the jobs have a common release date. In the presence of arbitrary release dates, we show that the problem becomes strongly NP-hard. Moreover, we show that there is no O(1)-competitive deterministic algorithm for the online setting in which the jobs arrive over time. Then, we turn our attention to an aggregated variant of the problem, where the objective is to find a schedule minimizing a linear combination of maximum lateness and energy. As we show, our results for the budget variant can be adapted to derive a similar polynomial-time algorithm and an NP-hardness proof for the aggregated variant in the offline setting, with common and arbitrary release dates respectively. More interestingly, for the online case, we propose a 2-competitive algorithm.

Keywords Energy efficiency · Speed scaling · Scheduling · Maximum lateness

A preliminary version of this paper has been published in the Proceedings of the 18th Annual International Computing and Combinatorics Conference (COCOON 2012). E. Bampis and D. Letsios were partially supported by the French Agency for Research under the DEFIS program TODO, ANR-09-EMER-010, and by GDR-RO of CNRS. I. Milis was partially supported by the project THALES-ALGONOW co-financed by the European Union (European Social Fund - ESF) and Greek national funds, through the Operational Program "Education and Lifelong Learning". G. Zois was supported by the Heracleitus II program co-financed by the European Social Fund (ESF) and Greek national funds, through the Operational Program "Education and Lifelong Learning".

E. Bampis · D. Letsios: Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris, France
I. Milis: Dept. of Informatics, Athens University of Economics and Business, Greece
G. Zois: Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris, France and Dept. of Informatics, Athens University of Economics and Business, Greece

1 Introduction

In classical scheduling an important measure of the Quality of Service (QoS) of a schedule is the maximum lateness [8]. Every job, among other characteristics, is associated with a due date, and the lateness of a job, with respect to a particular schedule, is defined as the job's completion time minus its due date, while the maximum lateness is the maximum over all jobs. In this paper, we propose to optimize this QoS objective in the context of power management, where the operating system may change the speed of the processor(s) in order to save energy. In general, high processor speeds imply high performance with respect to the QoS criterion (here the maximum lateness) at the price of high energy consumption.

Formally, an instance of our problem consists of a set of n jobs $J = \{1, 2, \ldots, n\}$, where every job i is associated with a release date $r_i$, a work $w_i$ and a delivery time $q_i$, that have to be executed non-preemptively on a single speed-scalable processor. Note that in this setting, where jobs are associated with delivery times instead of deadlines, different jobs may be delivered simultaneously. For a given schedule the lateness of job i is defined as $L_i = C_i + q_i$, where $C_i$ is the completion time of job i, and the maximum lateness is defined as $L_{\max} = \max_{1 \le i \le n} L_i$. Jobs that attain the maximum lateness in a schedule are referred to as critical jobs.

At a given time t, if a processor runs at speed s, then its power consumption is $P(s) = s^{\alpha}$, where α > 2 is a constant. By integrating the power over time we can compute the processor's energy consumption. That is, if a processor operates at a constant speed s, it executes an amount of work w in w/s time units and consumes an amount of energy $E = w s^{\alpha-1}$.

As maximum lateness minimization and energy savings are conflicting objectives, we consider two variants. In the so-called budget variant, we aim at minimizing $L_{\max} = \max_{i \in J} L_i$ for a given budget of energy. Using the classical three-field notation [10], we denote such a problem by $S1 \mid r_i \mid L_{\max}(E)$, where S1 denotes a single speed-scalable processor.
In the second approach, which we call the aggregated variant, our objective is to minimize a linear combination of maximum lateness and energy, that is $S1 \mid r_i \mid L_{\max} + \beta E$, where β ≥ 0 is a given parameter that specifies the relative importance of energy versus maximum lateness (see [4] for a motivation of the aggregated approach). In this context, a schedule σ has to specify for every job the time interval during which it is executed as well as its speed over time. It is well known, e.g. [16], that there is an optimal schedule where each job i is executed at a constant speed; this is a consequence of the convexity of the speed-to-power function.
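The definitions above can be made concrete with a small Python sketch that evaluates a schedule in which jobs run in a given order, each at a constant speed and as early as possible. The function name `evaluate`, its argument layout and the sample instance are illustrative assumptions, not part of the paper.

```python
def evaluate(works, deliveries, speeds, alpha, releases=None):
    """Return (L_max, energy) for jobs run in list order at constant speeds.

    Hypothetical helper: each job starts as soon as the processor is free
    and the job is released (release dates default to 0).
    """
    t = 0.0                      # time at which the processor becomes free
    l_max = float("-inf")
    energy = 0.0
    for k, (w, q, s) in enumerate(zip(works, deliveries, speeds)):
        r = releases[k] if releases else 0.0
        t = max(t, r) + w / s            # completion time C_i
        l_max = max(l_max, t + q)        # lateness L_i = C_i + q_i
        energy += w * s ** (alpha - 1)   # E_i = w_i * s_i^(alpha - 1)
    return l_max, energy


# Three jobs with works 10, 2, 2, delivery times 5, 4, 2 and alpha = 3.
print(evaluate([10, 2, 2], [5, 4, 2], [2, 2, 1], 3))
```

Running the jobs faster lowers the first returned value and raises the second, which is the trade-off that the budget and aggregated variants resolve in two different ways.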

Related work and our results. Yao, Demers and Shenker, in their seminal paper [16], proposed an optimal polynomial-time algorithm for finding a feasible preemptive schedule on a single processor for a set of jobs with release dates and deadlines minimizing the energy used. They also proposed two online algorithms for the same problem (OA and AVR). Bunde [6] studied the budget variant of the non-preemptive makespan minimization problem for the single-processor case and the multiple-processor case with jobs of unit work. He also proved the NP-hardness of the multiprocessor case whenever the jobs have arbitrary works. Pruhs et al. [14] studied the budget variant of the non-preemptive multiprocessor makespan minimization problem in the presence of precedence constraints, and proposed an approximation algorithm. They also gave a PTAS for the case with no precedence constraints. Albers et al. [3] were the first to consider an aggregated variant for a power-aware scheduling problem by studying online and offline versions of the non-preemptive problem of minimizing the sum of flow times of the jobs plus energy, with jobs of unit work. The flow time of a job is defined as the difference between its completion time and its release date. Pruhs et al. [13] have studied the budget variant of this problem. Bansal et al. [4] proved that there is no O(1)-competitive algorithm for the budget variant, even if all jobs have unit works. The interested reader may find recent reviews on power-aware scheduling in [1, 2].

In this paper we consider the maximum lateness criterion in the power-aware context. For the budget variant we propose an optimal algorithm for the non-preemptive single-processor case with common release dates, while in Section 3 we prove that the problem, in the presence of release dates, becomes strongly NP-hard and does not admit any O(1)-competitive deterministic algorithm. In Section 4, we move to the aggregated variant, and we give an optimal algorithm for the single-processor problem with common release dates and a strong NP-hardness proof for arbitrary release dates. Moreover, we propose a 2-competitive algorithm for the latter case.

2 Budget variant with common release dates

In this section we present a polynomial-time algorithm for the $S1 \mid \mid L_{\max}(E)$ problem. Our algorithm is based on a number of structural properties of an optimal schedule, deduced by formulating our problem as a convex program and applying the KKT (Karush, Kuhn, Tucker) conditions.

2.1 General form of KKT conditions

Next, we describe the general form of the KKT conditions for convex programs (see e.g., [5]). Assume that we are given the following convex program:

\[
\begin{aligned}
\min\ & f(x) \\
& g_i(x) \le 0, && 1 \le i \le q \\
& h_j(x) = 0, && 1 \le j \le r \\
& x \in \mathbb{R}^n
\end{aligned}
\]

Suppose that the program is strictly feasible, i.e. there is a point x such that $g_i(x) < 0$ and $h_j(x) = 0$ for all 1 ≤ i ≤ q and 1 ≤ j ≤ r, where all functions $g_i$ and $h_j$ are differentiable at x. Let $\lambda_i$ and $\mu_j$ be the dual variables associated to the constraints $g_i(x) \le 0$ and $h_j(x) = 0$, respectively. The Karush-Kuhn-Tucker (KKT) conditions are:

\[
\begin{aligned}
g_i(x) &\le 0, && 1 \le i \le q && (1) \\
h_j(x) &= 0, && 1 \le j \le r && (2) \\
\lambda_i &\ge 0, && 1 \le i \le q && (3) \\
\lambda_i g_i(x) &= 0, && 1 \le i \le q && (4) \\
\nabla f(x) + \sum_{i=1}^{q} \lambda_i \nabla g_i(x) + \sum_{j=1}^{r} \mu_j \nabla h_j(x) &= 0 && && (5)
\end{aligned}
\]

The KKT conditions are necessary and sufficient for solutions $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^q$ and $\mu \in \mathbb{R}^r$ to be primal and dual optimal, where $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_q)$ and $\mu = (\mu_1, \mu_2, \ldots, \mu_r)$. We refer to conditions (1) and (2) as primal feasibility, to (3) as dual feasibility, to (4) as complementary slackness and to (5) as stationarity.

2.2 A convex programming formulation

A convex programming formulation of our problem stems from two basic properties of an optimal schedule. First, because of the convexity of the speed-to-power function, each job i runs at a constant speed $s_i$. Second, jobs are scheduled according to the EDD (Earliest Due Date First) rule, or equivalently in non-increasing order of their delivery times; this can be easily shown by a standard exchange argument. Hence, we propose the following formulation where all jobs are considered to be released at time zero and numbered according to the EDD order:

\[
\begin{aligned}
\min\ & L \\
& C_i + q_i \le L, && 1 \le i \le n && (6) \\
& \frac{w_1}{s_1} \le C_1 && && (7) \\
& C_{i-1} + \frac{w_i}{s_i} \le C_i, && 2 \le i \le n && (8) \\
& \sum_{i=1}^{n} w_i s_i^{\alpha-1} \le E && && (9) \\
& L,\ C_i,\ s_i \ge 0, && 1 \le i \le n && (10)
\end{aligned}
\]

Our objective is to minimize the maximum lateness, L, among all feasible schedules. Constraints (6) ensure that the lateness of each job is at most L, constraints (7) and (8) force the jobs to be scheduled according to the EDD rule in non-overlapping time intervals, constraint (9) keeps the energy consumption within the given budget E, and constraints (10) ensure that the maximum lateness, the completion times and the speeds of jobs are non-negative. Constraint (9), for α > 2, and constraints (7) and (8) are convex, while constraints (6) and (10) and the objective function are linear. Thus, our mathematical program is indeed convex. This convex program already implies a polynomial algorithm for our problem, as convex programs can be solved to arbitrary precision by the Ellipsoid algorithm [12]. Since the Ellipsoid algorithm is rather impractical, we will exploit this convex program to derive a fast combinatorial algorithm.

2.3 Properties of an optimal schedule

In what follows we deduce a number of structural properties of an optimal schedule by applying the KKT conditions to the above convex program.

Lemma 1 For the maximum lateness problem with an energy budget E, the following properties are necessary and sufficient for optimality of a feasible schedule.
(i) Each job i runs at a constant speed $s_i$.
(ii) Jobs are scheduled according to the EDD rule.
(iii) There are no idle periods in the schedule.
(iv) The last job is critical, i.e., $L_n = L_{\max}$.
(v) Every non-critical job i has equal speed with the job i + 1, i.e., $s_i = s_{i+1}$.
(vi) Jobs are executed in non-increasing speeds, i.e., $s_i \ge s_{i+1}$.
(vii) All the energy budget is consumed.

Proof In order to apply the KKT conditions to the convex program, we associate to each set of constraints from (6) up to (9), dual variables βi , γ1 , γi , δ, respectively. W.l.o.g. the variables L, Ci and si are positive and, by the complementary slackness conditions, the dual variables associated to the constraints (10) are equal to zero.

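Stationarity conditions give that

\[
\nabla L + \sum_{i=1}^{n} \beta_i \nabla (C_i + q_i - L) + \gamma_1 \nabla\Big(\frac{w_1}{s_1} - C_1\Big) + \sum_{i=2}^{n} \gamma_i \nabla\Big(C_{i-1} + \frac{w_i}{s_i} - C_i\Big) + \delta \nabla\Big(\sum_{i=1}^{n} w_i s_i^{\alpha-1} - E\Big) = 0
\]
\[
\Rightarrow\ \Big(1 - \sum_{i=1}^{n} \beta_i\Big) \nabla L + \sum_{i=1}^{n-1} (\beta_i - \gamma_i + \gamma_{i+1}) \nabla C_i + (\beta_n - \gamma_n) \nabla C_n + \sum_{i=1}^{n} \big(-\gamma_i w_i s_i^{-2} + (\alpha-1)\delta w_i s_i^{\alpha-2}\big) \nabla s_i = 0
\]

Equivalently, we obtain the following equalities.

\[
\begin{aligned}
\sum_{i=1}^{n} \beta_i &= 1 && && (11) \\
\beta_i &= \gamma_i - \gamma_{i+1}, && 1 \le i \le n-1 && (12) \\
\beta_n &= \gamma_n && && (13) \\
(\alpha-1)\delta &= \frac{\gamma_i}{s_i^{\alpha}}, && 1 \le i \le n && (14)
\end{aligned}
\]

The complementary slackness conditions give that

\[
\begin{aligned}
\beta_i (C_i + q_i - L) &= 0, && 1 \le i \le n && (15) \\
\gamma_1 \Big(\frac{w_1}{s_1} - C_1\Big) &= 0 && && (16) \\
\gamma_i \Big(C_{i-1} + \frac{w_i}{s_i} - C_i\Big) &= 0, && 2 \le i \le n && (17) \\
\delta \Big(\sum_{i=1}^{n} w_i s_i^{\alpha-1} - E\Big) &= 0 && && (18)
\end{aligned}
\]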

First, we will show that the properties are necessary for optimality. That is, there is always an optimal schedule satisfying them.

(i)-(ii) They have been already discussed above.

(iii) First, note that δ ≠ 0. If δ = 0 then, by (14), we get that $\gamma_i = 0$ for each 1 ≤ i ≤ n. This, combined with (12) and (13), yields that $\sum_{i=1}^{n} \beta_i = 0$, which is a contradiction because of (11). Since δ ≠ 0, we get by (14) that $\gamma_i \ne 0$ for each 1 ≤ i ≤ n. Then, equations (16) and (17) give that there is no idle time in any optimal schedule, since $C_1 = w_1/s_1$ and $C_i = C_{i-1} + w_i/s_i$, for 2 ≤ i ≤ n, respectively.

(iv) Since δ ≠ 0, by (14), it follows that $\gamma_n \ne 0$ and finally, because of (13), $\beta_n \ne 0$. So, the last job to finish is always a critical job, by (15).

(v) Note that for every non-critical job i, it holds that $C_i + q_i < L$ and (15) implies that $\beta_i = 0$ for every such job. Hence, if a job i is non-critical, $\beta_i = 0 \Rightarrow \gamma_i = \gamma_{i+1} \Rightarrow s_i = s_{i+1}$, by (12) and (14), respectively.

(vi) By the dual feasibility conditions and the equations (12) and (14) we get, respectively, that $\beta_i \ge 0 \Rightarrow \gamma_i \ge \gamma_{i+1} \Rightarrow s_i \ge s_{i+1}$. Thus, the jobs are executed with non-increasing speeds.

(vii) If the energy budget is not entirely consumed, then by (18), δ = 0, which is a contradiction, since, as we have already proved, δ ≠ 0.

Next, we will show that the properties are also sufficient for optimality. That is, any feasible schedule satisfying them must be optimal. In order to show this, it suffices to prove that, given any feasible schedule satisfying the properties, we can always give values to the dual variables such that the KKT conditions are satisfied.

Consider a feasible schedule and let $s_i$ and $C_i$ be the speed and the completion time of the job i, 1 ≤ i ≤ n, respectively. Moreover, let L be the maximum lateness of the schedule. We give values to the dual variables as follows.

\[
\begin{aligned}
\delta &= \frac{1}{(\alpha-1)\, s_1^{\alpha}} \\
\gamma_i &= \frac{s_i^{\alpha}}{s_1^{\alpha}}, && 1 \le i \le n \\
\beta_i &= \frac{s_i^{\alpha} - s_{i+1}^{\alpha}}{s_1^{\alpha}}, && 1 \le i \le n-1 \\
\beta_n &= \frac{s_n^{\alpha}}{s_1^{\alpha}}
\end{aligned}
\]

We now observe that these values of the dual variables together with the values of the primal variables satisfy the KKT conditions. Note that

\[
\begin{aligned}
\sum_{i=1}^{n} \beta_i &= \sum_{i=1}^{n-1} \frac{s_i^{\alpha} - s_{i+1}^{\alpha}}{s_1^{\alpha}} + \frac{s_n^{\alpha}}{s_1^{\alpha}} = \frac{s_1^{\alpha}}{s_1^{\alpha}} = 1 \\
\beta_i &= \frac{s_i^{\alpha} - s_{i+1}^{\alpha}}{s_1^{\alpha}} = \frac{s_i^{\alpha}}{s_1^{\alpha}} - \frac{s_{i+1}^{\alpha}}{s_1^{\alpha}} = \gamma_i - \gamma_{i+1}, && 1 \le i \le n-1 \\
\beta_n &= \frac{s_n^{\alpha}}{s_1^{\alpha}} = \gamma_n \\
(\alpha-1)\delta &= \frac{1}{s_1^{\alpha}} = \frac{s_i^{\alpha}}{s_1^{\alpha}} \cdot \frac{1}{s_i^{\alpha}} = \frac{\gamma_i}{s_i^{\alpha}}, && 1 \le i \le n
\end{aligned}
\]

So the stationarity conditions are satisfied. Consider now a job i, 1 ≤ i ≤ n. If i is critical, then $C_i + q_i = L$. Else, by property (v) we have that, for 1 ≤ i ≤ n − 1,

\[
s_i = s_{i+1} \Leftrightarrow \frac{s_i^{\alpha}}{s_1^{\alpha}} = \frac{s_{i+1}^{\alpha}}{s_1^{\alpha}} \Leftrightarrow \beta_i = 0
\]

Thus, equation (15) is satisfied. By property (iii), we have that $C_1 = w_1/s_1$ and $C_i = C_{i-1} + w_i/s_i$, for 2 ≤ i ≤ n. Therefore, equations (16) and (17) are also satisfied. Furthermore, by property (vii), all the energy budget is consumed and equation (18) holds. Hence, the complementary slackness conditions are satisfied. Finally, in order to complete our proof, it remains to show that the values of all the dual variables are non-negative. The only case for which this is not straightforward is that of the variables $\beta_i$, for 1 ≤ i ≤ n − 1. But it must be the case that $\beta_i \ge 0$ for all 1 ≤ i ≤ n − 1, because of property (vi), and the lemma follows.

We refer to any schedule satisfying the properties of Lemma 1 as a regular schedule. By Lemma 1, every optimal schedule is regular and vice versa; however, there might be feasible, but not optimal, non-regular schedules. By (i, j) we denote a sequence of consecutive jobs i, i + 1, . . . , j. Any regular schedule can be partitioned into groups of jobs, of the form (i, j), where the jobs i − 1 and j are critical and the jobs i, i + 1, . . . , j − 1 are not. By Lemma 1(v), all jobs of such a group are executed at the same speed. We denote this common speed by $s_j$ and the total amount of work of jobs in (i, j) by $w(i, j) = \sum_{k=i}^{j} w_k$. Then, the next proposition follows easily from Lemma 1.

Proposition 1 Let i, j be two consecutive critical jobs of a regular schedule. The speed of each job in the group (i + 1, j) equals $s_j = \frac{w(i+1, j)}{q_i - q_j}$.

Proof Assume without loss of generality that i completes before j. Since i and j are both critical, they attain equal maximum lateness, i.e. $L_i = L_j$. Moreover, in any regular schedule, by Lemma 1(iii), there is no idle period between jobs i, i + 1, . . . , j. Furthermore, all jobs i + 1, i + 2, . . . , j − 1 are non-critical and, by Lemma 1(v), they are all executed with speed equal to that of job j. Hence, we get, respectively, that

\[
L_i = L_j \ \Rightarrow\ C_i + q_i = C_j + q_j \ \Rightarrow\ \sum_{k=1}^{i} \frac{w_k}{s_k} + q_i = \sum_{k=1}^{j} \frac{w_k}{s_k} + q_j \ \Rightarrow\ s_j = \frac{w(i+1, j)}{q_i - q_j}.
\]

Clearly, given that $L_i = L_j$ and $C_i < C_j$, it must be the case that $q_i \ne q_j$.

2.4 An optimal combinatorial algorithm

So far, by proving that the properties of a regular schedule are necessary and sufficient for optimality, we have derived a clear image of the structure of an optimal schedule for the $S1 \mid \mid L_{\max}(E)$ problem. Next, we propose Algorithm BUD which constructs such a schedule in polynomial time. Note that a regular schedule is fully specified by the speeds of the jobs.

The rough idea of our algorithm is the following: First, it constructs a preliminary schedule by finding groups of jobs running in non-increasing speeds without taking care of the energy consumption. Second, the algorithm manages the energy consumption w.r.t. the energy budget E and determines the final speeds of all jobs. Let E′ be the energy consumption of the current schedule at any point of the execution of the algorithm.

Algorithm BUD needs the jobs to be ordered/numbered according to the EDD rule and an initial sorting step is required. Once this step is performed, it starts from job n, which is always a critical job, and considers all jobs but the first, in reverse order. When a job i, 2 ≤ i ≤ n, is considered for the first time, its speed $s_i$ is set according to Proposition 1, assuming that jobs i − 1 and i are critical. If $s_i \ge s_j$, for i + 1 ≤ j ≤ n, then $s_i$ is called an eligible speed and it is assigned to job i; by definition, the speed $s_n = \frac{w_n}{q_{n-1} - q_n}$ is considered to be eligible. If speed $s_i$ is not eligible, i is a non-critical job and it is merged with the group of job i + 1. More specifically, if c is the last job of this group, then the speeds of jobs i, i + 1, . . . , c are calculated by applying Proposition 1, assuming that i − 1 and c are critical while i, i + 1, . . . , c − 1 are not. Next, the algorithm examines whether the new value of $s_i$ is eligible. If this is the case, then it considers the job i − 1. Otherwise, a further merging of the group of job i with the group of job c + 1 is performed, as before. That is, if c′ is the last job of the group of job c + 1, all jobs i, i + 1, . . . , c′ are assigned the same speed assuming that jobs i − 1 and c′ are critical, while i, i + 1, . . . , c′ − 1 are not. This speed, according to Proposition 1, is equal to $s_{c'} = \frac{w(i, c')}{q_{i-1} - q_{c'}}$. Note that the job c is no longer critical in this case. This merging procedure is repeated until job i is assigned an eligible speed. In a degenerate case, jobs i, i + 1, . . . , n are merged into one group. When the algorithm has assigned an eligible speed to all jobs 2, 3, . . . , n, it sets $s_1 = s_2$ and its first part completes. Note that $s_1$ becomes also eligible. An example of the first part of our algorithm is given in Figure 1(i).

[Figure 1: three speed-versus-time diagrams of jobs j1, j2, j3. Panel (i), "Set s1 = s2": Lmax = 10, E′ = 50 > E. Panel (ii), "Reduce the speed of the first group to s3": Lmax = 16, E′ = 14 < E. Panel (iii), "Assign energy E − E′ to the first group": Lmax = 13.79, E′ = 20.]

Fig. 1 The execution of Algorithm BUD for an instance of 3 jobs, without release dates, works 10, 2, 2, delivery times 5, 4, 2, α = 3 and E = 20.
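The values reported in the three panels of Fig. 1 can be reproduced by hand from Proposition 1 and the energy formula. The following worked computation is a reader's check under the instance data of the caption, writing E′ for the energy consumption of the current schedule; the rounding is ours.

```latex
\begin{align*}
\text{(i)}\quad & s_3 = \frac{w_3}{q_2 - q_3} = \frac{2}{4-2} = 1, \qquad
  s_2 = \frac{w_2}{q_1 - q_2} = \frac{2}{5-4} = 2, \qquad s_1 = s_2 = 2, \\
& (C_1, C_2, C_3) = (5, 6, 8), \qquad L_1 = L_2 = L_3 = 10, \qquad
  E' = 10 \cdot 2^2 + 2 \cdot 2^2 + 2 \cdot 1^2 = 50 > 20. \\
\text{(ii)}\quad & s_1 = s_2 = s_3 = 1: \quad (C_1, C_2, C_3) = (10, 12, 14), \qquad
  L_{\max} = 16, \qquad E' = 14 < 20. \\
\text{(iii)}\quad & 12\, s^2 + 2 \cdot 1^2 = 20 \ \Rightarrow\ s_1 = s_2 = \sqrt{1.5} \approx 1.2247, \\
& (C_1, C_2, C_3) \approx (8.165,\ 9.798,\ 11.798), \qquad
  L_{\max} = C_3 + q_3 \approx 13.798, \qquad E' = 20.
\end{align*}
```

In panel (iii) jobs 2 and 3 are both critical while job 1 is not, so jobs 1 and 2 form one group running at a common speed, as Lemma 1(v) requires.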

Next, Algorithm BUD takes into account the available budget of energy E. If E − E′ ≥ 0, the current schedule's energy consumption does not exceed the budget of energy, and the surplus E − E′ is assigned to the first job. Otherwise, the current schedule is regular, except that it consumes an amount of energy greater than E. Then, the algorithm reduces the consumed energy until it becomes equal to E. In fact, it decreases the speed of the first group, by merging subsequent groups with it, if necessary. This merging procedure is different from the one of the first part of the algorithm and it is as follows: let i be the critical job of maximal index with $s_i = s_1$ in the current schedule. Observe that $s_i > s_{i+1}$. The algorithm sets the speed of jobs 1, 2, . . . , i equal to $s_{i+1}$. This causes a reduction of E′ and there are two cases to distinguish: either E′ ≤ E or E′ > E. In the first case, the algorithm adds an amount of energy E − E′ to jobs 1, 2, . . . , i by increasing their speeds uniformly, i.e. so that they are all executed with the same speed. In the second case, at least one further merging step has to be performed. When the algorithm terminates, it is obvious that E′ = E. For an example of the second part of our algorithm see Figures 1(ii) and 1(iii).

Algorithm BUD
 1: Sort the jobs according to the EDD order.
 2: for i = n to 2 do
 3:     Set si assuming that i and i − 1 are critical.
 4:     while si is not eligible do
 5:         Merge the i's group with the next group.
 6: Set s1 = s2
 7: Let E′ be the current energy consumption.
 8: if E > E′ then
 9:     Assign energy E − E′ to job 1.
10: else
11:     while E < E′ do
12:         Set the speed of the first group equal to the speed of the following group.
13:         Update E′.
14:         if E < E′ then
15:             Merge the first group with the next one.
16:     Assign E − E′ energy uniformly to the first group.
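The pseudocode can be turned into a short Python sketch. This is one possible reading of Algorithm BUD, not the authors' implementation: the function name `bud`, the stack-based bookkeeping of groups, and the handling of the case where every job ends up in the first group (treated as if it were followed by a group of speed 0) are assumptions. It also assumes positive works, a positive budget and pairwise distinct delivery times, so that no denominator of Proposition 1 vanishes.

```python
def bud(works, deliveries, alpha, budget):
    """Sketch of Algorithm BUD; returns (speeds, L_max) with jobs in EDD order."""
    order = sorted(range(len(works)), key=lambda k: -deliveries[k])  # EDD order
    w = [works[k] for k in order]
    q = [deliveries[k] for k in order]
    n, p = len(w), alpha - 1

    # Part I: scan jobs n, ..., 2 (0-based indices n-1, ..., 1). The stack holds
    # [last_job, speed]; its top is the group that starts right after job i.
    stack = []
    for i in range(n - 1, 0, -1):
        last, speed = i, w[i] / (q[i - 1] - q[i])
        while stack and speed < stack[-1][1]:        # not eligible: merge
            last = stack.pop()[0]
            speed = sum(w[i:last + 1]) / (q[i - 1] - q[last])
        stack.append([last, speed])
    groups = stack[::-1]                             # groups in schedule order

    s, first = [0.0] * n, 1
    for last, speed in groups:
        for k in range(first, last + 1):
            s[k] = speed
        first = last + 1
    if n > 1:
        s[0] = s[1]                                  # job 1 joins the group of job 2

    def energy(lo, hi):
        return sum(w[k] * s[k] ** p for k in range(lo, hi))

    used = energy(0, n)
    if used <= budget:
        # The surplus goes to job 1 only.
        s[0] = ((w[0] * s[0] ** p + budget - used) / w[0]) ** (1 / p)
    else:
        # Part II: slow the first group down, absorbing the groups that follow.
        g = 0
        while True:
            end = groups[g][0]                       # last job of the first group
            target = groups[g + 1][1] if g + 1 < len(groups) else 0.0
            rest = energy(end + 1, n)
            if sum(w[:end + 1]) * target ** p + rest <= budget:
                break
            g += 1
        v = ((budget - rest) / sum(w[:end + 1])) ** (1 / p)
        for k in range(end + 1):
            s[k] = v                                 # spend the remaining energy uniformly

    t, l_max = 0.0, float("-inf")
    for k in range(n):
        t += w[k] / s[k]
        l_max = max(l_max, t + q[k])
    return s, l_max


speeds, l_max = bud([10, 2, 2], [5, 4, 2], 3, 20)
print(speeds, round(l_max, 3))
```

Because the speeds of the groups after job i are non-increasing at every step of Part I, comparing the candidate speed with the group on top of the stack is enough to test eligibility. The sketch recomputes sums on each merge for readability, which stays within the quadratic running time discussed in the text.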

Theorem 1 Algorithm BUD is optimal for the $S1 \mid \mid L_{\max}(E)$ problem.

Proof We shall prove that the schedule produced by the algorithm satisfies the properties of Lemma 1, i.e., it is a regular schedule. For convenience, we distinguish two parts in the algorithm: Part I, corresponding to lines 1-6, and Part II, corresponding to lines 7-16, respectively.

Properties (i)-(ii): The algorithm gives a single constant speed to each job and keeps their initial EDD order.

Property (iii): In Part I, the speeds of jobs are assigned according to Proposition 1. Specifically, the algorithm fixes two consecutive critical jobs i and j, i < j, with, potentially, some non-critical jobs between them. Then the speed of the non-critical jobs and the one of the critical job j is defined such that there is no idle period between the jobs. In Part II, no idle period is added between any jobs.

Properties (iv)-(v): When the speed of job n is initialized, this is done by assuming that it is critical. Next, consider the current schedule just after the completion of Part I. This schedule can be partitioned into sequences of jobs, a + 1, a + 2, . . . , b, with a ≥ 1, such that the jobs of each sequence are executed with the same speed, which has been assigned by applying Proposition 1, assuming that the jobs a and b are critical. In fact, jobs a and b attain equal lateness. In order for such a sequence to be a group, we should also prove that all but the last jobs are non-critical while the last job is critical. Let a + 1, a + 2, . . . , b be a sequence of jobs. We claim that $L_i < L_b$, for a + 1 ≤ i ≤ b − 1. Assume, by contradiction, that there exists a job j, where a + 1 ≤ j ≤ b − 1, such that $L_j \ge L_b$, or equivalently, $q_j - q_b \ge \sum_{i=j+1}^{b} \frac{w_i}{s_b}$. Since the last job of a sequence attains equal lateness with the last job of the sequence that follows, we have that $L_a = L_b$. This yields that $q_a - q_b = \sum_{i=a+1}^{b} \frac{w_i}{s_b}$. Therefore, $q_a - q_j \le \sum_{i=a+1}^{j} \frac{w_i}{s_b}$.

Obviously, for any job i, a + 1 ≤ i ≤ b − 1, we must have a speed $s_i > \frac{w_i}{q_{i-1} - q_i}$, since otherwise it would not have been merged with another group. That is, $q_{i-1} - q_i > \frac{w_i}{s_i}$. If we sum the last inequalities for a + 1 ≤ i ≤ j, we get that $q_a - q_j > \sum_{i=a+1}^{j} \frac{w_i}{s_b}$, a contradiction.

At this point, we have shown that when Part I completes, if a job i, 2 ≤ i ≤ n, is critical, then it must be the right extremity of a sequence. Moreover, among all jobs 2, 3, . . . , n, the last jobs of all sequences, including job n, attain equal lateness and the remaining jobs attain smaller lateness. In addition, job 1 attains equal lateness with the last job of the sequence that follows. Recall that, at this point, we set $s_1 = s_2$. Job 1 would have equal lateness with the last job of the sequence that follows for any $s_1 > 0$, since the speed of the second group is set by applying Proposition 1, assuming that 1 is critical. So, at the end of Part I, job 1, job n and every last job of a sequence are critical. Therefore, after Part I finishes, Properties (iv) and (v) hold.

In Part II, if no merging step is performed, then the processing time of job 1 is decreased by some t ≥ 0 and its lateness decreases by t, while the processing times and speeds of the other jobs are not modified. So, the lateness of every other job also decreases by t. Hence, Properties (iv) and (v) hold. If at least one merging step is performed, then the speed of the jobs in the first group decreases and their processing time increases. Then, in the first group, every non-critical job i has equal speed with the job i + 1 that follows, while the speeds of the jobs in other groups remain unchanged. Now, let $t_i$ be the total increase in the processing time of job i, 1 ≤ i ≤ n. Note that this quantity is positive only for jobs belonging to the first group of the current schedule. Then, the lateness of any job i, 1 ≤ i ≤ n, increases by $\sum_{j=1}^{i} t_j$; if $c_1$ is the critical job of the first group, it remains critical after the merging step since its lateness and the lateness of every other job that follows increase by the same quantity, equal to $\sum_{j=1}^{c_1} t_j$. Note that if a further merging step is performed, we consider the first two groups as one group. Moreover, the lateness of any job increases by no more than the increase of the lateness of job n, and thus, in the final schedule, job n remains critical and Property (iv) holds. Furthermore, each non-critical job has equal speed with the job that follows and Property (v) holds as well.

Property (vi): At the end of Part I, the speeds of jobs are non-increasing since otherwise a merging step would be performed. Moreover, during Part II, no speed of a job becomes less than the speed of a subsequent job.

Property (vii): Recall that E′ is the total energy consumed when Part I completes. If E′ is less than the energy budget, then the energy of the first job increases until the schedule consumes exactly E units of energy, while if E′ is greater than the energy budget E, then the energy consumption of the schedule is gradually decreased until it becomes equal to E.

Let us now consider the complexity of the algorithm. Initially, jobs are sorted according to the EDD rule in O(n log n) time. The first part of the algorithm may take $O(n^2)$ time since each merging step takes O(n) time and there can be O(n) merging steps. Also, the algorithm's second part takes $O(n^2)$ time since the speed of each job may change at most O(n) times. Therefore, the overall complexity of the algorithm is $O(n^2)$.

3 Budget variant with arbitrary release dates

We now consider the budget variant of the maximum lateness problem where the jobs have arbitrary release dates, i.e., $S1 \mid r_j \mid L_{\max}(E)$, and we show that it becomes strongly NP-hard. Moreover, we show that there is no O(1)-competitive algorithm for its online version, even when all jobs have unit works.

3.1 NP-hardness

We reduce 3-PARTITION to the $S1 \mid r_j \mid L_{\max}(E)$ problem. 3-PARTITION is a well-known strongly NP-hard problem [7] where we are given a positive integer B and a set of 3n positive integers $A = \{a_1, a_2, \ldots, a_{3n}\}$, where $B/4 < a_i < B/2$ and $\sum_{a_i \in A} a_i = nB$, and we ask if there exists a partition of A into n disjoint sets $A_1, A_2, \ldots, A_n$ such that, for each 1 ≤ k ≤ n, $\sum_{a_i \in A_k} a_i = B$.

Our reduction is inspired by the NP-hardness proof for the classical $1 \mid r_j \mid L_{\max}$ problem [11], where we are given a set of jobs with each job i having a release date $r_i$, a due date $d_i$ and a processing time $p_i$ and we seek a schedule minimizing the maximum lateness; note that the feasibility version of this latter problem is also known as the SEQUENCING WITHIN INTERVALS problem [7]. The $1 \mid r_j \mid L_{\max}$ problem can be viewed as a variant of our problem where the speed of each job is part of the instance. Specifically, we consider that each job i has an amount of work $w_i = p_i$ and it is executed at a constant speed $s_i = 1$. Based on this idea, we extend the existing NP-hardness reduction by fixing an energy budget, so that all jobs have to be executed at the same speed $s_i = 1$ in order to get a feasible schedule.

[Figure 2: timeline from 0 to (2n − 1)B with boundaries at 0, B, 2B, . . . , (2n − 1)B, alternating the job sets A1, A2, . . . , An (in the intervals [2(k − 1)B, (2k − 1)B]) with the gadget jobs 3n + 1, 3n + 2, . . . , 4n − 1.]

Fig. 2 A feasible schedule σ for $S1 \mid r_j \mid L_{\max}(E)$ that attains maximum lateness equal to $L_{\max} = (2n - 1)B$.

Theorem 2 The $S1 \mid r_j \mid L_{\max}(E)$ problem is strongly NP-hard.

Proof We construct an instance of $S1 \mid r_j \mid L_{\max}(E)$ from an instance of 3-PARTITION as follows. The instance is depicted in Table 1.
– For each integer $a_i$, 1 ≤ i ≤ 3n, we create a job i with $w_i = a_i$, $r_i = 0$ and $q_i = 0$.
– We introduce n − 1 gadget jobs, where the gadget job i, 3n + 1 ≤ i ≤ 4n − 1, has $w_i = B$, $r_i = (2i - 6n - 1)B$ and $q_i = (8n - 2i - 1)B$.
– We set E = (2n − 1)B.

i         wi      ri            qi
1         a1      0             0
2         a2      0             0
...       ...     ...           ...
3n        a3n     0             0
3n + 1    B       B             (2n − 3)B
3n + 2    B       3B            (2n − 5)B
3n + 3    B       5B            (2n − 7)B
...       ...     ...           ...
4n − 2    B       (2n − 5)B     3B
4n − 1    B       (2n − 3)B     B

Table 1 An instance of $S1 \mid r_j \mid L_{\max}(E)$ reduced from an instance of 3-PARTITION.
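The construction of Table 1 is mechanical, so it can be sketched as a short Python function. The name `reduction_instance`, the returned tuple layout and the tiny sample input are illustrative assumptions; the formulas for the gadget jobs are the ones given in the list above.

```python
def reduction_instance(a, B):
    """Build (works, releases, deliveries, budget) from a 3-PARTITION instance.

    Hypothetical helper: `a` lists the 3n integers and B is the target sum.
    Jobs are numbered 1..4n-1 as in Table 1 (list position = job number - 1).
    """
    assert len(a) % 3 == 0, "3-PARTITION needs 3n integers"
    n = len(a) // 3
    assert sum(a) == n * B, "the integers must sum to nB"

    works = list(a)                 # jobs 1..3n: w_i = a_i
    releases = [0] * (3 * n)        # r_i = 0
    deliveries = [0] * (3 * n)      # q_i = 0
    for i in range(3 * n + 1, 4 * n):            # gadget jobs 3n+1 .. 4n-1
        works.append(B)
        releases.append((2 * i - 6 * n - 1) * B)
        deliveries.append((8 * n - 2 * i - 1) * B)
    budget = (2 * n - 1) * B        # E = (2n - 1)B, the total work
    return works, releases, deliveries, budget


# Six integers with B = 3 (n = 2) yield a single gadget job, job 7.
print(reduction_instance([1, 1, 1, 1, 1, 1], 3))
```

For every gadget job the release date and the delivery time add up to (2n − 2)B, so with work B at speed 1 it fits a maximum lateness of (2n − 1)B only if it starts exactly at its release date. The budget equals the total work, which is what forces unit speeds in the proof.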

We shall prove that there is a feasible schedule σ with Lmax = (2n − 1)B and total energy consumption E = (2n − 1)B if and only if there exists a 3-PARTITION of A. P (⇐) For the first direction, assume that A1 , A2 . . . , An is a partition of A, where ai ∈Ak ai = B for 1 ≤ k ≤ n. Then, consider the schedule σ where: (i) each job i corresponding to an integer ai ∈ Ak , 1 ≤ k ≤ n, is scheduled during the time interval [2(k − 1)B, (2k − 1)B], (ii) each gadget job i, 3n + 1 ≤ i ≤ 4n − 1 is scheduled during the time interval [(2i − 6n − 1)B, (2i − 6n)B], and (iii) all jobs are executed at constant speed si = 1. The schedule σ (see Figure 2) is feasible and attains maximum lateness equal to Lmax = (2n − 1)B. The total energy consumed is E = P4n−1 P4n−1 α−1 = i=1 wi = (2n − 1)B. i=1 wi si (⇒) For the opposite direction, assume that σ is a feasible schedule with Lmax = (2n − 1)B and total energy consumption E = (2n − 1)B. In σ, each job i, 1 ≤ i ≤ 3n, must have completion

Speed Scaling for Maximum Lateness

11

time Ci ≤ (2n − 1)B and each gadget job i, 3n + 1 ≤ i ≤ 4n − 1, must have completion time Ci ≤ (2i − 6n)B, since Li ≤ (2n − 1)B for every job i. For notational convenience, let W = (2n − 1)B be the sum of works of all jobs. Let also pi be the execution time of job i, 1 ≤ i ≤ 4n − 1. The completion time of (the last job of) schedule σ is also Cmax = (2n − 1)B. To see this, assume for the sake of contradiction that Cmax < (2n − 1)B. Then, by the convexity of the speed-to-power function, the total energy consumption in σ is

$$E(\sigma) = \sum_{i=1}^{4n-1} w_i s_i^{\alpha-1} = \sum_{i=1}^{4n-1} w_i \left(\frac{w_i}{p_i}\right)^{\alpha-1} \ge W \left(\frac{W}{C_{\max}}\right)^{\alpha-1} > (2n-1)B$$

which is not possible because the energy budget is exceeded. With a similar argument, it can be shown that there is no idle time during the interval [0, (2n − 1)B] in σ. Moreover, due to the convexity of the speed-to-power function, among the schedules with makespan Cmax = (2n − 1)B that have no idle period during [0, (2n − 1)B], only the ones in which all jobs are executed at speed sj = 1 have energy consumption not greater than E = (2n − 1)B. Clearly, σ must be one of these schedules. Hence, every gadget job i, 3n + 1 ≤ i ≤ 4n − 1, which is released at time (2i − 6n − 1)B, must complete by time (2i − 6n)B and needs B time units at speed 1, is executed within the whole time interval [(2i − 6n − 1)B, (2i − 6n)B] in σ.

So far we have shown that every gadget job i, 3n + 1 ≤ i ≤ 4n − 1, spans in σ the time interval [(2i − 6n − 1)B, (2i − 6n)B], while the other jobs i, 1 ≤ i ≤ 3n, span the time intervals [2(k − 1)B, (2k − 1)B], 1 ≤ k ≤ n. Therefore, during any interval [2(k − 1)B, (2k − 1)B], 1 ≤ k ≤ n, a set of jobs with total amount of work B is executed. This execution defines a 3-PARTITION for A.

3.2 The on-line case

Let us now turn our attention to the online version of the S1 | rj | Lmax (E) problem. Bansal et al. [4] gave an adversarial strategy for proving that there is no O(1)-competitive algorithm for the problem of minimizing the total flow of a set of unit work jobs on a single speed-scalable processor. This adversarial strategy consists of batches of jobs, B1, B2, ..., Bk, with all the jobs in batch Bi released after the online algorithm has finished all the jobs in Bi−1. Following a similar strategy, it can be proved that the makespan minimization problem for a given budget of energy, i.e. the problem S1 | rj, wj = 1 | Cmax (E), does not admit an O(1)-competitive algorithm. Note that makespan minimization is a special case of our lateness problem (with qi = 0, 1 ≤ i ≤ n).

Theorem 3 There is no O(1)-competitive algorithm for the online version of the S1 | rj | Cmax (E) problem, even when jobs have unit works.
Proof In order to prove the theorem, we assume the existence of a ρ-competitive algorithm A, where ρ > 1 is a constant. Then, we reach a contradiction by showing that there is an instance of the problem that cannot be feasibly solved by A. We consider a set of jobs consisting of batches B1, B2, ..., Bℓ, where the batch Bi, 1 ≤ i ≤ ℓ, contains $n_i = 2^{i-1}$ unit work jobs which all arrive at the same time; the jobs of the batch B1 are released at time r1 = 0, while the jobs of the batch Bi, 2 ≤ i ≤ ℓ, are released at time ri. We assume that ri is large enough so that the algorithm A has completed the jobs in the batches B1, ..., Bi−1 by ri.

We denote by $C^*_{\max,k}$, 1 ≤ k ≤ ℓ, the value of the makespan that the optimal offline algorithm achieves for the instance that consists exactly of the jobs in the batches B1, B2, ..., Bk. The term $C^*_{\max,k}$ is upper bounded by the makespan of the schedule in which all the jobs in B1, B2, ..., Bk are assigned equal speeds such that their energy consumption is equal to the energy budget E and they are executed continuously starting at time rk. Therefore,

$$C^*_{\max,k} \le r_k + \left(\frac{\left(\sum_{i=1}^{k} n_i\right)^{\alpha}}{E}\right)^{\frac{1}{\alpha-1}} \tag{19}$$


As A is a ρ-competitive algorithm, it must complete all jobs of the batches B1, B2, ..., Bk not later than $\rho \cdot C^*_{\max,k}$, independently of the number of batches that our original instance contains. Otherwise, it would not be ρ-competitive for the instance of the problem that consists only of the batches B1, B2, ..., Bk. Let $C_{\max,k}$ be the completion time of the jobs in batches B1, B2, ..., Bk in A's schedule. Then, it must be the case that

$$C_{\max,k} \le \rho \cdot C^*_{\max,k} \tag{20}$$

Let Ek be the energy consumption of the jobs in batch Bk in A's schedule. Due to the convexity of the speed-to-power function, we have that

$$E_k \ge \frac{n_k^{\alpha}}{(C_{\max,k} - r_k)^{\alpha-1}} \tag{21}$$

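By combining inequalities (19), (20), (21) and the fact that $r_k \le C^*_{\max,k}$, we obtain that

$$E_k \ge \frac{n_k^{\alpha}}{\left(\sum_{i=1}^{k} n_i\right)^{\alpha}} \cdot \frac{E}{(2\rho-1)^{\alpha-1}}$$

Since $n_i = 2^{i-1}$ for 1 ≤ i ≤ k, we conclude that

$$E_k \ge \frac{E}{2^{\alpha}(2\rho-1)^{\alpha-1}}$$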
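As a quick numerical check of the last two bounds, the following Python sketch evaluates the share of batch Bk among the first k batches, the resulting lower bound on Ek / E, and the number of batches after which the budget must be exhausted. The function names are illustrative and not part of the paper; the sketch only evaluates the closed-form expressions and does not simulate an online algorithm.

```python
import math


def batch_share(k: int) -> float:
    """n_k / (n_1 + ... + n_k) for batch sizes n_i = 2^(i-1)."""
    return 2 ** (k - 1) / (2 ** k - 1)


def energy_fraction_per_batch(rho: float, alpha: float) -> float:
    """Lower bound on E_k / E, i.e. 1 / (2^alpha * (2*rho - 1)^(alpha - 1))."""
    return 1.0 / (2.0 ** alpha * (2.0 * rho - 1.0) ** (alpha - 1.0))


def batches_until_budget_exhausted(rho: float, alpha: float) -> int:
    """ceil(2^alpha * (2*rho - 1)^(alpha - 1)), the batch count used in the proof."""
    return math.ceil(2.0 ** alpha * (2.0 * rho - 1.0) ** (alpha - 1.0))
```

Since the share `batch_share(k)` always exceeds 1/2, the factor $2^{-\alpha}$ in the second bound follows. For example, with ρ = 2 and α = 3 every batch consumes at least E/72, so the budget cannot cover more than 72 batches.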

Thus, if the number of batches ℓ is large enough, i.e. ℓ → ∞, the algorithm will run out of energy after having completed $\lceil 2^{\alpha}(2\rho-1)^{\alpha-1} \rceil$ batches, so it will not be able to finish the batches that follow.

4 Aggregated variant

In this section, we turn our attention to the aggregated variant of the maximum lateness problem, where our objective is to minimize Lmax + βE, for some β > 0. For this variant, in the online case, we are able to overcome the impossibility of obtaining constant-factor competitive algorithms (Theorem 3). Initially, we consider instances in which the jobs have a common release date and we describe how to obtain an optimal offline algorithm for the aggregated variant by slightly modifying our algorithm and its analysis for the budget variant in Section 2. For instances with arbitrary release dates, we explain why our NP-hardness proof for the budget variant implies that the aggregated variant is also NP-hard. Last, we turn our attention to the online case of the aggregated problem, in which the jobs arrive over time, and we propose a 2-competitive algorithm which schedules the jobs in batches by repeatedly applying our optimal offline algorithm for jobs with a common release date.

Common release dates. When all jobs are released at the same time, i.e., for the S1 | | Lmax + βE problem, we are able to derive a polynomial algorithm by using Algorithm BUD in the following way: suppose that we are given the energy consumption E* of an optimal schedule minimizing Lmax + βE. Then, in order to construct such an optimal schedule, it suffices to apply the optimal algorithm for the budget variant with an energy budget equal to E*. This means that the optimal schedule for the aggregated variant is a regular schedule, satisfying the properties of Lemma 1 (with budget E*). However, in order to construct the optimal schedule minimizing Lmax + βE, we need to compute E*.
One approach, which has already been suggested in the literature for the total flow time criterion (see [3, 4]), would be to perform a binary search in the interval of all possible energy levels. Here, we describe an alternative approach which resembles the one we followed for the budget variant. We first formulate the aggregated variant as a convex program similar to the one for the budget variant. Now, we do not introduce a constraint on the total energy consumption, since it


is added in the objective function. By applying the KKT conditions, we obtain almost the same structure (properties) of an optimal solution with a single difference: the energy consumption is not specified by a given budget of energy, but results from the fact that the speed of the first job should always be equal to a fixed value. Specifically, Property (vii) of Lemma 1 is replaced by the fact that “the job executed first runs at speed $s_1 = \left(\frac{1}{(\alpha-1)\beta}\right)^{1/\alpha}$”. Therefore, in order to obtain an optimal schedule for the aggregated variant, it suffices to do the following: Run lines (1)-(6) of Algorithm BUD. Let σ be the schedule produced. Find the highest-index critical job i, i ≠ 1, in σ such that its corresponding sequence, (k, i), has speed si < s1. Modify σ such that all jobs 1, 2, ..., k − 1 are executed at speed s1.
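To make the fixed speed s1 concrete, the following Python sketch evaluates the closed form and compares it with a numerical minimizer in the simplest possible setting. The setting is an assumption made only for illustration: a single job of work w, released at time 0 with qi = 0, whose cost is w/s + βw·s^(α−1). The function names are hypothetical, and the sketch is a consistency check of the formula, not a substitute for the KKT analysis.

```python
def first_job_speed(alpha: float, beta: float) -> float:
    """Closed form s_1 = (1 / ((alpha - 1) * beta)) ** (1 / alpha)."""
    if alpha <= 1.0 or beta <= 0.0:
        raise ValueError("requires alpha > 1 and beta > 0")
    return (1.0 / ((alpha - 1.0) * beta)) ** (1.0 / alpha)


def single_job_cost(w: float, s: float, alpha: float, beta: float) -> float:
    """Lateness plus beta times energy for one job run at constant speed s.

    Assumes release date 0 and q = 0, so the lateness is the completion time w / s.
    """
    return w / s + beta * w * s ** (alpha - 1.0)


def numeric_best_speed(w: float, alpha: float, beta: float,
                       lo: float = 1e-6, hi: float = 1e3,
                       iters: int = 300) -> float:
    """Ternary search for the minimizer of single_job_cost over [lo, hi].

    The cost is unimodal in s, because its derivative changes sign exactly once.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if single_job_cost(w, m1, alpha, beta) < single_job_cost(w, m2, alpha, beta):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)
```

In this single-job setting the minimizer does not depend on w, which is consistent with s1 being a fixed value determined by α and β alone.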

Arbitrary release dates. It is not hard to see that if we are given an optimal polynomial algorithm for the aggregated variant, then we can obtain an optimal polynomial algorithm for the budget variant by using binary search on the possible values of β and stopping at a value β* such that the energy consumption of the schedule minimizing Lmax + β*E is equal to the energy budget. Since, by Theorem 2, the budget variant is NP-hard to solve, we conclude that the aggregated variant is also NP-hard.

Now, we turn our attention to the online version of the aggregated variant and we derive a 2-competitive online algorithm for the S1 | rj | Lmax + βE problem. The algorithm schedules the jobs in a number of phases by repeatedly applying the optimal offline algorithm for the S1 | | Lmax + βE problem. This approach was introduced in [15]. We denote by σ*(J, t) the optimal schedule of a set of jobs J with a common release date t.

Algorithm ALE. Let J0 be the set of jobs that arrive at time t0 = 0. In phase 0, jobs in J0 are scheduled according to σ*(J0, 0). Let t1 be the time at which the last job of J0 is finished, i.e., the end of phase 0, and J1 be the set of jobs released during (t0, t1]. In phase 1, jobs in J1 are scheduled as in σ*(J1, t1), and so on. In general, if ti is the end of phase i − 1, we denote by Ji the set of jobs released during (ti−1, ti]. Jobs in Ji are scheduled by computing σ*(Ji, ti).

Next, we analyze the competitive ratio of the algorithm.

Theorem 4 Algorithm ALE is 2-competitive for the online version of the S1 | rj | Lmax + βE problem.

Proof Assume that Algorithm ALE produces a schedule in ℓ + 1 phases. Recall that the jobs of the i-th phase, i.e., the jobs in Ji, are released during (ti−1, ti] and scheduled as in σ*(Ji, ti). Let Lmax,i + βEi be the cost of σ*(Ji, ti), where Lmax,i is the maximum lateness among the jobs in Ji and Ei is the energy consumed by the jobs of Ji.
The objective value of the algorithm's schedule is

$$SOL = \max_{0 \le i \le \ell} \{L_{\max,i}\} + \beta \sum_{i=0}^{\ell} E_i \tag{32}$$

Now, we consider the optimal schedule. To lower bound the objective value OPT of an optimal schedule, we round down the release dates of the jobs; the release dates of the jobs in phase i are rounded down to ti−1. Let $\sigma^*_d$ and OPTd be the optimal schedule for the rounded instance and its cost, respectively. Clearly, any feasible schedule for the initial instance is also feasible for the rounded one. Thus, OPT ≥ OPTd. To lower bound OPTd we consider a restricted speed-scaling scheduling problem, i.e., a problem where each job can only be executed by a subset of the available processors. The instance of this problem consists of ℓ + 1 available speed-scalable processors M0, M1, ..., Mℓ and the set of jobs J, with their release dates rounded down, as before. Jobs in J0 can only be assigned to the processor M0 and every job in Ji can only be executed by one of the processors M0 or Mi, 1 ≤ i ≤ ℓ. Moreover, it is required that all jobs in Ji, 0 ≤ i ≤ ℓ, are executed by the same processor. Let


$\sigma^*_m$ and OPTm be the optimal schedule and its cost, respectively, for this restricted problem. Obviously, OPTd ≥ OPTm, since $\sigma^*_d$ is feasible for the restricted scheduling problem.

Let us now describe an optimal schedule $\sigma^*_m$. Through a simple exchange argument, it can be shown that the jobs of Ji, 0 ≤ i ≤ ℓ, in an optimal schedule, are executed by the processor Mi. Moreover, jobs in Ji, for 1 ≤ i ≤ ℓ, are scheduled according to σ*(Ji, ti−1), while for i = 0, according to σ*(J0, t0). Assume that the maximum lateness of the above schedule is attained by a job of the set Jk, 0 ≤ k ≤ ℓ, on the processor Mk. So, let $L^*_{\max} = L^*_{\max,k}$, where $L^*_{\max}$ and $L^*_{\max,k}$ are the maximum lateness of the schedules $\sigma^*_m$ and σ*(Jk, tk−1), respectively. Let $E^*_i$ be the energy consumption of schedule σ*(Ji, ti−1). Then,

$$OPT_m = L^*_{\max,k} + \beta \sum_{i=0}^{\ell} E^*_i \tag{33}$$

By considering the schedules σ*(Ji, ti−1) and σ*(Ji, ti), it can be easily shown that $L^*_{\max,i} = L_{\max,i} - (t_i - t_{i-1})$ and $E^*_i = E_i$. Then, (32) and (33) yield OPTm = SOL − (tk − tk−1). Note that tk − tk−1 is the total processing time of the jobs in Jk−1 in the schedule produced by ALE, which is equal to the processing time of the jobs in Jk−1 in $\sigma^*_m$. Recall also that the last job of each set Ji attains $L^*_{\max,i}$. Thus, tk − tk−1 ≤ Lmax,k−1 ≤ OPT. Therefore, SOL ≤ 2 OPT and Algorithm ALE is 2-competitive for the S1 | rj | Lmax + βE problem.

5 Concluding Remarks

We presented positive and negative results for the offline and online power-aware versions of the classical maximum lateness scheduling problem. These results, along with the existing literature on power-aware versions of other problems, like makespan and total flow time, form strong evidence about the complexity of power-aware scheduling problems: they are in the same complexity class (polynomial or NP-hard) as their classical versions. For polynomial algorithms, the most promising approach consists of deducing strong structural properties of optimal schedules by applying the KKT conditions to a convex programming formulation of the problem. For NP-hardness results, existing NP-completeness reductions for the corresponding classical problems can be adapted by forcing all jobs to be executed with speed equal to one and considering the processing times as works.

An interesting direction for future work concerns the use of resource (energy) augmentation (see [9, 14]) for the online case of the budget variant of the problem, in order to overcome the fact that there is no O(1)-competitive deterministic algorithm (see Theorem 3). It is also interesting to improve the competitive ratio of Algorithm ALE, in Section 4, for the aggregated variant.

References

1. S. Albers. Energy-efficient algorithms. Communications of the ACM, 53(5):86–96, 2010.
2. S. Albers. Algorithms for dynamic speed scaling. In Symposium on Theoretical Aspects of Computer Science, volume 9 of LIPIcs, pages 1–11. Schloss Dagstuhl, 2011.
3. S. Albers and H. Fujiwara. Energy-efficient algorithms for flow time minimization. ACM Transactions on Algorithms, 3(4):article 49, 2007.
4. N. Bansal, K. Pruhs, and C. Stein. Speed scaling for weighted flow time. SIAM Journal on Computing, 39(4):1294–1308, 2009.
5. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
6. D. P. Bunde. Power-aware scheduling for makespan and flow. Journal of Scheduling, 12(5):489–500, 2009.
7. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, New York, 1979.
8. J. R. Jackson. Scheduling a production line to minimize maximum tardiness. Res. Rep. 43, Management Science Research Project, UCLA, 1955.
9. B. Kalyanasundaram and K. Pruhs. Speed is as powerful as clairvoyance. Journal of the ACM, 47(4):617–643, 2000.
10. E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys. Sequencing and scheduling: algorithms and complexity. In Handbooks in Operations Research and Management Science, volume 4, pages 445–522. North Holland, 1976.
11. J. K. Lenstra, A. H. G. Rinnooy Kan, and P. Brucker. Complexity of machine scheduling problems. Annals of Discrete Mathematics, 1:343–362, 1977.
12. A. Nemirovski, I. Nesterov, and Y. Nesterov. Interior Point Polynomial Algorithms in Convex Programming. Society for Industrial and Applied Mathematics, 1994.
13. K. Pruhs, P. Uthaisombut, and G. J. Woeginger. Getting the best response for your erg. ACM Transactions on Algorithms, 4(3):article 38, 2008.
14. K. Pruhs, R. van Stee, and P. Uthaisombut. Speed scaling of tasks with precedence constraints. Theory of Computing Systems, 43(1):67–80, 2008.
15. D. B. Shmoys, J. Wein, and D. P. Williamson. Scheduling parallel machines on-line. SIAM Journal on Computing, 24(6):1313–1331, 1995.
16. F. F. Yao, A. J. Demers, and S. Shenker. A scheduling model for reduced CPU energy. In IEEE Symposium on Foundations of Computer Science, pages 374–382, 1995.