The Online Specialization Problem

We study the online specialization problem, where items arrive in an online fashion for processing by one of n different methods. Each method has two costs: a processing cost (paid once for each item processed), and a set-up cost (paid only once, on the method's first use). There are n possible types of items; an item's type determines the set of methods available to process it. Each method has a different degree of specialization: highly specialized methods can process few item types, while generic methods may process all item types. This is a generalization of ski-rental and closely related to the capital investment problem of Azar, Bartal, Feuerstein, Fiat, Leonardi, and Rosen (Algorithmica 25(1):22-36, 1999). We primarily study the case where method i + 1 is always more specialized than method i and the set-up cost of a more specialized method is always higher than that of a less specialized method. We describe an algorithm with competitive ratio O(log n), and also show an Ω(log n) lower bound on the competitive ratio for this problem; this shows our ratio is tight up to constant factors.


Introduction
To motivate the online specialization problem, consider the scenario of hosting an online data archival service. Customers are expected to store many data files in the archive regularly but rarely read data from it. To minimize the cost of operating the archive, the host could automatically compress the data files before storing them. Since the incoming files could represent text, sound, or any number of other possible types, different compression algorithms are needed for an efficient system.
As a simple example, suppose there are four different methods for processing data: method f_1 denoting no compression at all, f_2 denoting a standard dictionary coding technique good for generic unicode text, f_3 denoting a specialized encoding scheme for English prose, and f_4 an efficient compressor for sound. An English novel could be compressed very efficiently with f_3, less efficiently with f_2, and not at all with f_4. We say that f_3 is more specialized than f_2 (denoted f_2 ≺ f_3) because f_2 can compress anything that f_3 can compress. We also know f_1 ≺ f_2 and f_1 ≺ f_4, while f_2 is incomparable to f_4.
A simple model for the costs of operating the archive is to assume each method has two costs: a set-up cost c_i representing the cost of creating (or purchasing) method f_i, and a processing cost p_i reflecting the cost of maintaining the storage space of any compressed file produced by f_i. This is an extremely oversimplified model for this scenario that assumes several things: 1) the cost of computer time for encoding and decoding is insignificant compared to the costs of creating the methods and physically storing the data; 2) all input files are the same size; and 3) each method reduces the size of input files of the appropriate type by a fixed amount. Assumptions 2 and 3 imply a predetermined final size for any input file processed by method f_i; when the processing cost p_i represents the cost of storing the file, it is proportional to the final compressed file size. Note that assumption 2 may not be completely unrealistic, as any large file can be viewed as a sequence of many uniformly-sized smaller files.
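To make the two-part cost model concrete, here is a tiny sketch in Python. The method names, set-up costs, and processing costs below are hypothetical, chosen only to mirror the four-compressor example:

```python
def total_cost(setup, proc, created, assignment):
    """Cost of a run under the two-part model: each created method's
    set-up cost is paid once, and every stored file pays the processing
    cost of the method used to compress it."""
    assert set(assignment) <= created, "files may only use created methods"
    return sum(setup[f] for f in created) + sum(proc[f] for f in assignment)

# Hypothetical costs for the four compressors of the example.
setup = {"f1": 0.0, "f2": 2.0, "f3": 5.0, "f4": 4.0}
proc = {"f1": 1.0, "f2": 0.5, "f3": 0.2, "f4": 0.3}

# Three English novels compressed with f3 and one sound file with f4.
cost = total_cost(setup, proc, {"f1", "f3", "f4"}, ["f3", "f3", "f3", "f4"])
```

Under these numbers, the run pays set-up costs 0 + 5 + 4 plus processing costs 3 × 0.2 + 0.3.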
As the input files of different types arrive in an online fashion, we need to choose between the available compression methods, incurring the processing costs for each input, as well as set-up costs upon the first use of each method. Using the standard competitive analysis framework (see Borodin and El-Yaniv (1998)), our goal is to find an algorithm with low competitive ratio. This means we wish to minimize the cost that our online algorithm incurs compared to the cost of the optimal algorithm that knows all inputs in advance. We assume that the methods, their set-up and processing costs, and the ≺-relation between them are known in advance.
The original motivation for studying this problem came from the dynamic compilation problem. Dynamic compilation attempts to decrease a program's execution time by using run-time information to perform additional optimization techniques. For example, consider a program P that has a function f(x, y), and suppose P calls f(3, 4) many times. A dynamic compilation system could speed up program P by first detecting that f(3, 4) is called many times, then computing and storing the value, and finally performing a look-up of the value instead of recomputing it for all future calls with the same parameters. This is called specializing f for x = 3 and y = 4.
Let f_1, f_2, and f_3 represent the unspecialized version of f, f specialized for x = 3, and f specialized for x = 3 and y = 4, respectively. Specialized version f_2 could be created by applying any standard compiler optimization (such as constant folding and loop unrolling) to the function f while assuming x = 3.
In the general problem, there are n different methods to process the inputs, and n different types of input. Furthermore, there is a specialization hierarchy defined by a partial order that represents the degree of specialization of each method. Our online algorithm must decide which specialization (if any) it should create on every input. In this paper, we concentrate primarily on the case where more specialized methods have higher set-up costs, and on the case where the graph representing the specialization hierarchy is a line. For this case we define an online algorithm that makes these specialization decisions with competitive ratio O(log n). We also give a lower bound proof showing that every online algorithm has competitive ratio Ω(log n).

Related work
To our knowledge, no one has studied the online specialization problem before, although it is a generalization of previous work. The ski-rental problem is a simplification of our problem to the case where there are only two methods, and where all inputs can be processed with either method. It was initially proposed by Karlin et al. (1988) as an analogue to the competitive snoopy caching problem.
A generalization of ski-rental that is relevant to the problem we describe here is the capital investment problem studied by Azar et al. (1999). They considered the following manufacturing problem: initially, there is a set of machines that can manufacture a certain item. Each machine costs a certain amount to buy and can produce the item at a certain cost. Furthermore, at any time, technological advances could occur which would make new machines available for purchase. These new machines have different purchase costs and lower per-item production costs. They studied the problem of designing an algorithm that minimizes the total cost of production (the sum of the cost of buying machines and the cost of producing items).
The online specialization problem is a generalization of the capital investment problem to the case where machines can be specialized, so that each manufactured item can be produced by some machines but not others. However, the online specialization problem does not include the idea of technological advances, where the adversary could give the online algorithm completely new methods for processing the input at future points in time; in our problem all methods are assumed to be known at the beginning.
Many other ski-rental generalizations have a multi-level structure related to our problem. However, none of them adequately represent the specialization hierarchy. For instance, the online file allocation problem (see Awerbuch et al. (2003)) takes a weighted graph representing a network and files located at nodes in the network. The online input is a series of access requests to the files from particular nodes, and the algorithm is allowed to move data files from one node to another at some cost. With an appropriate network design, the processing costs in our problem could be modeled by access requests from one node for files located at many other nodes. However, the idea of generic specializations that can process many input types would force nodes modeling a generic specialization f_1 to always have all data files that represent methods more specialized than f_1. In other words, the algorithm cannot decide to migrate files on a strictly individual basis; some files would be grouped into hierarchical layers so that any node with a layer i file must always have all layer j files for all j > i.
Other multi-level structure problems studied using competitive analysis include caching strategies with multi-level memory hierarchies for minimizing access time (starting with Aggarwal et al. (1987)), and online strategies for minimizing the power usage of a device by transitioning between multiple available power states when idle (see Irani et al. (2005)). See Grant et al. (2000) for an experimental dynamic compilation system that motivated this work.

Algorithmic design issues
First consider the ski-rental case (Karlin et al., 1988) where there are two methods f_1 and f_2, and where c_1 = 0. An online algorithm with competitive ratio 2 is to wait until c_2/(p_1 − p_2) inputs are seen before creating f_2.
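The break-even rule above can be sketched directly. This is a minimal illustration; the names `threshold`, `online_cost`, and `opt_cost` are introduced here, and c_1 = 0 is assumed as in the text:

```python
import math

def threshold(c2, p1, p2):
    # Break-even point: after this many inputs, the per-input savings
    # p1 - p2 have repaid f2's set-up cost c2.
    return c2 / (p1 - p2)

def online_cost(k, c2, p1, p2):
    """Cost of the wait-then-create strategy on k inputs (c1 = 0)."""
    t = math.ceil(threshold(c2, p1, p2))
    if k <= t:
        return k * p1            # never needed to create f2
    return t * p1 + c2 + (k - t) * p2

def opt_cost(k, c2, p1, p2):
    # Offline optimum: either never create f2, or create it up front.
    return min(k * p1, c2 + k * p2)
```

For any input length k, `online_cost` stays within a factor 2 of `opt_cost`, matching the competitive ratio claimed above.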
Our problem is significantly more challenging because of how the many different methods interact with one another. Consider the case with three methods f_1, f_2, and f_3; c_1 = 0; f_2 ≺ f_3; and p_2 > p_3. Every input that f_3 can process can also be processed by f_2. The defining question with specialized methods is choosing between f_2 and f_3 when the inputs can be processed with either. Creating f_3 first is better when many future inputs of type f_3 occur; however, if many future inputs can use f_2 but not f_3, then f_2 is a better choice. We are faced with a tradeoff between highly specialized methods with low processing costs and widely applicable methods with high processing costs.

Now consider the case with n increasingly specialized versions of f. We say an input has specialization level i when it can be processed by any f ⪯ f_i but not by any f ≻ f_i. One difficulty with this problem is in measuring the actual benefit of creating f_i with respect to a worst-case adversary. The following scenario illustrates that the apparent benefit of creating f_i can decrease as more inputs are seen.
Suppose f_1 is the only method created so far, let k inputs with specialization level i have been seen so far, and suppose no inputs with lower specialization levels have yet been seen. The apparent benefit of creating f_i is k(p_1 − p_i), representing the gain in benefit from using f_i from the beginning instead of f_1. Since this benefit exactly equals the cost c_i of creating f_i, perhaps f_i should be created. However, suppose we anticipate processing so many future inputs of specialization level i − 1 that creating f_{i−1} is guaranteed to be a good decision. In this anticipated future scenario, the benefit of creating f_i for the k inputs of specialization level i is only k(p_{i−1} − p_i), which may be much smaller than the apparent benefit of k(p_1 − p_i). It is not obvious when f_i should be created in this scenario.

Naive extension
We end this section by showing the poor performance of a simple extension of the ski-rental algorithm to the case of n increasingly specialized versions of f. This algorithm, when processing the next input I, computes what the optimal offline algorithm would do on all inputs seen so far, including I. It then processes I with the same specialization as the optimal, creating it if necessary.
Consider this algorithm's performance in the special case where the costs are c_1 = 0, c_i = 1 for all i ≥ 2, p_1 = 1, and p_i = 1/n − i/(2n²) for 2 ≤ i ≤ n. This is designed so that the processing costs from p_2 to p_n decrease linearly from just below 1/n down to 1/(2n). For this specific case with just one input, the online algorithm creates f_1 and uses it because c_1 = 0. Suppose two inputs with specialization level n are given. Clearly the optimal offline algorithm would use f_n on those inputs to get a minimal total cost of 1 + 1/n, where 1 is the creation cost and 1/n is the processing cost for both inputs. This means the online algorithm would create and use f_n when it sees the second input. Now suppose two inputs of level n are given followed by one of level n − 1. The optimal would use f_{n−1} for all three inputs; creating both f_{n−1} and f_n is not cost effective because the relatively high creation costs do not offset the reduced processing cost. This means the online algorithm would create and use f_{n−1} on the third input, after already having created f_n for the second input.

Now suppose the total input consists of k inputs with specialization level 2 or higher, where 2 ≤ k ≤ n. The optimal offline strategy here is to just create and use f_i on all the inputs, where i is the minimum specialization level among the k inputs. At least one specialization in {f_2, f_3, ..., f_n} must be created because of the high processing cost of f_1 compared to all other processing costs. No more than one can be created because the added benefit of creating a second specialization in the set is at most a 1/(2n) reduction in processing cost per input, and the benefit must outweigh the creation cost of 1. This means that more than 2n total inputs are necessary before the optimal algorithm could even consider creating more than one specialization in {f_2, f_3, ..., f_n}. And since we know the optimal must create just one specialization, the best one to create to minimize processing costs is clearly the most highly specialized one that can be applied to all inputs.
The behavior of the optimal offline algorithm above implies the following adversarial strategy for this problem: give the online algorithm two inputs with specialization level n, followed by one input each at levels n − 1 down to 2, in that order. Note that n inputs are given overall.
With this strategy, the online algorithm uses f_1 for the first input. For the second input it uses f_n, because the optimal would have used f_n when seeing just the first two inputs. On the third input, the online algorithm uses f_{n−1}, because the optimal would use f_{n−1} for all three inputs. Generalizing, we see that the online algorithm uses the specializations f_1, f_n, f_{n−1}, f_{n−2}, ..., f_3, f_2, in that order, on the n inputs in the adversarial strategy. It is essentially tracking the behavior of the optimal offline algorithm on a prefix of the entire input.
The total cost incurred by this online algorithm on these n inputs is thus more than n − 1, because it paid to create all specializations of level 2 and higher. In contrast, the optimal algorithm uses f_2 exclusively and pays just c_2 + n·p_2 = 1 + n(1/n − 1/n²) = 2 − 1/n < 2. This shows a ratio of more than (n − 1)/2; this algorithm therefore has competitive ratio Ω(n).
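The adversarial argument above can be checked numerically. The sketch below assumes the cost structure used in this section (c_1 = 0, c_i = 1 for i ≥ 2, p_1 = 1, and p_i = 1/n − i/(2n²)); the helper `offline_opt` brute-forces the offline optimum over at most two paid specializations, which suffices for instances of this length:

```python
from itertools import combinations

def setup_costs(n):
    # c_1 = 0, all other set-up costs are 1.
    return [0.0] + [1.0] * (n - 1)

def proc_costs(n):
    # p_1 = 1; p_i = 1/n - i/(2n^2) decreases from just below 1/n to 1/(2n).
    return [1.0] + [1 / n - i / (2 * n * n) for i in range(2, n + 1)]

def offline_opt(levels, c, p, n):
    """Brute-force offline optimum: f_1 is free, and for inputs of this
    length at most two paid specializations are worth considering."""
    best = None
    for size in range(3):
        for extra in combinations(range(2, n + 1), size):
            created = [1] + list(extra)
            cost = sum(c[i - 1] for i in created)
            for lev in levels:
                # Each input uses the cheapest created method it can use.
                cost += min(p[i - 1] for i in created if i <= lev)
            if best is None or cost < best[0]:
                best = (cost, created)
    return best

def naive_online(levels, c, p, n):
    """Naive extension: on each input, mimic the method the offline
    optimum would use for it on the prefix seen so far."""
    created, total = set(), 0.0
    for t in range(1, len(levels) + 1):
        prefix = levels[:t]
        _, opt_created = offline_opt(prefix, c, p, n)
        m = min((i for i in opt_created if i <= prefix[-1]),
                key=lambda i: p[i - 1])
        if m not in created:
            created.add(m)
            total += c[m - 1]
        total += p[m - 1]
    return total
```

Running the adversarial sequence (two level-n inputs, then one each at levels n − 1 down to 2) shows the naive algorithm paying more than (n − 1)/2 times the optimum.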
The lesson learned is to not aggressively create highly specialized methods, even if the optimal solution would create them based on inputs seen so far.Creating intermediate level specializations in anticipation of future inputs with specialization levels in the middle is a much better idea.
2 Problem definition and results

Online specialization problem definitions
An instance of the online specialization problem consists of the following components:

• F = {f_1, f_2, ..., f_n} is the set of methods available for processing inputs.

• c : F → [0, +∞) represents the set-up cost for each method.

• p : F → (0, +∞) represents the processing cost for each method.

• ≺ is the partial order over F that determines the specialization hierarchy.
• lev(I) denotes the specialization level of input I, defined so that if lev(I) = f_i, then any method f_j such that f_j ⪯ f_i can be used to process I.
We assume the set F, the cost functions p and c, as well as the partial order ≺ are known before the online algorithm starts.The online algorithm outputs decisions about when to create each f i as it sees the input sequence.We assume that once f i has been created, it persists and is available for use by any future input.We study the following two special cases of the online specialization problem.
• The monotone linearly-ordered specialization problem, denoted MLS(F, c, p, ≺), assumes ≺ is a total ordering on F and that the following monotonicity constraint applies: if f_j ≺ f_i, then c_j < c_i.
The monotonicity constraint says that more specialized methods have higher set-up costs. Note that if f_j ≺ f_i, c_i ≥ c_j, and p_i ≥ p_j, then no reasonable algorithm would ever use f_i: f_j is more widely applicable than f_i and also costs less. Thus, without loss of generality, in this monotone setting we assume that if f_j ≺ f_i, then p_i < p_j. The standard ordering for F based on creation costs then also orders the methods by specialization level and by processing cost, so that c_1 < c_2 < ... < c_n, f_1 ≺ f_2 ≺ ... ≺ f_n, and p_1 > p_2 > ... > p_n.

• The equally specialized problem, denoted ES(F, c, p), assumes that all methods can be used to process any input. Furthermore, we also assume that c_i > c_j implies p_i < p_j. Note that an algorithm solving this special case is given in Azar et al. (1999), as well as in section 3.1.
We now define the costs for online algorithm A running on problem P with input σ.
• CRE_A^P(σ) denotes the cost for creating specializations.

• PROC_A^P(σ) denotes the processing cost.

• TOT_A^P(σ) = PROC_A^P(σ) + CRE_A^P(σ) denotes the total cost paid by A on σ.
OPT^P(σ) denotes the cost of the optimal offline algorithm. Note that the optimal offline algorithm simply chooses the optimal set of methods to create at the beginning and then processes each input with the appropriate method.
In the above definitions, whenever problem P, algorithm A, or input σ is understood from context, we omit it.In the equally specialized setting, the online (and optimal) cost is determined solely by the number of inputs the adversary gives.Thus, given an ES problem, we replace the input parameter σ with an integer k representing the length of σ.
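In the equally specialized setting, the offline optimum also has a simple closed form: since every method can process every input and all methods are created at the beginning, creating more than one method never helps (only the created method with the lowest processing cost would ever be used), so OPT(k) = min over f of c(f) + k·p(f). A one-line sketch (`opt_es` is a name introduced here):

```python
def opt_es(k, c, p):
    """Offline optimum for ES(F, c, p) on k inputs: create the single
    best method up front and use it for all k inputs."""
    return min(ci + k * pi for ci, pi in zip(c, p))
```

For example, with c = [0, 10] and p = [2, 1], the optimum rents for small k and buys for large k, exactly as in ski-rental.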
We derive performance bounds using standard competitive analysis. Let P(n) be the set of all problems with n specializations, and Σ_P denote the set of all possible inputs for P. Then we say an algorithm A has competitive ratio ρ(n) if, for every problem P ∈ P(n) and every input σ ∈ Σ_P, TOT_A^P(σ) ≤ ρ(n) · OPT^P(σ) + O(1).

Results
For the MLS problem as described in section 2.1, we construct an algorithm MLSA that has competitive ratio O(log n), where n is the number of methods available. Our algorithm makes calls to ESA, a slightly modified version of the capital investment algorithm (CIA) from Azar et al. (1999). MLSA often creates the same specializations as ESA running on a related ES problem. The main idea is to partition F into "contiguous intervals" of the form {f_i, f_{i+1}, ..., f_j}, and assign a worker to each interval. A worker's job is to process the inputs whose specialization level is inside its interval. Each worker runs completely independently of the other workers and makes its decisions based on the ESA algorithm running on the problem ES(F′, c, p), where F′ ⊂ F. When a worker decides to quit, its interval is partitioned into smaller intervals, and more workers are created to handle the new intervals.
The two main theorems of this paper are as follows.

Theorem 2.1 MLSA is an online algorithm for the online monotone linearly-ordered specialization problem with competitive ratio O(log n).

Theorem 2.2 Any online algorithm for the online monotone linearly-ordered specialization problem has competitive ratio at least (log n)/4.
Section 3 describes and analyzes the properties of ESA for the equally specialized setting. Section 4 describes and analyzes MLSA. Section 5 describes an adversary and a particular MLS(F, c, p, ≺) problem that shows an Ω(log n) lower bound on the competitive ratio of any online algorithm.
3 Design and analysis of ESA

Design of ESA
ESA is an online algorithm for solving ES(F, c, p); it can be thought of as a simple modification of the capital investment algorithm (CIA) that slightly delays the creation of specializations. The overall idea of ESA is to use the best already-created specialization until the total processing cost so far is roughly equal to the total optimal cost. ESA improves upon CIA in that it is 6-competitive (as opposed to 7-competitive); however, it cannot handle the technological advances present in the capital investment problem Azar et al. (1999). ESA behaves as follows.
For the first input, ESA creates and uses the method f′ that minimizes c(f) + p(f) over all f ∈ F. For the kth input, k > 1, let f_b be the method ESA used for I_{k−1}. Then ESA first checks the condition

PROC(k − 1) + p(f_b) > OPT(k). (1)

If condition (1) is false, then ESA uses f_b on the kth input. Otherwise, ESA creates and uses the method that minimizes p(f) from the set

{f ∈ F : c(f) ≤ 2 PROC(k − 1)}. (2)
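The behavior of ESA can be simulated compactly. The sketch below is written under the reading of condition (1) and set (2) used in this section: create a new method for input k when PROC(k − 1) + p(f_b) would exceed OPT(k), choosing the cheapest-processing method among those with c(f) ≤ 2·PROC(k − 1). The instance in the test is hypothetical:

```python
def esa(k, c, p):
    """Simulate ESA for k inputs over methods with set-up costs c[i] and
    processing costs p[i].  Returns (creation cost, processing cost)."""
    n = len(c)

    def opt(j):
        # Offline optimum in the ES setting: one method, bought up front.
        return min(c[i] + j * p[i] for i in range(n))

    # First input: create the method minimizing c(f) + p(f).
    b = min(range(n), key=lambda i: c[i] + p[i])
    created, cre, proc = {b}, c[b], p[b]
    for j in range(2, k + 1):
        if proc + p[b] > opt(j):                              # condition (1)
            cand = [i for i in range(n) if c[i] <= 2 * proc]  # set (2)
            b = min(cand, key=lambda i: p[i])
            if b not in created:
                created.add(b)
                cre += c[b]
        proc += p[b]
    return cre, proc
```

On a sample instance, the simulation also exhibits lemma 3.2 (the processing cost never exceeds OPT) and the 6-competitiveness bound shown later in this section.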

Analysis of ESA
We first show that ESA is well defined; in particular, whenever it chooses to create a new method, the set (2) above contains a specialization that is more highly specialized than the one ESA had previously been using. We also state some lemmas about ESA that are required later in this paper for analyzing the performance of MLSA.
Lemma 3.1 Suppose condition (1) is true in the ESA algorithm when processing input I_k. Let f* be the method used by the optimal algorithm for k inputs. Let f_x be the new method from the set (2) above that ESA uses for input I_k. Then we know that f* ⪯ f_x.
Proof: The truth of condition (1) implies that c(f*) ≤ OPT(k) < PROC(k − 1) + p(f_b). Let f′ be the first method created by ESA. Applying p(f_b) ≤ p(f′) ≤ PROC(k − 1) to the previous statement shows c(f*) < 2PROC(k − 1). This means that f* is available for ESA to pick for processing input I_k. Thus, the actual method f_x that ESA picks must be either f* or another method with a lower processing cost than f*; this shows f* ⪯ f_x. ✷

Lemma 3.2 For any problem P = ES(F, c, p), ∀k, PROC_ESA^P(k) ≤ OPT^P(k).
Proof: Because of the way the first method f′ is chosen, it is clear that PROC(1) ≤ TOT(1) = OPT(1).
Assume by induction that PROC(k − 1) ≤ OPT(k − 1). Let f_b be the method used on I_{k−1} and consider the kth input. If condition (1) from section 3.1 is false, then we use f_b on I_k and know that PROC(k) ≤ OPT(k). Otherwise condition (1) is true, and we pay cost PROC(k − 1) + p(f_x) for k inputs, where f_x is the method chosen from set (2). Let f* be the method used by the optimal algorithm for k inputs, so p(f_x) ≤ p(f*) by lemma 3.1. Then by our induction hypothesis and by the optimality of OPT, PROC(k) = PROC(k − 1) + p(f_x) ≤ OPT(k − 1) + p(f*) ≤ OPT(k). ✷

Lemma 3.3 Whenever condition (1) is true, ESA will create a new specialization that has a higher creation cost and lower processing cost than any specialization previously created.
Proof: Since ESA's processing cost stays below the optimal total cost (lemma 3.2), we know that condition (1) is typically false, and that it becomes true only when the current specialization f_b being used is less specialized (and has a higher processing cost) than the optimal one f* for k inputs. Thus, when the condition becomes true, it will always be possible to switch to a better specialization. If all specializations have been created, then the condition will never become true again. ✷

Lemma 3.4 Given a problem P = ES(F, c, p). Then ∀k, j > 0, PROC_ESA^P(k + j) ≤ PROC_ESA^P(k) + PROC_ESA^P(j).

Proof: Clearly the processing cost per input decreases over time as ESA creates specializations with lower processing cost. Thus processing k + j inputs together is better than processing k inputs and then restarting and processing j more. ✷

Lemma 3.5 For any problem P = ES(F, c, p), let k* be the number of inputs ESA must see before it creates its second specialization. Then ∀k ≥ k*, CRE_ESA^P(k) ≤ 5PROC_ESA^P(k − 1).
Proof: Suppose ESA creates m specializations when run for k inputs. Let g_i denote the ith specialization created, and k_i denote the number of inputs seen when g_i was created. For z > 1, g_z was created because the processing cost was "about to surpass" OPT(k_z). This implies two facts (based on condition (1)) about the method f* used by the optimal on k_z inputs. Fact 2 says ESA uses a less-than-optimal method for inputs up through k_z − 1. For z > 2, this implies f* was not chosen at time k_{z−1} because its creation cost was too high at the time (4). Combining (3) and (4) with Fact 2 yields 2PROC(k_{z−1} − 1) < PROC(k_z − 1) for z > 2. Since the processing costs must at least double with each new specialization created, we obtain (5). Inequality (6) follows from the way that g_z is chosen at time k_z. Also c(g_1) + p(g_1) = OPT(1) < OPT(k_2) ≤ PROC(k_2 − 1) + p(g_1) (using condition (1)), so clearly c(g_1) < PROC(k_2 − 1). Thus, since each g_z is used at least once, the total creation cost can be bounded by CRE(k) ≤ 5PROC(k − 1). ✷
Theorem 3.6 For any problem P = ES(F, c, p) and any k ≥ k*, TOT_ESA^P(k) ≤ 6 OPT^P(k); that is, ESA is 6-competitive.

Proof: By lemmas 3.2 and 3.5, we know

TOT(k) = PROC(k) + CRE(k) ≤ OPT(k) + 5PROC(k − 1) ≤ 6 OPT(k).

✷
Lemma 3.7 Let ES(F, c, p) denote a problem where F has n specializations in the standard order, and let F_ℓ = {f_1, f_2, ..., f_r}. Furthermore, let F_r = {f_r, f_{r+1}, ..., f_n}, let k* be the number of inputs that ESA must see on problem ES(F, c, p) before it creates some specialization in F_r, and let k′ be the number of inputs that ESA must see on ES(F_ℓ, c, p) before it creates f_r. Then k* ≤ k′ and

PROC_ESA^{ES(F,c,p)}(k) ≤ PROC_ESA^{ES(F_ℓ,c,p)}(k) for all k, (8)

PROC_ESA^{ES(F,c,p)}(k) = PROC_ESA^{ES(F_ℓ,c,p)}(k) for all k < k*, (9)

CRE_ESA^{ES(F_ℓ,c,p)}(k′ − 1) = CRE_ESA^{ES(F_ℓ,c,p)}(k* − 1). (10)

Proof: Let A and A_ℓ denote the execution of ESA on ES(F, c, p) and ES(F_ℓ, c, p), respectively. As long as the extra specializations available in execution A are too costly to be created by ESA, A and A_ℓ run in lock step and pay exactly the same costs. This is always true for k < k*; thus equation (9) holds. The extra specializations in execution A may reduce the cost OPT as compared to execution A_ℓ; thus condition (1) may become true earlier in execution A than in A_ℓ, but it can never become true later. This shows that k* ≤ k′. Once k ≥ k*, A creates and uses f* ∈ F_r, which may have a lower processing cost than anything available to A_ℓ; this shows equation (8). By how f* is chosen and equation (9),

c(f*) ≤ 2PROC_ESA^{ES(F_ℓ,c,p)}(k* − 1) + p(f*). (11)

From the ordering of F, we know c(f_r) ≤ c(f*) and p(f*) ≤ p(f_r). Applying this to equation (11) yields c(f_r) ≤ 2PROC_ESA^{ES(F_ℓ,c,p)}(k* − 1) + p(f_r). This implies that after the k*th input in A_ℓ, f_r will be the next method created once method creation is allowed. Since this happens at input k′, no additional methods are created by A_ℓ when running for up to k′ − 1 inputs; this shows equation (10). ✷

4 Design and analysis of MLSA

Overview of MLSA
The two main ideas for solving MLS are partitioning F into subintervals, and using the behavior of ESA on a subset of the inputs to determine which specializations to create. MLSA consists of one manager and many workers. The manager routes each input to a single worker, who then processes the input. The manager also creates and destroys workers as necessary. Each worker processes the inputs that it is given completely independently of all other workers. Each worker is defined by a tuple of integers (q, r, s), where q ≤ r ≤ s. The worker only knows about specializations in the set {f_i : q ≤ i ≤ s} (abbreviated [f_q, f_s]); it cannot create any other specializations. The worker is only given inputs whose specialization level f_i satisfies f_i ∈ [f_r, f_s]. A worker's goal is to create f_r, which is typically in the middle of its interval. On the way to creating f_r, the worker runs ESA as a subroutine to figure out when it is necessary to create methods in [f_q, f_{r−1}]. Once f_r is created, the worker quits.
The manager maintains the partition invariant that if {(q_i, r_i, s_i)} is the set of all current workers, then the sets [f_{r_i}, f_{s_i}] form a partition of F. With this invariant, it is a simple job to route each incoming input to the appropriate worker. Whenever a worker quits, the manager creates new workers in a manner that maintains the partition invariant.
Replacement workers are chosen to overcome the bad performance found in section 1.3. As an example, one of the initial workers is (1, ⌈(n+3)/2⌉, n). This implies f_{⌈(n+3)/2⌉} is always created before any worker has a chance to create f_n. Our algorithm does not create a highly specialized method right away, even when the initial inputs are all of specialization level f_n.
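One way to see where the log n factor comes from: under the replacement rule described in the next subsection (a quitting worker splits its interval at m = ⌈(r+s)/2⌉), interval sizes roughly halve with every split, so any specialization level is handed down through at most O(log n) generations of workers. A quick sanity check of this depth (`split_depth` is a name introduced here):

```python
import math

def split_depth(r, s):
    """Number of rounds of interval splitting starting from [r, s], when
    a quitting worker (q, r, s) is replaced by (r, r, r), (r, r + 1, m),
    and (r, m + 1, s) with m = ceil((r + s) / 2)."""
    if r == s:
        return 1
    m = math.ceil((r + s) / 2)
    depth = 1
    if r + 1 <= m:
        depth = max(depth, 1 + split_depth(r + 1, m))
    if m + 1 <= s:
        depth = max(depth, 1 + split_depth(m + 1, s))
    return depth
```

The depth grows as roughly log₂ n: an interval of length L splits into children of length at most ⌈(L − 1)/2⌉.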

MLSA Manager
The manager's two responsibilities are creating workers and routing inputs to the appropriate worker. The manager creates workers so that they partition {1, 2, ..., n} into contiguous subsets; a worker (q, r, s) "covers" the interval from r to s, in the following sense: every input I_k is routed to the worker (q, r, s) such that lev(I_k) ∈ [f_r, f_s]. Whenever a worker (q, r, s) quits, the manager replaces that worker with several new ones that cover the interval from r to s, as depicted in figure 1.
Suppose the worker W = (q, r, s) has just quit, and let m = ⌈(r + s)/2⌉. The following three workers are created to replace it: (r, r, r), (r, r + 1, m), and (r, m + 1, s). Note, however, that the second and third workers listed are not created when the intervals they cover are empty. We use the term created worker to refer to any worker created by MLSA when processing all the input (including the workers that have quit).
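The replacement procedure can be sketched directly (a transcription of the rule above; `replacements` and `initial_workers` are names introduced here):

```python
import math

def replacements(worker):
    """Workers replacing a quitting worker (q, r, s); replacement
    workers that would cover an empty interval are skipped."""
    q, r, s = worker
    m = math.ceil((r + s) / 2)
    out = [(r, r, r)]
    if r + 1 <= m:
        out.append((r, r + 1, m))
    if m + 1 <= s:
        out.append((r, m + 1, s))
    return out

def initial_workers(n):
    # The manager creates f_1, then acts as if worker (1, 1, n) just quit.
    return replacements((1, 1, n))
```

For n = 9 the initial workers are (1, 1, 1), (1, 2, 5), and (1, 6, 9); note the third is (1, ⌈(n+3)/2⌉, n), matching the example in the overview, and the covered intervals [r, s] partition {1, ..., n}.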
Initially, the manager creates f_1. It then follows the above procedure as if worker (1, 1, n) had just quit. Note that just before a worker (q, r, s) quits, it creates f_r. This fact and the way the manager creates workers imply the following invariants.
• f_q is created before worker (q, r, s) is created.

• Let W be the set of current workers that have not yet quit. For all i, 1 ≤ i ≤ n, there is exactly one worker (q, r, s) in W such that r ≤ i ≤ s.

• A created worker (q, r, s) always satisfies either q = r = s or q < r ≤ s.
Fig. 1: Worker (q, r, s) followed by its three replacement workers. Circles represent a specialization created before the worker starts. Boxes represent the set of specialization levels handled by the worker.

MLSA Workers
Each worker uses a private array N to track the number of inputs of each specialization level, initially set to all zeroes. This array is local to the worker and not shared between workers. Let (q, r, s) be a worker. The invariant maintained is that when a worker processes input I, N[k] represents the number of inputs with specialization level f_k that the worker has seen, including the current input I.
We know either q = r = s or q < r ≤ s. A worker with q = r = s uses f_q to process all its input and never quits. A worker with q < r ≤ s processes an input I of specialization level f_j by first incrementing its private counter N[j]. It then performs two additional steps to process I: the quit step and the process step. The quit step checks whether or not the worker should quit; if it decides to quit, then f_r is created, and input I remains unprocessed (it is processed by one of the replacement workers). The process step decides the specialization f* to be used on I; f* is created if necessary. The decisions in both steps are made using calls to ESA with a subset of the methods available to the worker. The calls to ESA are all made with set-up costs c′ defined so that c′(f_q) = 0 and c′(f) = c(f) for all f ≠ f_q.
In the quit step, the worker examines the behavior of ESA on many problems. Let S = {f : for some i, r ≤ i ≤ s, f is the method that ESA uses on ES([f_q, f_i], c′, p) when run for N[i] + N[i+1] + ... + N[s] inputs}. If there is a specialization f ∈ S such that f_r ⪯ f, then the worker creates f_r and then quits.
In the process step, the worker decides to use the specialization f* that ESA uses on ES([f_q, f_r], c′, p) when run for N[r] + N[r+1] + ... + N[s] inputs.

Analysis of MLSA
We separately bound the online processing and creation costs paid by MLSA relative to the total cost paid by the optimal offline algorithm; together, these two bounds prove theorem 2.1.

Our overall strategy for the processing cost bound charges the processing cost MLSA pays for each input I against the optimal cost for processing I. Let I be the set of all inputs processed with f_i in the optimal solution; B = c(f_i) + |I| p(f_i) represents the cost the optimal pays for processing I. We show that the processing cost paid by MLSA for I is at most (2 log n)B. We derive this bound by separately bounding the total number of workers that could process inputs in I, and the cost contributed by any one worker for inputs in I. Our strategy for the creation cost bound is to bound the creation cost paid by each worker by a constant factor of the processing cost paid by that worker. We then reuse our bound on the processing cost relative to the optimal to get a bound on the creation cost relative to the optimal.
We now define terms for describing the subdivision of the costs involved, describe lemmas on the behavior of MLSA, and finally combine these lemmas in the proofs of the theorems.

Definitions
In describing the cost of MLSA, σ refers to the input sequence, and P denotes the problem MLS(F, c, p, ≺). Both σ and P are fixed for the rest of this section. The definitions below are abbreviated in that they implicitly depend on these two values.
We say V = [f v , f z ] is an optimal interval if, in the optimal algorithm, all inputs in σ whose specialization level is in V are processed by f v .
• When V is an optimal interval, I (V ) denotes the set of inputs from σ that are processed with f v by the optimal offline algorithm.
• When V = [f v , f z ] is an optimal interval, OPTINT(V ) = c(f v ) + |I (V )|p(f v ) denotes the cost the optimal algorithm pays for the inputs in I (V ). By definition, summing OPTINT(V ) over all optimal intervals yields OPT.
• If I is a subset of inputs from σ, then PROCSUB(I ) denotes the processing cost that MLSA paid for inputs in I when MLSA was run on the entire input sequence σ.
• Let W be a created worker. Then I (W ) = {I : I is in σ and W processed I}.
• CRESUB(W ) denotes the sum of the costs of the methods created by worker W . By definition, if W is the set of all created workers, then the total creation cost paid by MLSA is Σ_{W ∈W} CRESUB(W ).
• Applying the previous definitions, if W and V are the sets of all created workers and all optimal intervals, respectively, then the total processing cost paid by MLSA is Σ_{W ∈W} Σ_{V ∈V} PROCSUB(I (W ) ∩ I (V )).
We also use the following notation to describe the ES problems that MLSA simulates.
Workers simulate many problems of this form to determine when to quit.
• If W = (q, r, s), then ES(W ) denotes the problem ES([f q , f r ], c ′ , p); this is the problem W simulates in the process step.
We now define several different categories of workers, as illustrated by figure 2. We show different cost bounds on each category in order to prove our main theorems. Let V = [f v , f z ] be an optimal interval, and W = (q, r, s) be a created worker.
Cost-bearing workers of V charge some portion of their processing cost onto V in our analysis.
• We say W is a low-cost-bearing worker of V if it is cost-bearing and f v ⪯ f q . We say W is a high-cost-bearing worker of V if it is cost-bearing and f q ≺ f v . Unlike high-cost-bearing workers, low-cost-bearing workers pay per-input processing costs no worse than that of the optimal algorithm.
• We say W is an active worker of V if it is cost-bearing and f r ≺ f v . We later show that inactive workers of V who quit are replaced by low-cost-bearing or non-cost-bearing workers, while active workers who quit can be replaced by high-cost-bearing workers. Note that active and inactive workers of V are by definition high-cost-bearing workers of V . Since our algorithm never creates workers where q = r < s, all high-cost-bearing workers of V are either active or inactive.

Processing cost bound
Let V be an optimal interval and W = (q, r, s) a cost-bearing worker of V . We bound PROCSUB(I (W ) ∩ I (V )) by the corresponding cost incurred by ESA on ES(W ). We then show a bound on the total cost charged against OPTINT(V ).
Lemma 4.3 Let W be a created worker and let V be an optimal interval. Then we know the following about PROCSUB(I (W ) ∩ I (V )). Seeing that ESA pays at most p(f q ) for each input yields statement 2.
For statement 3, we need to show equation (12). Assuming equation (12) is true, we apply lemma 3.2 and the fact that the specific method of using f v for k inputs is certainly no better than the optimal method on k inputs; this yields the claim. To show equation (12), we consider the cases where W is inactive and where W is active separately; in both, N refers to the value of the local array in W . Note that equality can hold only if W quits after seeing (but not processing) its k*-th input. Since k counts only the inputs processed by W in [f v , f min(s,z) ], k < k*. Applying equation (9) of lemma 3.7 yields equation (12). ✷
Lemma 4.5 Let V be an optimal interval. Let W be a non-active worker of V who quits, and let W ′ be a new worker created to replace it. Then W ′ is a low-cost-bearing or non-cost-bearing worker of V .
Proof: Since any replacement worker for W = (q, r, s) is within the interval from r to s, the replacement rules (figure 1) applied to the relevant worker types (figure 2) show the following:
• Non-cost-bearing workers are replaced with non-cost-bearing workers.
• Low-cost-bearing and inactive workers are replaced with one low-cost-bearing worker and up to two additional non-cost-bearing or low-cost-bearing workers.

✷
Lemma 4.6 Let V be an optimal interval. Let W be an active worker of V . Suppose W quits and is replaced by new workers. Then one of the following conditions holds:
1. There are 2 replacement workers W 1 and W 2 , where W 1 is a non-cost-bearing worker of V , and W 2 is an inactive worker of V .
2. There are 3 replacement workers W 1 , W 2 , and W 3 . W 1 is a non-cost-bearing worker of V , and at most one of W 2 and W 3 is an active worker of V . Furthermore, let |W | (resp. |W i |) denote the number of input types W (resp. W i ) is responsible for; then |W i | ≤ |W |/2 for i = 2, 3.
Proof: Let W = (q, r, s) be the active worker who quit, and let V = [f v , f z ] be an optimal interval. Since W is an active worker of V , equation (13) holds; this implies r < s. We now consider the manager's behavior on the remaining two cases:
Case I. Suppose r + 1 = s. Since the manager sets m = ⌈(r + s)/2⌉ = r + 1, we know two workers W 1 = (r, r, r) and W 2 = (r, r + 1, r + 1) are created, and f v = f r+1 to satisfy equation (13).
Thus W 2 is an inactive worker of V , and W 1 is a non-cost-bearing worker of V , satisfying condition 1 of the lemma.
Case II. Suppose r + 1 < s. Then m = ⌈(r + s)/2⌉, where r < m < s. Thus three workers W 1 = (r, r, r), W 2 = (r, r + 1, m), and W 3 = (r, m + 1, s) are created, as illustrated in figure 3. We know W 1 is a non-cost-bearing worker of V . Considering the remaining subcases IIa and IIb, in both we have that at most one of W 2 and W 3 is an active worker of V . Thus case II satisfies condition 2, where the method of choosing m ensures the size bound.

✷
Corollary 4.7 Let V be an optimal interval. Let W be an active worker of V , and let |W | denote the number of input types W is responsible for. Then |W | ≥ 2.
Proof: For active worker (q, r, s), the proof of lemma 4.6 starts by showing r < s. ✷
Lemma 4.8 If V is an optimal interval, then there are at most 2 log n high-cost-bearing workers of V .
Proof: Lemma 4.5 says only active workers of V that quit can be replaced with high-cost-bearing workers of V , and that other workers that quit are not replaced with high-cost-bearing workers. By lemma 4.6, the initial workers created by the manager include at most 2 high-cost-bearing workers of V , at most one of which is active. Every time an active worker quits, its replacement workers include at most one active and at most one inactive worker of V . Let k be the number of active workers of V created, and let W 1 , W 2 , . . ., W k be the active workers of V , in the order they are created. There are at most k + 1 inactive workers of V : one created at the same time as each W i , and one that could replace W k . Thus there are at most 2k + 1 high-cost-bearing workers of V . Now let |W i | denote the number of input types W i is responsible for. By lemma 4.6, |W i+1 | ≤ |W i |/2 for 1 ≤ i < k; combined with |W k | ≥ 2 (corollary 4.7), this implies there are at most 2 log n high-cost-bearing workers of V . ✷
Lemma 4.9 Let W be the set of all created workers, and let V be an optimal interval. Then if n ≥ 2, Σ_{W ∈W} PROCSUB(I (W ) ∩ I (V )) ≤ (2 log n) OPTINT(V ).
Proof (of the processing cost bound): Let V be the set of all the optimal intervals, and let W be the set of all the created workers. Summing over all optimal intervals and applying lemma 4.9 yields the bound. ✷
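The counting in lemma 4.8 can be sketched as a halving argument. One caveat: the bound below assumes the first active worker is responsible for at most n/2 input types, which is consistent with the manager's initial split but is our assumption, not a stated fact.

```latex
% Halving argument for the number k of active workers of V:
|W_{i+1}| \le \tfrac{|W_i|}{2} \ \text{(lemma 4.6)}, \qquad
|W_k| \ge 2 \ \text{(corollary 4.7)}, \qquad
|W_1| \le \tfrac{n}{2} \ \text{(assumed)}
\;\Longrightarrow\;
2 \;\le\; |W_k| \;\le\; \frac{n/2}{2^{k-1}} \;=\; \frac{n}{2^k}
\;\Longrightarrow\; k \le \log n - 1,
```

so the total number of high-cost-bearing workers of V is at most 2k + 1 ≤ 2 log n − 1 < 2 log n.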

Creation cost bound
Our strategy is to bound the creation cost of any created worker by the processing cost of ESA on the corresponding simulated ES problem. These processing costs are then summed to obtain a total creation cost bound relative to the optimal cost for the MLS problem.
Lemma 4.11 Let W be a created worker that quits. Then the following holds. Proof: By the quitting condition of W , we can pick an i and an h that satisfy the following conditions: 1) r ≤ h ≤ i ≤ s, and 2) ESA creates f h when run on ES(f q , f i ) for Σ_{j=i}^{s} N [j] inputs. Let k* be the number of inputs ESA needs on ES(f q , f i ) to create some specialization in [f r , f i ]; let k ′ be the number of inputs ESA needs on ES(f q , f r ) to create f r . We derive the following facts.
Proof: Consider the tree where each node is a created worker, and the children of node S are the workers created to replace the worker of node S. The root of this tree represents the manager, and its children are the initial workers created. Let W k denote all workers (q, r, s) at depth k in this tree that satisfy q + 1 < r.
In other words, W k contains all nodes at level k in the tree that are "relevant" in that they satisfy q < i ′ < r for some integer i ′ . If W = (q, r, s) and W ′ = (q ′ , r ′ , s ′ ) are two workers in W k , we can see that the intervals [q + 1, r − 1] and [q ′ + 1, r ′ − 1] are disjoint due to the way the manager replaces workers. We can also see that with each successive level, the interval size decreases by a factor of two. These two facts are illustrated in figure 4. This means that there is at most one created worker on each level satisfying the conditions of the lemma, and that there are at most log n total levels in the tree. ✷
Fig. 4: Relevant workers at many different levels of the tree.
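The claim that intervals halve level by level can be checked mechanically. This is a small sketch, assuming the replacement rule from lemma 4.6 (`replace` below mirrors cases I and II) and following the widest replacement worker at each level; `max_depth` is an illustrative helper, not from the paper.

```python
import math

def replace(q, r, s):
    """Replacement workers for a quitting worker (q, r, s) with r < s,
    mirroring lemma 4.6: (r, r, r), (r, r+1, m), (r, m+1, s) with
    m = ceil((r+s)/2); q itself plays no role in the split."""
    m = math.ceil((r + s) / 2)
    if r + 1 == s:
        return [(r, r, r), (r, r + 1, r + 1)]
    return [(r, r, r), (r, r + 1, m), (r, m + 1, s)]

def max_depth(n):
    """Depth of the chain obtained by always quitting the widest
    replacement worker, starting from (0, 1, n)."""
    depth, (q, r, s) = 0, (0, 1, n)
    while r < s:
        # follow the replacement responsible for the most input types
        q, r, s = max(replace(q, r, s), key=lambda w: w[2] - w[1])
        depth += 1
    return depth
```

Since the widest replacement spans at most half of its parent's interval, the chain has length O(log n), matching the "at most log n total levels" claim above.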
Lemma 4.13 There are at most log n created workers satisfying equation (14), and each such worker W satisfies CRESUB(W ) < OPT(σ).
Proof: Let W be the set of workers satisfying equation (14), and let W ∈ W . From lemma 4.11, we know that W must be a created worker that does not quit. By the way the worker simulates ESA, CRESUB(W ) = CRE^{ES(W )}(|I (W )|). Lemma 3.5 shows that any worker which uses at least 2 specializations and does not quit cannot belong in W ; thus W must use only one specialization. Thus CRESUB(W ) = c ′ (f k ), where f k is the specialization that W uses for its first input. Let W = (q, r, s).
Then we know k < r, because otherwise W would have quit. We also know k > q, because if k = q, then CRESUB(W ) = 0, meaning W ∉ W . Let f i denote the first specialization where p i < c i . This means p j < c j for all j ≥ i, and p j ≥ c j for all j < i. We examine the possible workers in three cases.
Case I. i ≤ q. This implies p q < c q . Since f k was chosen to minimize the cost of processing one input, we know c k + p k < p q ; this results in the contradiction that c k < c q ; thus this case cannot occur.
Case II. q < i < r. Since the optimal must pay at least c k + p k to process one input in [f r , f s ], clearly CRESUB(W ) < OPT. Lemma 4.12 bounds the number of workers in this case by log n.
Case III. i ≥ r. This implies i > k, which means that p k > c k , and hence CRESUB(W ) < 5 PROC^{ES(W )}_{ESA}(1), which would contradict W ∈ W . Thus, this case cannot occur.
Thus there are a total of at most log n workers satisfying equation (14), and each one also satisfies CRESUB(W ) < OPT(σ). ✷
Lemma 4.14 Let P denote MLS(F, c, p, ≺), and let W be the set of all created workers. Then if n ≥ 2, the following bound holds.
Proof: Let V be the set of optimal intervals. For any given W we can apply lemma 3.4; if we then sum over all W ∈ W and apply lemma 4.9, we get the bound. ✷
Proof: Let W denote all created workers W satisfying CRESUB(W ) < 5 PROC^{ES(W )}_{ESA}(|I (W )|), and let W ′ denote all created workers not satisfying the previous relation. Then applying lemma 4.13 followed by lemma 4.14 yields the theorem.

✷

Proof of lower bound
In this section, we describe an adversary showing that any online algorithm for the MLS problem has competitive ratio at least Ω(log n). We consider the case where the following is true:
• ≺ is a total ordering on F, so ∀i > j, f j ≺ f i .
• c(f i ) = 1 for all i, and p(f i ) = 2^(−i) for all i.
A description of the adversary is as follows: Start with S = F, and give inputs with specialization level f n . This continues until the online algorithm decides to create a specialization f k . The method f k divides S into two sets: {f 1 , f 2 , . . ., f k−1 } and {f k , f k+1 , . . ., f n }. Pick the larger set, and recursively apply this adversary to it. In other words, if k − 1 > n/2, then start giving inputs with specialization level f k−1 ; otherwise continue giving inputs with specialization level f n . Recursive calls continue while there are at least h methods in the set S; we later choose h to maximize our lower bound.
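The recursion above can be written iteratively. This is an illustrative sketch, not the paper's construction: the online algorithm is modeled as an assumed callback `algorithm(level)` that reports which specialization (if any) it creates in response to each input, and `max_rounds` is a safety cap we add for the simulation.

```python
# Sketch of the lower-bound adversary over a total order f_1 < ... < f_n
# (larger index = more specialized). The callback interface is assumed.

def adversary(n, h, algorithm, max_rounds=10**6):
    """Feed inputs to `algorithm`; algorithm(level) returns the index of a
    newly created specialization, or None. Keeps the larger side of each
    split while at least h methods remain in S."""
    lo, hi = 1, n                  # current set S = {f_lo, ..., f_hi}
    inputs = []
    for _ in range(max_rounds):
        if hi - lo + 1 < h:        # fewer than h methods left: stop
            break
        inputs.append(hi)          # give an input of level f_hi
        k = algorithm(hi)
        if k is None or not (lo <= k <= hi):
            continue
        # f_k splits S into {f_lo..f_{k-1}} and {f_k..f_hi}; keep the larger
        if (k - 1) - lo + 1 > hi - k + 1:
            hi = k - 1             # now request level f_{k-1}
        else:
            lo = k                 # keep requesting level f_hi
    return inputs
```

Against an algorithm that always creates the midpoint of the remaining range, each round removes about half of S, so the adversary forces roughly log(n/h) creations while denying each new specialization its intended use.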
This adversary is designed to maximize both the processing cost and the creation cost that the online algorithm pays in relation to the cost of the optimal solution.The key idea behind the adversary is that the online algorithm does not get to effectively use the specializations that it creates; any time it creates a specialization, the next input is chosen so that either that specialization does not get used, or it gets used but a more specialized version would have been better.The creation cost is maximized in that we expect the algorithm to create logarithmically many specializations.
In order to analyze the behavior of an online algorithm A on our adversary for the MLS problem, we provide the following definitions and invariants:
• I k denotes the kth input given to A by the adversary.
• Define variables ℓ and r so that {f ℓ+1 , f ℓ+2 , . . ., f r } is the contiguous set of methods that the adversary tracks as the online algorithm runs; these are methods which are not yet used by A. We use ℓ k and r k to denote the values of ℓ and r, respectively, at the time just before the adversary has given the algorithm I k . Initially, ℓ 1 = 0 and r 1 = n. By definition, the adversary chooses I k to have specialization level f_{r_k} .
• Let m denote the total number of methods that A creates.
Lemma 5.1 Let A be an online algorithm given t inputs from the adversary. Let ℓ = ℓ t and r = r t be the left and right boundaries for the uncreated methods of A. Then on the t inputs,
• the optimal algorithm pays at most 1 + t·2^(−r), and
• A pays at least 2^(−ℓ)(t − m) + m.
Since the online algorithm must create a specialization after seeing its first input, ℓ ≥ 1. Thus we have a lower bound of 2^(h−1).
Proof: Since the competitive ratio must be at least 1, clearly the statement is true for n ≤ 2^4. Lemmas 5.2 and 5.4 taken together imply that the competitive ratio of any online algorithm must be at least min(log⌊n/h⌋, 2^h)/2. Assuming n > 2^4, choose h so that 2^h ≤ log n < 2^(h+1). This implies that 2^h > (log n)/2, and that h = ⌊log log n⌋.
Thus the competitive ratio is at least (log n)/4. ✷
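The final arithmetic can be spelled out. This is a sketch: floors are dropped where they only affect lower-order terms, and the bound log⌊n/h⌋ ≥ (log n)/2 uses h ≤ √n, which holds for n > 2^4.

```latex
h = \lfloor \log\log n \rfloor
\;\Longrightarrow\; 2^h \le \log n < 2^{h+1}
\;\Longrightarrow\; 2^h > \tfrac{1}{2}\log n .

h \le \sqrt{n} \ \text{for } n > 2^4
\;\Longrightarrow\;
\log\!\left\lfloor \frac{n}{h} \right\rfloor \;\ge\; \log\frac{n}{\sqrt{n}} \;=\; \tfrac{1}{2}\log n .

\text{Competitive ratio} \;\ge\; \frac{\min\!\left(\log\lfloor n/h \rfloor,\; 2^h\right)}{2}
\;\ge\; \frac{(\log n)/2}{2} \;=\; \frac{\log n}{4}.
```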

Conclusion
In conclusion, we present an online algorithm that decides between many different methods for processing input, where some inputs may be more specialized than others. Our algorithm is O(log n)-competitive, and we also provide an Ω(log n) lower bound on the competitive ratio for any online algorithm. We believe that our algorithm's design provides intuition for constructing online algorithms for a variety of practical problems, including the problem of dynamic compilation. There are many ways to improve our current model. The most obvious one is to eliminate the restriction on the partial ordering. Since log n is a rather poor bound for use in practice, it may be better to move to a statistical model in order to derive performance bounds that are more useful in practice. Such models could include information on the frequency or probability of occurrence of various input types; the worst-case scenarios represented by the adversary may not actually arise in practice.
Another future direction to explore for this problem is to consider the model where methods can expire and must be created again to be used.This can model the problems where only a limited number of methods can be active at any one time (due to limited resources), or where the machinery used to process the inputs wears out over time.In the dynamic compilation application, having a limited number of methods active corresponds to limiting the amount of memory available for specialized versions of code.Exploring these alternative models would result in a better understanding of tradeoffs inherent in this problem, and could lead to better design of algorithms for practical specialization problems.

Fig. 2: Optimal interval [f v , f z ] followed by four different categories of created workers: non-cost-bearing, low-cost-bearing, inactive (high-cost-bearing), and active (high-cost-bearing).

Lemma 4.4 Let W be a created worker, and let S be an arbitrary subset of I (W ). Then PROCSUB(S) ≤ PROC^{ES(W )}_{ESA}(|S|).
Proof: Let S ′ be the first |S| inputs that were processed by W . Since W imitates the behavior of ESA on ES(W ), PROCSUB(S ′ ) = PROC^{ES(W )}_{ESA}(|S ′ |). We also know the per-input processing cost for worker W decreases over time, just as it does in ESA; thus PROCSUB(S) ≤ PROCSUB(S ′ ). ✷

Proof: Lemma 4.4 implies the first inequality of this lemma. Let W = W 1 ∪ W 2 ∪ W 3 , where W 1 contains the non-cost-bearing workers of V , W 2 the low-cost-bearing workers of V , and W 3 the high-cost-bearing workers of V . From lemma 4.3, we have a bound on PROC^{ES(W )}(|I (W ) ∩ I (V )|) for each W ∈ W i . Lemma 4.8 tells us |W 3 | ≤ 2 log n. We conclude that Σ_{W ∈W} PROCSUB(I (W ) ∩ I (V )) = Σ_{i=1}^{3} Σ_{W ∈W i} PROCSUB(I (W ) ∩ I (V )).
Our competitive ratio is therefore at least