Let's move on to the second sense in which the generic local search algorithm
is under-determined. The main while loop reads: if there is a superior
neighboring solution y, then reset the current solution to be y.
But there may, of course, be many legitimate choices for y.
For example, in the maximum cut problem, given a cut, there may be many vertices
whose switch to the opposite group yields a cut with strictly more crossing edges.
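For concreteness, here is a minimal sketch of that main loop, specialized to maximum cut. The representation is my own assumption, not something fixed by the lecture: the graph is an adjacency list mapping each vertex (say, an integer) to a list of its neighbors, and a cut is a dict mapping each vertex to group 0 or group 1.

```python
def gain(graph, cut, v):
    """Change in the number of crossing edges if vertex v switches groups."""
    same = sum(1 for w in graph[v] if cut[w] == cut[v])  # edges that would start crossing
    diff = sum(1 for w in graph[v] if cut[w] != cut[v])  # edges that would stop crossing
    return same - diff

def local_search_max_cut(graph, cut):
    """While some vertex switch yields strictly more crossing edges, make it."""
    improved = True
    while improved:
        improved = False
        for v in graph:
            if gain(graph, cut, v) > 0:
                cut[v] = 1 - cut[v]  # move to the superior neighboring cut
                improved = True
                break                # rescan from the new current solution
    return cut
```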
So when you have multiple superior neighboring solutions, which one should
you choose? Well, this is a tricky question, and the
existing theory does not give us a clear-cut answer.
Indeed, it seems like the right answer is highly domain dependent.
Answering the question is probably going to require some serious experimentation
on your part, with the data that you're interested in.
One general point I can make: recall that when you're using local search to
generate from scratch an approximate solution to an optimization problem, you
want to inject randomness into the algorithm to coax it to explore the
solution space and return as many different locally optimal solutions as
possible over a bunch of independent runs.
You then remember the best of those locally optimal solutions
and return it at the end of the day.
So this is a second opportunity, in addition to the starting point, where you
can inject randomness into local search. If you have many possible improving
values of y, choose one at random. Alternatively, you could try to be clever
about which y you choose. For example, if there are multiple superior
neighboring y's, you could go to the best one.
For example, in the maximum cut problem, if there are lots of different vertices
whose switch to the other group would yield a superior cut, pick the vertex
which increases the number of crossing edges by the largest amount.
In the traveling salesman problem, amongst all neighboring tours with
smaller cost, pick the one with the minimum cost.
So this is a perfectly sensible rule about how to choose y.
It is myopic, and you could imagine doing much more sophisticated things to decide
which y to go to next. And indeed, if this is a problem you
really care about, you might want to work hard to figure out what are the right
heuristics for choosing y that work well in your application.
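As a sketch of the two rules just discussed, here are a random-improvement rule and a best-improvement rule for maximum cut. Both reuse the hypothetical gain() helper from the earlier sketch; neither is prescribed by the lecture.

```python
import random

def random_improving_vertex(graph, cut):
    """Pick uniformly at random among all vertices whose switch improves the cut."""
    candidates = [v for v in graph if gain(graph, cut, v) > 0]
    return random.choice(candidates) if candidates else None

def best_improving_vertex(graph, cut):
    """Myopic greedy rule: pick the switch that adds the most crossing edges."""
    best, best_gain = None, 0
    for v in graph:
        g = gain(graph, cut, v)
        if g > best_gain:
            best, best_gain = v, g
    return best  # None means the current cut is already locally optimal
```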
A third question that you have to answer, in order to have a precisely defined
local search algorithm, is: what are the neighborhoods?
For many problems, there is significant flexibility in how to define the
neighborhoods, and theory again does not give clear-cut
answers about how they should be designed.
Once again, this seems like a highly domain dependent issue.
If you want to use local search on a problem of interest, it's probably going
to be up to you to empirically explore which neighborhood choices seem to lead
to the best performance of the local search algorithm.
One issue you're likely to have to grapple with is figuring out how big to
make your neighborhoods. To illustrate this point, let's return to
the maximum cut problem. There we defined the neighbors of a
cut to be the other cuts you can reach by taking a single vertex and moving it
into the other group. This means each cut has a linear number,
O(n), of neighboring cuts, but it's easy to make the neighborhood bigger if we
want. For example, we could define the
neighbors of a cut to be those cuts you can reach by taking two vertices or
fewer and switching them to the opposite groups.
Now each cut is going to have a quadratic number of neighbors.
More generally, of course, we could allow a single local move to do up to k swaps
of vertices between the two sides, and then the neighborhood size would be O(n^k).
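Here is a small sketch of how that neighborhood size grows with k. The cut representation (a dict from vertex to group 0 or 1) is my assumption; the point is only that a k-swap neighborhood has O(n^k) members.

```python
from itertools import combinations

def k_swap_neighbors(cut, k):
    """Yield every cut reachable by switching the groups of at most k vertices."""
    vertices = list(cut)
    for size in range(1, k + 1):
        for subset in combinations(vertices, size):
            neighbor = dict(cut)
            for v in subset:
                neighbor[v] = 1 - neighbor[v]
            yield neighbor

# With n vertices: n single-switch neighbors, about n^2/2 two-switch neighbors,
# and O(n^k) neighbors in general -- so each scan for an improving cut gets slower.
```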
So, what are the pros and cons of enlarging your neighborhood size?
Well, generally speaking, the bigger you make your neighborhoods, the more time
you're going to have to invest searching for an improving solution in each step.
For example, in the maximum cut problem, if we implement things in a
straightforward way and only allow one vertex to be switched in an iteration, then
we only have to search through a linear number of options to figure out whether
there's an improving solution or not. On the other hand, if we allow two vertices
to be switched in a given iteration, we have to search through a quadratic number
of possibilities to determine whether or not we're currently locally optimal.
So the bigger the neighborhood, generally speaking, the longer it's going to take to
check whether you're currently locally optimal or whether there's some
better solution that you're supposed to move on to.
The good news about enlarging your neighborhood size is that you're going to
have fewer local optima. And in general, some of the local
optima that you're pruning are going to be bad solutions that you're happy to be
rid of. If you look back at the example I gave
you in the previous video, which demonstrated that the simple local search
for maximum cut can be off by 50%, you'll see that if we just enlarge the
neighborhoods to allow two vertices to be swapped in the same iteration, then all
of a sudden, on that four-vertex example, local search would be guaranteed to
produce the globally maximum cut. The bad locally optimal cuts have been
pruned away. Now, even if you allow two vertices to be
swapped in a given iteration, there are going to be more elaborate examples
showing that local search might be off by 50%, but on many instances allowing this
larger neighborhood will give you better performance from local search.
Summarizing, one high-level design decision you should be clear on in your
head before you apply the local search heuristic design paradigm to a
computational problem is how much you care about solution quality versus how
much you care about the computational resources required.
If you care about solution quality a lot, and you're willing to either wait or
to throw a lot of hardware at the problem, that suggests using
bigger neighborhoods. They're slower to search, but you're
likely to get a better solution at the end of the day.
If you really want something quick and dirty, you want it to be fast and you don't
care that much about the solution quality, that suggests using simpler,
smaller neighborhoods, neighborhoods that are fast to search,
knowing that some of the local optima you might get might not be very good.
Let me reiterate: these are just guidelines, they are not gospel.
In all computational problems, but especially with local search, the way you
proceed has to be guided by the particulars of your application.
So make sure you code up a bunch of different approaches, see what works.
Go with that. For our final two questions, let's
suppose you've resolved the initial three and you have a fully specified
local search algorithm. So you've made a decision about exactly
what your neighborhoods are; you've figured out the sweet spot for you between
efficient searchability and the solution quality you're likely to get at the end
of the day. You've made a decision about exactly how
you're going to generate the initial solution.
And you've made a decision about, when you have multiple neighboring
superior solutions, which one you're going to go to next.
Now, let's talk about what kind of performance guarantees you can expect
from a local search algorithm. So let's first just talk about running
time, and let's begin with the modest question: is it at least the case that
this local search algorithm is guaranteed to converge eventually?
In many of the scenarios you are likely to come across, the answer is yes.
Here's what I mean. Suppose you're dealing with a
computational problem where the set of possible solutions is finite.
And moreover, your local search is governed by an objective function,
and the way you define a superior neighboring solution is that it's one
with a better objective function value. This is exactly what we were doing in the
maximum cut problem. There the space was finite, it was just
the set of exponentially many graph cuts, and our objective function was the
number of crossing edges. Similarly, for the traveling salesman
problem, the space is finite. It's just the roughly n-factorial
possible tours, and again, how do you decide which tour to go to next? You look
for one that decreases the objective function value of the tour, the
total cost of the tour.
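For concreteness, here is a sketch of one standard way to search the neighboring tours just mentioned: the 2-change (2-opt) neighborhood, where a neighbor is obtained by reversing one contiguous segment of the tour. The lecture only assumes some notion of neighboring tours; the distance-matrix representation and helper names here are my own.

```python
def tour_cost(tour, dist):
    """Total cost of a tour given as a list of vertices; dist[u][v] is the edge cost."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def improving_2change(tour, dist):
    """Return a neighboring tour with strictly smaller cost, or None if locally optimal."""
    base = tour_cost(tour, dist)
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 2, n):
            # Reverse the segment between positions i+1 and j.
            neighbor = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
            if tour_cost(neighbor, dist) < base:
                return neighbor
    return None
```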
Whenever you have those two properties,
finiteness and strict improvement in the objective function, local search is
guaranteed to terminate. You can't cycle, because every time you
iterate you get something with a strictly better objective function value, and you
can't go on forever, because eventually you'll run out of the finitely many possible
things to try. There is, of course, no reason to be
impressed by the finite convergence of local search.
After all, brute-force search equally well terminates in finite time,
and this class is all about having efficient algorithms that run quickly.
So the real question you want to ask is: is local search guaranteed to converge in,
say, polynomial time? Here, the answer is generally negative.
When we studied the unweighted maximum cut problem, that was the exception that
proves the rule: there, we only needed a quadratic
number of iterations before finding a locally optimal solution. But, as we
mentioned in passing, if you just pass to the weighted version
of the maximum cut problem, there already a local search might need,
in the worst case, an exponential number of iterations before halting with a locally
optimal solution. In practice, however, the situation is
rather different, with local search heuristics often finding locally
optimal solutions quite quickly. We do not at present have a very
satisfactory theory that explains, or predicts, the running time of local
search algorithms. If you want to read more about it, you
might search for the keyword smoothed analysis.
Another nice feature of local search algorithms is that even if you're in the
unlucky case where your algorithm is going to take a long
time before it finds a locally optimal solution, you can always just
stop it early. So when you start the search, you can
just say, look, if after 24 hours you haven't found me a locally optimal
solution, just tell me the best solution you've found thus far.
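Here is a sketch of that "stop early" idea: run local search under a wall-clock budget and return whatever solution you have when time runs out. The improving_neighbor and objective arguments are placeholders for whatever the problem at hand supplies.

```python
import time

def local_search_with_deadline(solution, improving_neighbor, objective, budget_seconds):
    """Local search that gives up after budget_seconds and returns the best-so-far."""
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        neighbor = improving_neighbor(solution)
        if neighbor is None:      # locally optimal: no superior neighbor exists
            break
        solution = neighbor       # strict improvement in the objective
    return solution, objective(solution)
```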
So in addition to running time, we want to measure the performance of a local
search algorithm in terms of its solution quality.
So, is that going to be any good? Here, in general, the answer is no,
and again the maximum cut problem was the exception that proves the rule:
that's a very unusual problem, in that you can prove at least some kind of
performance guarantee about the locally optimal solutions.
For most of the optimization problems to which you might be tempted to apply the
local search design paradigm, there will be locally optimal solutions quite far
from globally optimal ones. Moreover, this is not just a theoretical
pathology. Local search algorithms in practice will
sometimes generate extremely lousy locally optimal solutions.
So this ties back into a point we made earlier, which is that if you're using local
search not just as a post-processing improvement step, but actually to generate
from scratch a hopefully near-optimal solution to an optimization problem, you
don't just want to run it once, because you don't know what you're going to get.
Rather, you want to run it many times, making random decisions along the way,
either from a random starting point or by choosing random improving solutions to
move to, so that you explore as best you can the space of all solutions.
You want, over many executions of local search, to get your hands on as many
different locally optimal solutions as you can; then you can remember the best
one. Hopefully the best one will at least be
pretty close to a globally optimal solution.
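As a final sketch, here is that random-restart strategy for maximum cut: run local search many times from random starting cuts and remember the best locally optimal solution found. It reuses the hypothetical local_search_max_cut helper and graph/cut representation assumed in the earlier sketches.

```python
import random

def crossing_edges(graph, cut):
    """Number of edges whose endpoints lie in different groups (each edge counted once)."""
    return sum(1 for u in graph for w in graph[u] if u < w and cut[u] != cut[w])

def repeated_local_search(graph, runs=100):
    """Independent random restarts; keep the best locally optimal cut seen."""
    best_cut, best_value = None, -1
    for _ in range(runs):
        start = {v: random.randint(0, 1) for v in graph}   # random initial cut
        cut = local_search_max_cut(graph, start)           # from the earlier sketch
        value = crossing_edges(graph, cut)
        if value > best_value:
            best_cut, best_value = cut, value
    return best_cut, best_value
```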