In this video, we'll apply the divide and conquer algorithm design paradigm to the problem of multiplying matrices. This will culminate in the study of Strassen's matrix multiplication algorithm, and this is a super cool algorithm for two reasons. First of all, Strassen's algorithm is completely non-trivial. It is totally non-obvious, very clever, and it's not at all clear how Strassen ever came up with it. The second cool feature is that it's for such a fundamental problem. Computers, for as long as they've been in use, from the time they were invented up until today, spend a lot of their cycles multiplying matrices, because it just comes up all the time in important applications. So let me first make sure we're all clear on what the problem of multiplying two matrices is. We're going to be interested in three matrices, X, Y, and Z, and I'm going to assume they all have the same dimensions, n by n. The ideas we'll talk about are also relevant for multiplying non-square matrices, but we're not going to discuss that in this video. The entries in these matrices could be whatever you want: maybe they're integers, maybe they're rationals, maybe they're from some finite field. It depends on the application, but the point is, they're just entries that we can add and multiply. So how is it that you take two n by n matrices, X and Y, and multiply them to produce a new n by n matrix, Z?
Well, recall that the (i, j) entry of Z, meaning the entry in the ith row and jth column, is simply the dot product of the ith row of X with the jth column of Y. So if (i, j) were a highlighted square over in the Z matrix, it would be derived from the corresponding row of the X matrix and the corresponding column of the Y matrix. And recall what I mean by dot product: you take the products of the individual components and then add up the results. So ultimately, the entry z_ij boils down to a sum of n terms, where each constituent product is the (i, k) entry of the matrix X times the (k, j) entry of the matrix Y, with k ranging from one to n. That's how z_ij is defined for a given pair of indices i and j.
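In symbols, that definition reads

\[
z_{ij} \;=\; \sum_{k=1}^{n} x_{ik}\, y_{kj} \qquad \text{for each } i, j \in \{1, \dots, n\}.
\]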
One thing to note: whereas we often use n to denote the input size, here n denotes the dimension of each of these matrices. The input size is not n; it's quite a bit bigger than n. Specifically, each of these is an n by n matrix containing n squared entries. So since we presumably have to read the input, which has size n squared, and produce the output, which also has size n squared, the best we can really hope for in a matrix multiplication algorithm is a running time of n squared. So the question is how close we can get to it.
Before we talk about algorithms for matrix multiplication, let me make sure we're all crystal clear on exactly what the problem is, so let's actually spell out the result of multiplying two 2x2 matrices. We can describe two generic 2x2 matrices by giving the first one entries a, b, c, and d, where these four entries could be anything, and then multiplying by a second 2x2 matrix with entries e, f, g, and h. What's the result of multiplying these? Again, it's going to be a 2x2 matrix, and each entry is the corresponding dot product of the relevant row of the first matrix and column of the second matrix. So to get the upper-left entry, you take the dot product of the top row of the first matrix and the left column of the second matrix; that results in ae plus bg. To get the upper-right entry, we take the dot product of the top row of the first matrix with the right column of the second matrix, which gives us af plus bh. Filling in the other entries the same way, we get ce plus dg and cf plus dh. So that's multiplying two 2x2 matrices, and we've already discussed the definition in general.
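Written as a single equation, that 2x2 product is

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} e & f \\ g & h \end{pmatrix}
=
\begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}.
\]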
Now, suppose you had to write a program to actually compute the result of multiplying two n by n matrices. One natural way to do that would be to return to the definition, which defines each of the n squared entries of the Z matrix as a suitable sum of products of entries of the X and Y matrices. So on the next quiz, I'd like you to figure out exactly what the running time of that algorithm would be as a function of the matrix dimension n, where, as usual, we count the addition or multiplication of two individual entries as a constant-time operation. The correct response to this quiz is the third answer: the straightforward algorithm runs in cubic time relative to the matrix dimension n. To see this, let's just recall what the definition of matrix multiplication was. The definition tells us that each entry z_ij of the output matrix Z is the sum, from k = 1 to n, of x_ik times y_kj; that is, the dot product of the ith row of the X matrix and the jth column of the Y matrix. We're certainly assuming that the matrices are represented in a way that lets us access a given entry in constant time, and under that assumption, each of these products takes only constant time. Then, to compute z_ij, we just have to add up these n products, so that's theta of n time to compute a given z_ij, and there are n squared entries we have to compute: n choices for i and n choices for j. That gives us n squared times n, or cubic running time overall, for the natural algorithm, which is really just a triple for loop that computes each entry of the output matrix separately using a dot product.
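To make that triple for loop concrete, here is a minimal sketch in Python; the function name and the list-of-lists matrix representation are just illustrative choices, not anything fixed by the lecture.

```python
def matmul_naive(X, Y):
    """Multiply two n-by-n matrices (lists of lists), straight from the definition."""
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):             # n choices for the row index i
        for j in range(n):         # n choices for the column index j
            for k in range(n):     # Theta(n) work: the dot product that defines z_ij
                Z[i][j] += X[i][k] * Y[k][j]
    return Z
```

For example, matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]]) returns [[19, 22], [43, 50]].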
So the question, as always for the keen algorithm designer, is: can we do better? Can we beat cubic time for multiplying two matrices? And we might be especially emboldened by the progress we've already seen for multiplying two integers. We applied the divide and conquer paradigm to the problem of multiplying two n-digit integers, and we had both a naive recursive algorithm and a seemingly better algorithm, due to Gauss, which made only three recursive calls. Now, we haven't yet analyzed the running time of that algorithm, but as we'll see later, it does indeed beat the quadratic running time of the grade-school algorithm. So it's very natural to ask: can we do exactly the same thing here? There's the obvious iterative algorithm, which follows straight from the definition. Perhaps, analogous to Gauss's trick, we could find some clever divide and conquer method that beats cubic time. That's what we're going to explore next.
Let's recall the divide and conquer paradigm and what it means to use it. Well, we first have to identify smaller subproblems: if we want to multiply two n by n matrices, we have to identify multiplications of smaller matrices that we can solve recursively. Once we've figured out how to divide the given problem into smaller ones, then in the conquer step we simply invoke our own algorithm recursively to multiply the smaller matrices together. And then, in general, we have to combine the results of the recursive calls to get the solution to the original problem; in our case, to get the product of the original two matrices from the products of whatever sub-matrices we identify. So how would we apply the divide and conquer paradigm to matrices?
We're given two n by n matrices, and we have to somehow identify smaller pairs of square matrices that we can multiply recursively. The idea, I think, is fairly natural. We start with a big n by n matrix X, and those n rows and n columns have to somehow be divided into smaller pieces. Now, the first thing you might think of is to split X into its left half and its right half, analogous to what we've been doing with arrays, but then we'd break X into two matrices that are no longer square: they'd be n over two in one dimension and n in the other, and we want to recursively call a subroutine that multiplies square matrices. So what seems like the right thing to do is to divide X into quadrants. Then we have four pieces of X, each n over two by n over two, corresponding to the four quarters of this matrix. Let's call these quadrants, or blocks in matrix terminology, A, B, C, and D; all of them are n over two by n over two matrices. As usual, for simplicity, I'm assuming that n is even, and as usual, it doesn't really matter. We can do the same trick with Y: we divide Y into quadrants, four n over two by n over two matrices which we'll call E, F, G, and H.
One thing that's cool about matrices is that when you split them into blocks and multiply them, the blocks just behave as if they were atomic elements. What I mean by that is that the product of X and Y can be expressed in terms of their quadrants: each of its four quadrants, each of its four corners, can be written as a suitable arithmetic expression of the quadrants of X and Y. And the formulas are exactly analogous to the ones we got when we multiplied a pair of 2x2 matrices. I'm not going to formally prove this fact; I'm sure many of you have seen it before or are familiar with it, and if you haven't, it's actually quite easy to prove. It's not obvious (you can't necessarily see it off the top of your head), but if you go back to the definition, it's quite easy to verify that when you multiply X and Y, you can express the result, quadrant by quadrant, in terms of the blocks of X and Y.
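Concretely, writing X and Y in terms of their blocks, the formulas are the 2x2 formulas from before with the blocks playing the role of individual entries:

\[
X \cdot Y \;=\;
\begin{pmatrix} A & B \\ C & D \end{pmatrix}
\begin{pmatrix} E & F \\ G & H \end{pmatrix}
\;=\;
\begin{pmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{pmatrix},
\]

where each term such as AE now denotes a product of two n/2 by n/2 matrices, and each sum is an entrywise matrix addition.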
What we just did is completely analogous to what we did for integer multiplication, where we wanted to multiply two integers, little x and little y, and we broke each into a pair of n over two digit numbers; we then took the expansion of the product and observed that it could be written in terms of products of n over two digit numbers. The same thing is going on here, except with matrices. So now we're in business as far as a recursive approach goes. We want to multiply X and Y, which are n by n matrices, and we recognize that we can express the product X times Y in terms of products of n over two by n over two matrices, things we're able to multiply recursively, plus additions. And additions are clearly easy: adding two matrices with, say, n squared entries each takes time linear in the number of entries, so it's theta of n squared time to add two matrices that are n by n. This immediately leads us to our first recursive algorithm.
To describe it, let me quickly rewrite the expression we just saw on the previous slide. Our first recursive algorithm is simply to evaluate all of these expressions in the obvious way. Specifically, in step one, we recursively compute all of the necessary products, and observe that there are eight products we have to compute, eight products of n over two by n over two matrices: there are four blocks in this expansion of X times Y, each block is the sum of two products, and none of the products recurs, they're all distinct. So, naively, if we want to evaluate this, we have to compute eight different products of n over two by n over two matrices. Once those recursive calls complete, all we do is the four necessary additions, and as we discussed, that takes time proportional to the number of entries in a matrix. So that step takes quadratic time overall, quadratic in n, linear in the number of entries.
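Here is a minimal sketch of this first recursive algorithm in Python, assuming for simplicity that n is a power of two and that matrices are plain lists of lists; all of the function names are just illustrative.

```python
def add(M, N):
    """Entrywise sum of two equally sized square matrices: linear in the number of entries."""
    return [[a + b for a, b in zip(row_m, row_n)] for row_m, row_n in zip(M, N)]

def split(M):
    """Split an n-by-n matrix (n even) into its four n/2-by-n/2 quadrants."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def assemble(TL, TR, BL, BR):
    """Glue four quadrants back into a single matrix."""
    return [l + r for l, r in zip(TL, TR)] + [l + r for l, r in zip(BL, BR)]

def matmul_recursive(X, Y):
    """First recursive algorithm: eight recursive products of n/2-by-n/2 blocks."""
    if len(X) == 1:                       # base case: 1-by-1 matrices
        return [[X[0][0] * Y[0][0]]]
    A, B, C, D = split(X)
    E, F, G, H = split(Y)
    # Step 1: eight recursive calls.  Step 2: four quadratic-time additions.
    return assemble(add(matmul_recursive(A, E), matmul_recursive(B, G)),
                    add(matmul_recursive(A, F), matmul_recursive(B, H)),
                    add(matmul_recursive(C, E), matmul_recursive(D, G)),
                    add(matmul_recursive(C, F), matmul_recursive(D, H)))
```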
Now, the question you should be asking is: is this a good algorithm? Was this recursive approach good for anything, splitting X and Y into these blocks, expanding the product in terms of these blocks, and recursively computing each of the blocks? And I want to say it's totally not obvious; it is not clear what the running time of this recursive algorithm is. I'm going to go ahead and give you a spoiler, which will follow from the master method that we'll talk about in the next lecture. It turns out that with this kind of recursive algorithm, where you do eight recursive calls, each on a problem with dimensions half of what you started with, and then do quadratic work outside the recursive calls, the running time is going to be cubic. So exactly the same as with the straightforward iterative algorithm that follows from the definition: that was cubic, and it was clearly cubic, while this one, although it's not obvious, is cubic as well. No better, no worse than the straightforward iterative algorithm.
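Stated as a recurrence, which is the form the master method will handle, this first recursive algorithm satisfies

\[
T(n) \;=\; 8\,T(n/2) \;+\; \Theta(n^2),
\]

and, as the spoiler says, its solution turns out to be cubic in n.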
So, in case you're feeling disappointed that we went through all this work with this seemingly clever divide and conquer approach for matrix multiplication and came out at the end no better than the iterative algorithm: there's really no reason to despair, because remember, back in integer multiplication we had a straightforward recursive algorithm that had to do four recursive calls, four products of n over two digit numbers. But then we had Gauss's trick, which said, oh, if we only do cleverer products and some clever additions and subtractions, then we can get away with only three recursive calls, and we'll see later that this yields significant savings in the running time for integer multiplication. So far we've done nothing analogous to Gauss's trick for the matrix multiplication problem; all we did was the naive expansion in terms of sub-matrices and the naive evaluation of the resulting expressions. So the $64,000 question is: can we do something clever to reduce the number of recursive calls from eight down to something lower? That is where Strassen's algorithm comes in.
At a high level, Strassen's algorithm has two steps, just like the first recursive algorithm we discussed. Step one recursively computes some products of smaller, n over two by n over two, matrices, but there are only going to be seven of them, and they will be much less straightforward, much more cleverly chosen, than in the first recursive algorithm. Step two, then, actually produces the product of X and Y, each of those four blocks we saw, with suitable additions and subtractions of these seven products. Again, these are much less straightforward than in the first recursive algorithm. And while the additions and subtractions involved will be a little more numerous than they were in the naive recursive algorithm, that only changes the work in that part of the algorithm by a constant factor, so we'll still spend only theta of n squared work adding and subtracting things. And we get a huge win in decreasing the number of recursive calls from eight to seven.
Now, just in case you have the intuition that shaving off one of the eight recursive calls should only decrease the running time of the algorithm by one eighth, by 12.5 percent: in fact, it has a tremendously more amplified effect, because we do one less recursive call over and over and over again as we keep recursing. So it makes a fundamental difference in the eventual running time of the algorithm, as we'll explore in detail in the next set of lectures, when we discuss the master method.
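For comparison with the first recursive algorithm, Strassen's algorithm satisfies the recurrence

\[
T(n) \;=\; 7\,T(n/2) \;+\; \Theta(n^2),
\]

and, as the next set of lectures will show, that single change in the number of subproblems is exactly what pushes the running time below cubic.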
So again, a bit of a spoiler alert: what you're going to see in the next set of lectures is that Strassen's algorithm does indeed beat cubic time; it's better than n cubed. You'll have to watch the next set of lectures to see exactly what the running time is, but for now, know that saving that one recursive call is what drops the running time below cubic. Now, 1969 was obviously quite a bit before my time, but by all accounts, from people I've
talked to who were around then, and from, you know, what the books say, Strassen's
Algorithm totally blew people's minds at the time. Everybody was assuming that
there's no way you could do better than the iterative algorithm, the cubic
algorithm. It just seemed that matrix multiplication intuitively fundamentally
required all of the calculations that are spelled out in the definition. So
Strassen's Algorithm is an early glimpse of the magic and the power of clever algorithm design: if you apply serious ingenuity, even to super-fundamental problems, you can come up with fundamental savings over the more straightforward solutions. So those are the main points I wanted to make about Strassen's Algorithm: you can beat cubic time by saving a recursive call, using suitably chosen clever products and clever additions and subtractions. Maybe a few of you are wondering what these cleverly chosen products are. Can you really do this? And I don't blame you; there's no reason to believe me just because I spelled out this high-level idea, and it's not obvious that it should work. You might actually want to see the products. So, for those of you who would like that, this last slide is for you. Here is Strassen's algorithm in its somewhat gory detail. Let me tell you what the seven products are that we're going to form.
I'm going to label them P1 through P7, and they're all going to be defined in terms of the blocks of the input matrices X and Y. Let me just remind you that we think of X in terms of its blocks A, B, C, D, and we think of Y in terms of its blocks E, F, G, H, and remember that A through H are all n over two by n over two sub-matrices. So here are the seven products P1 through P7: P1 = A(F - H), P2 = (A + B)H, P3 = (C + D)E, P4 = D(G - E), P5 = (A + D)(E + H), P6 = (B - D)(G + H), and finally P7 = (A - C)(E + F). I hope you'll agree that these are indeed only seven products, and we can compute them with seven recursive calls after a little preprocessing with additions and subtractions: we have to compute F - H, A + B, C + D, and so on. We compute all these new matrices from the blocks, and then, with seven recursive calls, we form these seven products, each of which operates on n over two by n over two matrices.
Now, the question is: why is this useful? Why on Earth would we want to know the values of these seven products? The amazing other part of the algorithm is that from just these seven products we can, using only additions and subtractions, recover all four of the blocks of X times Y. You'll recall we expanded X times Y in terms of the blocks: we previously computed the upper-left corner to be AE + BG, and there are analogous expressions for the upper-right, lower-left, and lower-right blocks. So this we already know. The content of the claim is that these four blocks also arise from the seven products in the following way, so that these two different expressions for X times Y are exactly the same, block by block. In other words, the claim is that the crazy expression P5 + P4 - P2 + P6, where those are four of the products listed above, is precisely AE + BG. Similarly, we're claiming that P1 + P2 is exactly AF + BH (that's actually easy to see), that P3 + P4 is CE + DG (also easy to see), and the other, trickier one, that P1 + P5 - P3 - P7 is exactly CF + DH. So all four of those hold.
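In matrix form, the claim says:

\[
X \cdot Y \;=\;
\begin{pmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{pmatrix}
\;=\;
\begin{pmatrix} P_5 + P_4 - P_2 + P_6 & P_1 + P_2 \\ P_3 + P_4 & P_1 + P_5 - P_3 - P_7 \end{pmatrix}.
\]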
Now, just so you believe me, because I don't know why you would believe me unless I actually showed you some of this derivation, let's look at the proof of one of the cases, the upper-left corner. That is, let's expand out this crazy expression P5 + P4 - P2 + P6 and see what we get. Well, from P5 we get AE + AH + DE + DH. Adding P4 contributes DG - DE. Subtracting P2 contributes - AH - BH, and then adding P6 contributes BG + BH - DG - DH. So what happens next? Now we look for cancellations: we cancel the AH's, we cancel the DE's, we cancel the DH's, we cancel the DG's, we cancel the BH's. And holy cow, what do we get? We get AE + BG; that is, exactly what we were supposed to get in the upper-left block of X times Y. So we just verified that this equation holds for the upper-left block. It's quite easy to see that it holds for the upper-right and lower-left blocks, and a comparable calculation verifies it for the lower-right block.
So, summarizing: because this claim holds, because we really can recover the four blocks of X times Y from the seven products, Strassen's algorithm works in the following way. You compute the seven products P1 through P7 using seven recursive calls, and then you compute the four blocks using some extra additions and subtractions, as shown in the claim. So that's seven recursive calls on n over two by n over two matrices, plus theta of n squared work to do the necessary additions and subtractions, and as we'll see in the master method lecture, that is actually sufficient for subcubic time.
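Putting the pieces together, here is a compact sketch of Strassen's algorithm in Python, again assuming n is a power of two and plain lists of lists; the helper names are illustrative, not standard.

```python
def strassen(X, Y):
    """Strassen's algorithm: seven recursive products of n/2-by-n/2 blocks,
    plus Theta(n^2) additions and subtractions (n assumed a power of two)."""
    n = len(X)
    if n == 1:                                            # base case: 1-by-1 matrices
        return [[X[0][0] * Y[0][0]]]
    h = n // 2
    def quad(M, r, c):                                    # extract one n/2-by-n/2 block
        return [row[c:c + h] for row in M[r:r + h]]
    def add(M, N):
        return [[a + b for a, b in zip(u, v)] for u, v in zip(M, N)]
    def sub(M, N):
        return [[a - b for a, b in zip(u, v)] for u, v in zip(M, N)]
    A, B, C, D = quad(X, 0, 0), quad(X, 0, h), quad(X, h, 0), quad(X, h, h)
    E, F, G, H = quad(Y, 0, 0), quad(Y, 0, h), quad(Y, h, 0), quad(Y, h, h)
    # The seven cleverly chosen products.
    P1 = strassen(A, sub(F, H))
    P2 = strassen(add(A, B), H)
    P3 = strassen(add(C, D), E)
    P4 = strassen(D, sub(G, E))
    P5 = strassen(add(A, D), add(E, H))
    P6 = strassen(sub(B, D), add(G, H))
    P7 = strassen(sub(A, C), add(E, F))
    # Recover the four blocks of X * Y using only additions and subtractions.
    top_left = add(sub(add(P5, P4), P2), P6)              # = AE + BG
    top_right = add(P1, P2)                               # = AF + BH
    bottom_left = add(P3, P4)                             # = CE + DG
    bottom_right = sub(sub(add(P1, P5), P3), P7)          # = CF + DH
    return ([l + r for l, r in zip(top_left, top_right)] +
            [l + r for l, r in zip(bottom_left, bottom_right)])
```

A practical implementation would typically switch to the straightforward iterative algorithm below some small cutoff size, since the recursion's constant-factor overhead dominates there; that engineering choice doesn't affect the asymptotic analysis.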
Now, I sympathize with you if you have the following question: how on Earth did Strassen come up with this? And indeed, this illustrates the difference between checking somebody's proof and coming up with a proof. Given that I've told you the magical seven products and how, from them, you can recover the four desired blocks of X times Y, it's really just mechanical to verify that it works. It's a totally different story how you come up with P1 through P7 in the first place. So how did Strassen come up with them? Honestly, your guess is as good as mine.