In this segment, we're going to talk about how to figure out whether a regression model
meets the assumption of random errors. Once we have developed the regression model, there are
four things we have to check to see whether it meets that assumption.
The first thing we have to check is whether the residuals are both negative and positive.
The residuals are the differences between the observed and the predicted values,
and some of them have to be negative and some of them positive. We also have to see that the variation
of the residuals as a function of the independent variable is random.
Then we also have to see whether the residuals follow a normal distribution.
And the fourth part of this check is whether there is any autocorrelation between
the data points which have been given to you. So keep in mind that, although we are
bundling it all as one check, check number 4, it does have four components to it,
and we're going to take an example to go through all four of them.
Now, before I go any further: in the example we have used so far, we had only six
data points, but now we are choosing 22 data points.
This is mainly done to illustrate what is going on in the whole process.
If we only take six data points, we will not be able to illustrate the concept as well.
So far as the discussion is concerned, you can take the previous three checks which
we have made and illustrate them with these 22 data points too;
the algebra will be a little bit longer, and that's the only reason why I didn't show them with all 22 data points.
So only for check number 4 are we taking 22 data points, but as a homework exercise you can
go through the first three checks with the data which is given to you here.
So again, we are given 22 data points and we are plotting the data.
The data are the green dots which you are seeing there, and the red line is the regression model itself.
From that perspective, the straight line looks like a reasonable
approximation to the data, although you can see that
a parabola, a second-order polynomial, might give a better fit.
So now we are calculating the residuals, the differences between the observed values
and the predicted values, and what we are finding out here is that, yes, we have negative residuals,
and we have positive residuals as well. So that takes care of the first check,
whether we are getting both positive and negative residuals,
which does seem to be the case for this example.
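The first check can be sketched in a few lines of Python. The data and the straight-line coefficients below are made up for illustration; they are not the 22 points or the model from the lecture, and the function names are hypothetical.

```python
def residuals(x, y, a0, a1):
    """Observed minus predicted values for the straight-line model y = a0 + a1*x."""
    return [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]


def has_both_signs(res):
    """True if at least one residual is positive and at least one is negative."""
    return any(r > 0 for r in res) and any(r < 0 for r in res)


# Hypothetical data and coefficients (assumptions, not the lecture's values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
a0, a1 = 0.0, 1.0

res = residuals(x, y, a0, a1)
print(has_both_signs(res))
```

If every residual had the same sign, the line would sit entirely above or entirely below the data, and the check would fail.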
Then we want to see what the histogram of the residuals looks like.
What we have done is count, for each interval of residual values,
how many residuals fall between its lower limit and its upper limit, and so on for every interval.
What you are finding out here is that the histogram of the residuals does not follow
the normal distribution, so that would
tell you that this particular model is not adequate.
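The counting behind the histogram might look like the sketch below. The residual values and the number of bins are assumptions chosen for illustration, not the lecture's numbers, and the function name is hypothetical. The lecture judges the shape by eye; this sketch only produces the counts.

```python
def histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The largest value is placed in the last bin instead of overflowing.
        i = min(int((v - lo) / width), num_bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(num_bins + 1)]
    return edges, counts


# Hypothetical residuals (an assumption, not the lecture's data).
res = [-3.0, -2.5, -2.0, 1.0, 1.5, 2.0, 2.5, 3.0]
edges, counts = histogram(res, 3)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {'#' * c}")
```

A roughly bell-shaped set of counts, highest near zero and tapering on both sides, is what we would hope to see; an empty middle bin like the one in this made-up sample would not look normal.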
Next, let's check whether there is any autocorrelation in this data.
To figure that out, we count how many times the sign of the residual changes
as you go through the consecutive data points; call that number q.
Then we have to check whether q falls in a particular range, which depends on n,
the number of data points which you have. In our case, we have 22
data points, so when we substitute n equal to 22 into
the formula for that range, we find out that q needs to be somewhere between 5.9
and 15.1. Since q is a whole number, that means the residual
should change sign between 6 and 15 times.
So let's go ahead and see, is that the case?
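The transcript does not show the formula for the range. A commonly used rule that reproduces the quoted limits of 5.9 and 15.1 for n = 22 is (n - 1)/2 - sqrt(n - 1) <= q <= (n - 1)/2 + sqrt(n - 1). The sketch below assumes that rule; the function name is hypothetical.

```python
import math


def sign_change_bounds(n):
    """Assumed rule: q should lie within (n - 1)/2 plus or minus sqrt(n - 1)."""
    half = (n - 1) / 2
    spread = math.sqrt(n - 1)
    return half - spread, half + spread


low, high = sign_change_bounds(22)
print(round(low, 2), round(high, 2))
```

Because q is a count, only the whole numbers inside the range matter, which for n = 22 means 6 through 15.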
That is not the case. Look at this: the residual is not changing sign
as you go through these first data points. Only here does the residual
change sign. Then it is not changing sign as you go through all these data
points, and now it changes sign once more, but again, it is not changing
sign as you go down the rest of the rows. So the residual is changing
sign in only two places, and q is equal to 2. The value of q which you are getting
is not between 5.9 and 15.1, which basically tells you that there is autocorrelation in
the data. And that's the end of this segment.
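Counting the sign changes can be sketched as follows. The 22 residuals are invented so that they change sign twice, mirroring the pattern described above; they are not the lecture's values. Skipping residuals that are exactly zero is an assumption of this sketch, and the limits 5.9 and 15.1 are the ones quoted in the lecture for n = 22.

```python
def count_sign_changes(res):
    """Number of sign changes between consecutive residuals (zeros are skipped)."""
    signs = [1 if r > 0 else -1 for r in res if r != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical residuals: a positive run, a negative run, then a positive run.
res = [0.5] * 6 + [-0.4] * 10 + [0.6] * 6

q = count_sign_changes(res)
low, high = 5.9, 15.1  # limits quoted in the lecture for n = 22
within_range = low <= q <= high
print(q, within_range)
```

Long runs of residuals with the same sign give a small q, which is the signature of autocorrelation that the check is looking for.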