18.7 - Other topics - Java - [compilers theory] by Alex Aiken

In this video, we're going to wrap up our discussion of Java by taking a look at a couple of additional topics and how they are integrated into the language design. Consistent with Java's dynamic nature, Java allows classes to be loaded at runtime. This means that you can actually add functionality to an executing Java program, while it's running, by loading a new class. And this creates potential issues with type safety and security, because now there is a distinction between compile time and load time. Type checking of the source takes place at compile time, and this is the kind of type checking we discussed in earlier videos. But when the loader actually goes to load a class, it is loading bytecode, not source, and the source is not being type checked again. And it could be that this bytecode didn't come from a trusted source. This bytecode might not be the output of a compiler that did type checking before it produced the bytecode. So, the bytecode might not actually satisfy the type assumptions of the Java implementation.
So essentially, we have to check the bytecode again. A procedure called bytecode verification takes place when the class is loaded. And bytecode verification is really type checking of bytecode; that's essentially what it does. The procedure is a little bit different because the code here is much lower level, and so the algorithms look a little bit different, but what they're really doing is type checking the bytecode.
Now, the loading policies are handled by something called the class loader. The class loader is a special class in Java, and it decides what classes can be loaded. And actually, early on in Java, a bunch of security problems were discovered.
In these attacks, an attacker could get control of the class loader, install its own class loader that would be much more permissive than the standard Java class loader, and subvert the system. But those issues were fixed quite a while ago. Another interesting thing about Java is that classes may also be unloaded. So, you can not only load classes, you can also unload them. The last time I checked, this was not particularly well specified in the definition, and so it's a little bit unclear exactly what it means to unload a class and what happens to all the existing objects of that class, for example.
Now, I'd like to spend a few minutes talking about initialization in Java, which is quite complex. This shouldn't be too much of a surprise, because if you remember, initialization in COOL was also pretty complex, and Java is just a superset of COOL, so it has all the initialization issues that COOL has plus many more. The main source of complication is concurrency, but other language features also add to the complexity of initialization in Java. And, in fact, if you want to understand a new object-oriented language, you could do worse than to study how it does object initialization and class initialization. Because essentially what happens in initialization is that all the features of the language interact, and you have to explain what all those interactions are and how they are sorted out in order to have a well-defined initialization procedure.
So, now let's talk about class initialization. We won't talk about object initialization; we'll just talk about initializing classes. This is how the object that represents a class actually gets initialized when that class is first brought into the program. And the first thing to know is that a class is initialized when a symbol in the class is first used, not when the class is loaded.
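The difference between loading a class and initializing it can be observed directly. The following is a minimal sketch, not anything shown in the lecture: the classes `InitLog`, `Plugin`, and `LoadVersusInit` are hypothetical names made up for illustration. It uses the standard three-argument `Class.forName` with `initialize` set to `false` to load a class by name at runtime, and then reads a static field to provide the first use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that records events in the order they happen.
class InitLog {
    static final List<String> events = new ArrayList<>();
}

// Hypothetical class whose static initializer announces itself.
class Plugin {
    static int version = 7;   // not final, so reading it counts as a real use
    static {
        InitLog.events.add("Plugin initialized");
    }
}

class LoadVersusInit {
    static List<String> run() {
        try {
            ClassLoader loader = LoadVersusInit.class.getClassLoader();
            // A class literal does not initialize the class; a real program
            // would more likely take the name from configuration or input.
            String name = Plugin.class.getName();
            // Load by name at run time, asking explicitly NOT to initialize.
            Class<?> loaded = Class.forName(name, false, loader);
            InitLog.events.add("loaded " + loaded.getSimpleName());
            // The first use of a static field triggers initialization.
            int v = Plugin.version;
            InitLog.events.add("read version " + v);
            return new ArrayList<>(InitLog.events);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

On a first run in a fresh JVM, the "Plugin initialized" entry should appear between the "loaded" entry and the "read version" entry, which is the point: the static initializer runs at the first use, not at load time.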
So, the first time you reference any symbol in the class, that will cause the class to be initialized. And the reason for doing this is that if you are going to have an error in class initialization, this will cause that error to happen in a predictable place. So, if you have an error in class initialization and you run the program five times, that error will probably happen in the same place every time. It'll be repeatable and predictable where the error occurs. If instead the error happened at the time that you loaded the class, well, the class might be loaded at lots of different times. And so the error in the initialization of the class would become non-deterministic if we didn't delay the initialization until some deterministic point in the execution.
So, now I'll discuss the procedure for initializing class objects in Java. The first thing I should stress is that this idea of a class object is something that Java has that COOL does not have; I mentioned this on the previous slide. But just to be completely clear, what is a class object? A class object is just what it sounds like: it is the object for a class. It represents a class. This is not an instance of the class; this is an object which is the class. It has all the information about the class, so it tells you what the type of the class is, what the fields of the class are, and everything else. This is used for introspection, or reflection. And it's necessary in Java because of features like dynamic loading. If you dynamically load a class and you want to be able to use that class, you have to have some way of querying what kinds of methods and things the class has, and that is what the class object is for.
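To make the idea of a class object concrete, here is a small sketch using the standard `java.lang.Class` and `java.lang.reflect` APIs. The classes `Point` and `ClassObjectDemo` and their members are hypothetical, invented only so that there is something to inspect.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, used only as something to inspect.
class Point {
    int x;
    int y;
    int sum() { return x + y; }
}

class ClassObjectDemo {
    // Ask the class object which fields the class declares.
    static List<String> fieldNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            names.add(f.getName());
        }
        return names;
    }

    // Look up a no-argument method by name and invoke it on the target.
    static Object callByName(Object target, String methodName) {
        try {
            Method m = target.getClass().getDeclaredMethod(methodName);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point();
        p.x = 2;
        p.y = 3;
        Class<?> c = p.getClass();               // the class object, not an instance
        System.out.println(c.getName());
        System.out.println(c.getSuperclass().getName());
        System.out.println(fieldNames(c));       // order is not guaranteed
        System.out.println(callByName(p, "sum"));
    }
}
```

This is the kind of querying a program needs when it has loaded a class it knew nothing about at compile time: the method name arrives as a string, and the class object is what resolves it.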
So, there is one class object for each class in Java. When you load a class, the first thing you have to do is initialize the class object. And how is that done? Well, we lock the class object for the class. If that object is already locked by another thread, then we'll simply wait on the lock. So, we will wait until somebody tells us that it's okay to proceed. Now, once we obtain the lock on the class, we have to check to see if the class is already being initialized. And it could turn out that it is our thread, the same thread, that is already initializing the class. How could that happen? Well, remember that a class can have fields of the same type. So, I could have a class called X, and it could have a field of type X in it. And the way classes are going to be initialized is that we initialize the class itself, and we do that by recursively initializing the classes of all the fields, or at least making sure that the classes of all the fields are initialized. If we have a recursive structure here, with the same class mentioned in a field as the name of the enclosing class, then we will get the situation where the thread initializing the class may attempt to initialize the same class again. So, if we discover that we're already initializing this class, we simply release the lock and return. Now, another possibility is that the class is already initialized. If, when we finally get the lock, we discover that some other thread got in there and initialized the class before we had a chance to, well, then there's nothing to do and we just return normally.
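One way to see a recursive initialization request from the same thread is a class whose static initializer creates an instance of that same class. The sketch below is an illustration of that one case, not the lecture's example, and `Registry` and `RecursiveInitDemo` are hypothetical names. The constructor runs while the class is still being initialized, the recursive request simply returns, and so the constructor sees a static field that still holds its default value.

```java
// Hypothetical class: its static initializer creates an instance of itself.
class Registry {
    static int seenByConstructor = -1;
    // Runs during class initialization, before `limit` is assigned below.
    static final Registry INSTANCE = new Registry();
    static int limit = 10;

    Registry() {
        // Registry is already being initialized by this thread, so the
        // recursive request returns at once and initialization is not
        // restarted. `limit` still holds its default value here.
        seenByConstructor = limit;
    }
}

class RecursiveInitDemo {
    // Returns {value of limit seen inside the constructor, final value of limit}.
    static int[] observe() {
        return new int[] { Registry.seenByConstructor, Registry.limit };
    }

    public static void main(String[] args) {
        int[] r = observe();
        System.out.println("constructor saw limit = " + r[0]);   // prints 0
        System.out.println("limit after initialization = " + r[1]);   // prints 10
    }
}
```

If the recursive request waited for initialization to finish instead of returning, this program would deadlock against itself, which is why the same-thread check is there.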
Now, if neither one of these things is true, if we get the lock and we discover that the class is not already initialized and that we're not already in the process of initializing the class, then we will mark the class to note that initialization is in progress by this thread. So, we'll indicate that this class is being initialized and that we are the ones initializing it, and then we'll unlock the class. The next thing that happens is that we initialize the superclass, and then we initialize all the fields in textual order. But because Java has what are called static final fields, we will initialize those first. So, static final fields will get initialized before any other fields, in textual order. And, of course, we have to give every field a default value before initialization, just as in COOL. So, this step, step five, is very similar to what goes on in COOL. Now, if there's an error during the initialization, so some part of the initialization throws an exception, then we're going to mark the class as erroneous. We're going to mark this class as no good, so it can't be used, and that's the best we can do. If there's an exception during initialization, we just have to give up on that class, and so it gets a special mark on it saying that it's erroneous. And if there are no errors, if we succeed in initializing the class without any errors, then we're going to lock the class again. We will label the class as initialized, and then we'll notify the threads that are waiting on the class object. So, anybody who was blocked waiting on the class object will now be alerted that the object is ready, and then we'll unlock the class. And so that's a rough outline of how class initialization in Java works. I skipped over a few things and oversimplified it a bit.
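The outline above can be sketched as a small model in ordinary Java. This is a simplified illustration under several assumptions, not the JVM's implementation. `ClassInitModel`, `InitModelDemo`, and the `State` names are hypothetical, and the `Runnable` body stands in for initializing the superclass and the static fields. Waiting is modeled with `wait` and `notifyAll` on the model object, and `IllegalStateException` is only a placeholder for whatever a real JVM reports when a class is erroneous.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the protocol; all names here are hypothetical.
class ClassInitModel {
    enum State { UNINITIALIZED, IN_PROGRESS, INITIALIZED, ERRONEOUS }

    private State state = State.UNINITIALIZED;
    private Thread initializer;      // thread currently running the body
    private final Runnable body;     // stands in for superclass + field init

    ClassInitModel(Runnable body) { this.body = body; }

    void ensureInitialized() {
        synchronized (this) {                        // lock the "class object"
            // Another thread is initializing: wait until it notifies us.
            while (state == State.IN_PROGRESS && initializer != Thread.currentThread()) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            if (state == State.IN_PROGRESS) return;  // recursive request by this thread
            if (state == State.INITIALIZED) return;  // someone else finished first
            if (state == State.ERRONEOUS) {
                throw new IllegalStateException("class is erroneous");
            }
            state = State.IN_PROGRESS;               // mark: in progress, by us
            initializer = Thread.currentThread();
        }                                            // unlock while the body runs
        try {
            body.run();
        } catch (RuntimeException e) {
            finish(State.ERRONEOUS);                 // give up on this class
            throw e;
        }
        finish(State.INITIALIZED);
    }

    // Lock again, record the outcome, wake up the waiting threads, unlock.
    private synchronized void finish(State result) {
        state = result;
        initializer = null;
        notifyAll();
    }

    synchronized State state() { return state; }
}

class InitModelDemo {
    // Two threads race to initialize; returns how many times the body ran.
    static int runCount() {
        AtomicInteger runs = new AtomicInteger();
        ClassInitModel[] self = new ClassInitModel[1];
        self[0] = new ClassInitModel(() -> {
            runs.incrementAndGet();
            self[0].ensureInitialized();             // recursive request: returns at once
        });
        Thread a = new Thread(self[0]::ensureInitialized);
        Thread b = new Thread(self[0]::ensureInitialized);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("initializer ran " + runCount() + " time(s)");
    }
}
```

The design choice worth noticing is that the lock is released while the body runs. Holding it would be simpler, but then a recursive request or a slow initializer would tie up the class object for the whole duration, so the model records who is initializing and lets everyone else wait for a notification instead.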
So, this isn't the complete description, but these are the main points, and they illustrate how the various features of the language have to interact. You have to worry about concurrency, you have to worry about exceptions, you have to worry about static and final fields, you have to worry about inheritance. All these things have to be dealt with together in the design of a single algorithm to do class initialization.
Stepping back for a moment, this discussion of class initialization in Java illustrates a general point about designing complex systems. Any system is going to have some number of features, let's call it N, because you want to provide some functionality, obviously the things the system is supposed to do, so it's going to have features to do those things. But as you add features, you get lots of potential interactions between the features. Even if we think about just the pairwise interactions, if I have N features, then I'll have about N^2 pairwise feature interactions. And the point there, of course, is that as I add features, the number of possible interactions grows superlinearly in the number of features; it grows much more quickly than the number of features. And so, when adding the next feature, you're going to have to consider all of the previous features that you already have in the system and how this new feature affects them, and this is why it becomes very difficult to extend or build systems that have a lot of features. And this is just considering pairwise interactions between one feature and another. If I have to start worrying about how all possible subsets of features might interact with each other, well then, the number of potential interactions will grow, in fact, exponentially.
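To put rough numbers on this, here is a small sketch that counts both kinds of interaction. It assumes "pairwise interactions" means unordered pairs of distinct features, which gives N(N-1)/2, on the order of N^2, and it assumes "subsets" means groups of at least two features, which gives 2^N - N - 1. The class and method names are hypothetical.

```java
import java.math.BigInteger;

class FeatureInteractions {
    // Unordered pairs of distinct features: n(n-1)/2, which is on the order of n^2.
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    // Subsets with at least two features: 2^n minus the n singletons
    // and the empty set.
    static BigInteger groups(int n) {
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.valueOf(n + 1L));
    }

    public static void main(String[] args) {
        for (int n : new int[] { 5, 10, 20, 40 }) {
            System.out.println(n + " features: " + pairs(n) + " pairs, "
                    + groups(n) + " groups of two or more");
        }
    }
}
```

With 10 features there are 45 pairs but 1013 groups of two or more, and with 20 features there are 190 pairs against 1048555 groups.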
So, it'd be way more than quadratic. And the bottom line here is that big, featureful systems are hard to understand. This is a general lesson in computer science and in any discipline that wants to design complex systems, and this lesson applies to programming languages. It applies to every other kind of software system that you might want to build. But somehow it has a particular force in programming languages, because these interactions between the features of the programming language happen at a very fine grain. These things really can be composed arbitrarily, and so in language design you really do have to work out what all the interactions are in order to have a language that programmers can actually understand and use productively. And that, I think, is one of the big ideas that we've talked about throughout the course, and I hope it's one of the things that you take away from this lecture in particular.
So, to summarize and to conclude our discussion of Java, I think Java is a well-done language. By production standards, it is extremely well done. It's one of the best designed and best specified languages in use today. It brought several important ideas into the mainstream. When it was new, it brought in ideas that had been around for a long time but had not found their way into a production language that was very widely used. In particular, Java was the first language to be very widely used in commercial settings that had strong static typing, with real guarantees provided by the type system, and that was also a managed language with garbage-collected memory. But that doesn't mean it's perfect.
Java also includes some features that, at the time it was designed, we didn't fully understand, and I would say these are probably the areas where there's still some roughness in the Java design. So, things like the way the memory semantics work in the presence of concurrency, most people would agree, I think, still have some problems and some little gray areas that, as a programmer, you probably want to stay out of. And the other thing is that Java has a lot of features. As I said before, when you have a lot of features, you're going to have even more feature interactions, and that leads to complexity that becomes difficult to manage.