David Wheeler: Okay. Thank you, Ilya [spelled phonetically].
So, multi-center mutation calling. What is that? A somewhat enigmatic topic, and possibly mundane were it not for the fact that the mutation calls we generate underpin so much of what we do in TCGA. So that makes this a necessary topic, and one that has itself undergone great evolution during the course of TCGA.
So what I'm going to do for the next 15 minutes is review the approaches to somatic mutation calling, consider what it is to call mutations with one caller, talk about the benchmarking of somatic mutation callers that we did early in the project, look at the early trials of three-center calling and the adoption of standards for three-center calling, and look at the current status of multi-center calling and new developments in mutation calling.
So the whole trick to this game is to be able to distinguish error from real variation, the real biological variation that we seek. There are two sources of that error. Some comes from the sequencing machines themselves and is inherent in the base callers. Fortunately, most of this error is randomly distributed, and the base calls come with calibrated Q values that enable us to distinguish truth from error, so we're able to filter most of it out. Yet some of it escapes, because there's a fair amount of it across the 50 billion reads that we take for an exome. And when these errors happen to be coincident, they're found at allele fractions that might be similar to what we're looking for in the very heterogeneous tumor environments that we have. That's been described in other talks; I'm not going to go into it here.
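For reference, the Q values mentioned here are Phred-scaled: a quality of Q corresponds to an error probability of 10^(-Q/10). A minimal sketch in Python (illustrative only, not any center's pipeline code):

    def phred_to_error_prob(q):
        """Probability that a base call with Phred quality q is wrong."""
        return 10 ** (-q / 10.0)

    # Q20 ~ 1 error in 100 base calls; Q30 ~ 1 in 1,000
    print(phred_to_error_prob(20))  # 0.01
    print(phred_to_error_prob(30))  # 0.001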
But then there is systematic error that comes from mapping and alignment ambiguities. There's a lot of difficulty with hundred-base reads, given the structure of the genome, that makes this a very tricky problem, and it leads to high-quality errors -- or rather, high-quality base differences that actually reflect true base differences, except that the read just happens to be in the wrong place -- so they can easily be mistaken for mutations.
So the way the current callers work is that they all have what I'll call, for lack of a better term, a truth engine that distinguishes real variation from sequencing error. These formulations, which are largely Bayesian or log-odds based, output tens of thousands to hundreds of thousands of events, which then have to be filtered, and it's at this point that heuristics come into play. Each calling center applies heuristics in a slightly different way, and most of the variation -- or a very large fraction of the variation -- that passes those so-called truth filters then gets filtered out. The best documentation of this is in the MuTect publication by my colleague Kristian Cibulskis at the Broad. What it shows is that at each sequencing depth you're filtering a significant fraction of the variants away based on these variant characteristics, which are being applied heuristically. This amounts to up to 90 percent of the variation that came through the first step now disappearing, and this is going to be important as we go along, because I believe this is where most of the variation between callers emerges.
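As a rough illustration of that two-stage structure -- a probabilistic truth engine followed by heuristic filters -- here is a toy sketch with made-up thresholds; it is not any center's actual pipeline, but it shows where the center-to-center differences can arise:

    from math import log

    def somatic_log_odds(alt, depth, error_rate=1e-3):
        # crude log-odds that the alternate reads reflect real variation
        # rather than random sequencing error (illustrative, not MuTect's model)
        p = max(alt / depth, 1e-6)
        return alt * log(p / error_rate)

    def passes_heuristics(site):
        return (site["mapq"] >= 20            # mapping/alignment ambiguity
                and site["alt"] >= 3          # minimum variant-read support
                and not site["strand_bias"])  # support not confined to one strand

    raw_sites = [
        {"alt": 8, "depth": 60, "mapq": 50, "strand_bias": False},  # likely real
        {"alt": 2, "depth": 80, "mapq": 10, "strand_bias": True},   # likely artifact
    ]
    candidates = [s for s in raw_sites if somatic_log_odds(s["alt"], s["depth"]) > 6.0]
    calls = [s for s in candidates if passes_heuristics(s)]
    print(len(candidates), len(calls))  # both sites pass the truth engine; one survives the filters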
So with the mutations that we collect from a single caller, we actually get very nice profiles of significantly mutated genes. This is one example from colorectal cancer, but we now have 10 or 12 tumor types in which we've collected similar profiles, and the profiles end up making a lot of sense. You can place these mutations into pathways that describe what's going on in a tumor -- actually in detail that we've never seen before.
But unfortunately, if you go back and compare the calls that different callers are making on one or two tumors, you see a picture that is at first somewhat disturbing. This is where we were with the first benchmark, proposed by David Haussler back in about February of 2011. Here we have on the left the somatic calls passing all the filters from each of three centers, and you can see that the number of variants in the center is fairly small compared to the total variation discovered by the callers in the union set; in particular, a lot more seemed to be going on in the variation that was unique to any given caller than there was in the overlap.
So then on to benchmark two. Things didn't look a whole lot better there; 310 was the intersect from four callers, and again you can see that, going around the outside, the number of events unique to each of these callers was much larger. This left the sinking feeling that a lot of true variation was possibly being left on the table and going unanalyzed. David took this another interesting step further: from the next benchmark he did an analysis where, for each center being considered in the Venn diagram, he looked at the top 100 calls from each patient being studied. Looking at the top 100 calls of just the UCSC caller, there were ten calls that the UCSC caller considered high quality that no other caller was seeing. Going around the horn to the Wash U caller, looking at their best calls, there were 143 that nobody else had seen. For Baylor, there were about 1,000. These are unvalidated calls, so even though the center is saying they're high quality, we don't really know. And here there were 55 from the Broad. So potentially there was a lot not being turned up in the analysis of any given cohort by a single caller.
Okay. Similar results were obtained in a set of colorectal samples where UCSC and the Broad made calls. You can see a lot more going on in the unique calls, and similarly the high-quality variants were numerous among the unique calls.
So the conclusion from this was obvious: the discordance between callers was high, and that was pretty dismaying, and the high-quality calls as defined by one caller were being missed by the others, so that particular discordance was distressing. The suggestion was that if we did multi-center calling, we would at least ameliorate some of the possibility of false negatives.
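The concordance analysis behind these Venn diagrams amounts to intersecting the call sets from the different centers; a minimal sketch with toy variants (real analyses work from each center's VCF or MAF files):

    calls = {
        "center_A": {("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C")},
        "center_B": {("chr1", 1000, "A", "T"), ("chr3", 42, "C", "G")},
        "center_C": {("chr1", 1000, "A", "T")},
    }

    union = set().union(*calls.values())
    for variant in sorted(union):
        supporters = [c for c, s in calls.items() if variant in s]
        print(variant, len(supporters), supporters)
    # Variants supported by two or more centers form the high-confidence overlap;
    # singletons are the "unique" calls discussed above.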
Okay. By the way, it's also a fact that if you were to apply callers just to diploid analysis -- where you have the tremendous benefit of an expectation of about 50-50 between the reference and variant alleles, which is a very powerful constraint on the data -- and you look at the results from five different callers on diploid genomes, they only agree amongst themselves 57 percent of the time. So it's not just in somatic mutation calling that there can be difficulties, but in this business as a whole, and my sense is that a lot of this, as I said before, comes from the fact that we use heuristics to help filter the data, and because of those differences we're basically sampling a very large multivariate space and choosing different components of that space.
So nonetheless, going forward, we then began using three-center calling on each of the cancers, and the results coming out of that -- where we were now able to superimpose validation data -- showed that, at least in the overlap, what was being called was extremely accurate. In the three-center overlap shown here, 99 percent of the variants validated, and even when just two centers made the call, the percentage of validation was very high. Out in the uniques, the validation rates are much lower. Nonetheless, validation was occurring among the uniques, so three-center calling can at least pick up false negatives; at least they're on the table for scientists to consider and accept or reject.
Here's a similar analysis of lung adenocarcinoma data done by the Broad, and once again a very high, 100 percent validation rate in the three-center overlap, and lower validation rates around the outside. More recently, in the kidney clear cell project, where we had 500 patients -- the largest cohort at the time -- we looked at validation within 177 of the cases. This shows the number of mutations that were validated and the percentage that were valid in each segment of the overlap, and you can see very high validation when at least two centers made a call, and a much lower validation rate in the unique data.
So this led to two developments. The first was a meta-caller, developed by Terry Speed, which he demonstrated to be very highly accurate, and so it provides a recipe for taking the multi-center calling data and making it highly accurate. To achieve that high accuracy, though, it's necessary to have validation data to calibrate each of the callers involved; but once that's done, you can make very accurate calls from the data. Unfortunately, prior to the marker papers coming out, we don't have the validation from all the centers, so one thing that could be done retrospectively now is to go back, with the validation data available and the calls from each caller, apply his method, and generate even more comprehensive mutation data sets.
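This is not Terry Speed's actual meta-caller, but as a toy illustration of the calibration idea: estimate from validation data how often each caller's calls validate, then use those calibrated rates to decide which calls outside the full overlap to keep (the independence assumption here is a simplification):

    validation = {                         # (calls validated, calls tested), made-up numbers
        "center_A": (95, 100),
        "center_B": (80, 100),
        "center_C": (60, 100),
    }
    precision = {c: v / t for c, (v, t) in validation.items()}

    def keep(variant_supporters, threshold=0.9):
        # probability the call is real if at least one supporting caller is right,
        # assuming (unrealistically) independent errors between callers
        p_all_wrong = 1.0
        for c in variant_supporters:
            p_all_wrong *= (1.0 - precision[c])
        return 1.0 - p_all_wrong >= threshold

    print(keep(["center_A"]))               # True  (0.95)
    print(keep(["center_C"]))               # False (0.60)
    print(keep(["center_B", "center_C"]))   # True  (0.92)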
The other thing this did was lead to a formalization of multi-center calling. With the recognition that the mutation callers are improving overall and that different callers detect different events, and because the validation cycles were taking too long to allow expeditious publication of the marker papers, we began using multi-center calling to provide a significantly mutated gene list based on the calibrated accuracies of calls made by two or more centers. This gave us a path forward and broke the conundrum we ran into with reviewers, who never wanted to take at face value mutation calls provided without validation. So we can accelerate the submission of the marker papers now.
We don't abandon validation -- validation is still brought in, but that can go on while the paper is under review. Validation requires a second, independent sequencing event. Okay.
So where are we now? The other thing that multi-center calling does is enable other potentially interested researchers to add their callers in and experiment with the development of new methods. So now, for the adrenocortical carcinoma project, which is underway and nearing a paper -- I think earlier in yesterday's session we saw a review of the work on this tumor -- we've got five centers calling. You can see from this that there are still large numbers of unique calls, but now, I think through the advancement and improved sophistication of the calling, the center of the diagram is the most heavily weighted, so the overlap is looking much better.
We still have a number of calls on the outside, and preliminary data that I'm sorry I can't show you here indicate that, among these calls in the unique sections, when we look in the RNA-seq data there are hundreds of events that appear to be expressed and therefore are probably valid somatic mutations.
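A rough sketch of that RNA-seq check, assuming hypothetical file names and coordinates: for a DNA call unique to one caller, count the RNA-seq reads carrying the alternate allele at that position (using pysam here purely for illustration):

    import pysam

    def alt_reads_in_rna(bam_path, chrom, pos, alt_base):
        """Count RNA-seq reads carrying the alternate base at a 0-based position."""
        count = 0
        with pysam.AlignmentFile(bam_path, "rb") as bam:
            for col in bam.pileup(chrom, pos, pos + 1, truncate=True):
                for read in col.pileups:
                    if read.is_del or read.is_refskip:
                        continue
                    if read.alignment.query_sequence[read.query_position] == alt_base:
                        count += 1
        return count

    # e.g. for a unique DNA call (file name and coordinates are illustrative only):
    # print(alt_reads_in_rna("tumor_rna.bam", "chr17", 7577119, "T"))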
So again, we continue to sample from a large space, and every new logic that is applied picks up more of that sampling. Most of what's out here are events at very low allele fractions, so they're sub-clonal events in general, although not all of them are. Some of them have just escaped one or more of the heuristic parameters used in filtering.
So now we have a second generation of mutation callers coming on. One of them is being developed by the genome center in collaboration with MD Anderson; it measures a distance per position per sample to reflect mutation evolution, and its uncertainty estimates are based on a Bayesian Markov model, so the method will come with a calibrated certainty akin to a Q value. A method called Viper [spelled phonetically] is now being refined by Wash U, and there's a MuTect version two on the drawing board. So the next round of mutation callers is going to have even better accuracy and better sensitivity at low allele fraction. I don't think we'll see these outer unique sections decrease at all, and that's a good thing, because they are going to be pulling in the very low allele fraction, sub-clonal mutations that we want to have.
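A back-of-the-envelope way to see why these low-allele-fraction events are hard: at allele fraction f and depth d, the number of supporting reads is roughly binomial, and the chance of seeing even a few supporting reads drops quickly as f falls (illustrative numbers only):

    from math import comb

    def prob_at_least(k, depth, f):
        """P(at least k variant reads) for allele fraction f at the given depth."""
        return sum(comb(depth, i) * f**i * (1 - f)**(depth - i)
                   for i in range(k, depth + 1))

    for f in (0.25, 0.10, 0.05, 0.02):
        print(f, round(prob_at_least(3, 100, f), 3))
    # At 100x, a 2 percent sub-clone yields three or more variant reads only about
    # a third of the time, which is why more sensitive callers are needed.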
These new callers are being tested in the DREAM challenge as we go through DREAM 1, 2, and now 3. The callers from the sequencing centers, you can see, are at the very top of the list, in what is probably a statistical dead heat. I think that speaks very well of the TCGA sequencing centers, and so we'll be looking forward to the final rendition of the DREAM challenge, which will use real data instead of synthetic data.
In conclusion, the TCGA paradigm for mutation discovery is improved by multi-center calling. It enables us to decrease the false negative rate; it delivers a set of somatic SNVs of calibrated accuracy, accelerates submission of marker papers, and stimulates the development of new mutation callers by providing benchmarking on the fly. A formal meta-caller was developed, which may be useful in retrospectively refining mutation calls from TCGA tumor sets.
And finally, one parting thought that I didn't talk about: we're now starting to use mutation calling in a lot of different contexts. We have RNA fusions, we have structural variation, and all of those mutation modes are likely to experience a phenomenon similar to what we see for SNVs, so that needs to be checked, and multi-algorithm calling will be required there too.
So I'll just end with my acknowledgements of my colleagues who all contributed to this
talk. Thank you.
[applause]
Male Speaker: Are there -- okay, one question, one burning
question please, we're running behind.
Male Speaker: Just a quick question. For the multi-center callers, are you using the same underlying alignment, like with BWA or Bowtie?
David Wheeler: These are -- yes. They're using the same set of BAM files.
Male Speaker: Okay. There are also different software options for alignment, like BWA or Bowtie; if you use different alignment software, there may be some differences in the mutation calls or mismatches.
David Wheeler: Yes. Well, yeah. That's going to be important for comparing mutation profiles across tumors, and so it is necessary to go back, and that's being done in the ICGC-TCGA whole-genome analysis now. But that is a very important point.
Male Speaker: So does TCGA recommend BWA for overall alignment?
David Wheeler: I'm sorry, TCGA what?
Male Speaker: Basically I saw lots of data was using BWA,
so is this like --
David Wheeler: Well this -- I was talking about whole exome
here, so I'm sorry, yeah. So yeah, we were using BWA. Everybody's using BWA, but in different
ways, actually.
Male Speaker: Okay, thanks, David. Okay. I'd like to welcome our next speaker, Kjong-Van Lehmann, from Memorial Sloan Kettering Cancer Center, who will talk about extensive trans- and cis-QTLs revealed by large-scale cancer genome analysis. Please.
[end of transcript]